# Checkpointing state¶

In addition to the ability to write field data to vtu files, suitable for visualisation in Paraview, Firedrake has support for checkpointing state to disk. This enables pausing, and subsequently resuming, a simulation at a later time.

## Checkpointing with CheckpointFile¶

CheckpointFile class facilitates saving/loading meshes and Function s to/from an HDF5 file. The implementation is scalable in that Function s are saved to and loaded from the file entirely in parallel without needing to pass through a single process. It also supports flexible checkpointing, where one can save meshes and Function s on $$N$$ processes and later load them on $$P$$ processes. If $$P == N$$, the parallel distribution and entity permutation (reordering) of the saved mesh is recovered on the loaded mesh by default. The distribution and permutation data are by default stored under names automatically generated by Firedrake.

Warning

If the mesh has a non-standard distribution, e.g., generated by a partitioner with some non-standard parameters, it is recommended that the user set the distribution name explicitly when constructing a mesh; see, e.g., Mesh().

Warning

If $$P != N$$ or P == N but distribution_parameters dict and/or reorder parameter are passed when loading a mesh, the saved mesh and the loaded mesh (and thus the saved function and the loaded function) will in general be represented by different global numbering systems and they are merely guaranteed to be analytically the same; as a consequence, it is currently not allowed to save the once loaded mesh or function back to the same file under the same name as this would cause conflict with other data stored using incompatible global numbering system. We plan to remove this restriction in the future.

### Saving¶

In the following example we save in “example.h5” file two Function s, along with the mesh on which they are defined.

mesh = UnitSquareMesh(10, 10, name="meshA")
V = FunctionSpace(mesh, "CG", 2)
W = FunctionSpace(mesh, "CG", 1)
Z = V * W
f = Function(V, name="f")
g = Function(Z, name="g")
with CheckpointFile("example.h5", 'w') as afile:
afile.save_mesh(mesh)  # optional
afile.save_function(f)
afile.save_function(g)


If the mesh name is not provided by the user when constructing the mesh, the default mesh name, DEFAULT_MESH_NAME, is assigned, which is then used when saving in the file. We, however, strongly encourage users to name each mesh.

### Inspecting saved data¶

Now “example.h5” file has been created and the mesh and Function data have been saved. One can view the contents of the HDF5 file with “h5dump” utility shipped with the HDF5 installation; “h5dump -n example.h5”, for instance, shows:

HDF5 "example.h5" {
FILE_CONTENTS {
group      /
group      /topologies
group      /topologies/firedrake_mixed_meshes
group      /topologies/firedrake_mixed_meshes/meshA
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/0
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/1
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g/0
group      /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g/1
group      /topologies/meshA_topology
group      /topologies/meshA_topology/distributions
group      /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)
dataset    /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/chart_sizes
dataset    /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/global_point_numbers
dataset    /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/owners
group      /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations
group      /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations/firedrake_default_True
dataset    /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations/firedrake_default_True/permutation
group      /topologies/meshA_topology/dms
group      /topologies/meshA_topology/dms/coordinateDM
dataset    /topologies/meshA_topology/dms/coordinateDM/order
group      /topologies/meshA_topology/dms/coordinateDM/section
dataset    /topologies/meshA_topology/dms/coordinateDM/section/atlasDof
dataset    /topologies/meshA_topology/dms/coordinateDM/section/atlasOff
group      /topologies/meshA_topology/dms/coordinateDM/section/field0
dataset    /topologies/meshA_topology/dms/coordinateDM/section/field0/atlasDof
dataset    /topologies/meshA_topology/dms/coordinateDM/section/field0/atlasOff
group      /topologies/meshA_topology/dms/coordinateDM/section/field0/component0
group      /topologies/meshA_topology/dms/coordinateDM/section/field0/component1
group      /topologies/meshA_topology/dms/coordinateDM/vecs
group      /topologies/meshA_topology/dms/coordinateDM/vecs/coordinates
dataset    /topologies/meshA_topology/dms/coordinateDM/vecs/coordinates/coordinates
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/order
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section/atlasDof
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section/atlasOff
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs/g[1]
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs/g[1]/g[1]
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/order
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section/atlasDof
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section/atlasOff
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs
group      /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs/meshA_coordinates
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs/meshA_coordinates/meshA_coordinates
group      /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/order
group      /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section/atlasDof
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section/atlasOff
group      /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs
group      /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/f
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/f/f
group      /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/g[0]
dataset    /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/g[0]/g[0]
group      /topologies/meshA_topology/firedrake_meshes
group      /topologies/meshA_topology/firedrake_meshes/meshA
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)/firedrake_functions
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)/firedrake_functions/g[1]
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions/f
group      /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions/g[0]
group      /topologies/meshA_topology/labels
group      /topologies/meshA_topology/labels/...
...
group      /topologies/meshA_topology/topology
dataset    /topologies/meshA_topology/topology/cells
dataset    /topologies/meshA_topology/topology/cones
dataset    /topologies/meshA_topology/topology/order
dataset    /topologies/meshA_topology/topology/orientation
}
}


We can load the mesh and Function s in “example.h5” as in the following.

with CheckpointFile("example.h5", 'r') as afile:


Note that one needs to load the mesh before loading the Function s that are defined on it. If the default mesh name, DEFAULT_MESH_NAME, was used when saving, the mesh name can be ommitted when loading.

### Extrusion¶

Extruded meshes can be saved and loaded seamlessly as the following:

mesh = UnitSquareMesh(10, 10, name="meshA")
extm = ExtrudedMesh(mesh, layers=4)
V = FunctionSpace(extm, "CG", 2)
f = Function(V, name="f")
with CheckpointFile("example_extrusion.h5", 'w') as afile:
afile.save_mesh(mesh)  # optional
afile.save_function(f)
with CheckpointFile("example_extrusion.h5", 'r') as afile:


Note that if the name was not directly provided by the user, the base mesh’s name postfixed by “_extruded” is given to the extruded mesh.

### Timestepping¶

The following demonstrates how a Function can be saved and loaded at each timestep in a time-series simulation by setting the idx parameter:

mesh = UnitSquareMesh(2, 2, name="meshA")
V = FunctionSpace(mesh, "CG", 1)
f = Function(V, name="f")
x, y = SpatialCoordinate(mesh)
with CheckpointFile("example_timestepping.h5", 'w') as afile:
afile.save_mesh(mesh)  # optional
for i in range(4):
f.interpolate(x * i)
afile.save_function(f, idx=i)
with CheckpointFile("example_timestepping.h5", 'r') as afile:
for i in range(4):


Note that each Function can either be saved in the timestepping mode with idx parameter always set or in the normal mode (non-timestepping mode) with idx parameter always unset, and the same Function can only be loaded using the same mode.

## Using disk checkpointing in adjoint simulations¶

When adjoint annotation is active, the result of every Firedrake operation is stored in memory. For some simulations, this can result in a very large memory footprint. As an alternative, it is possible to specify that those intermediate results in forward evaluations of the tape which have type Function be written to disk. This is usually the bulk of the data stored on the tape so this largely alleviates the memory problem, at the cost of the time taken to read to and write from disk.

Having imported firedrake_adjoint, there are two steps required to enable disk checkpointing of the forward tape state.

1. Wrap all mesh constructors in checkpointable_mesh().

See the documentation of those functions for more detail.

## Checkpointing with DumbCheckpoint¶

Warning

DumbCheckpoint will be deprecated after 01/01/2023. Instead, users are encouraged to use CheckpointFile, which is more robust and scalable.

The support for DumbCheckpoint is somewhat limited. One may only store Functions in the checkpoint object. Moreover, no remapping of data is performed. This means that resuming the checkpoint is only possible on the same number of processes as used to create the checkpoint file. Additionally, the same Mesh must be used: that is a Mesh constructed identically to the mesh used to generate the saved checkpoint state.

### Opening a checkpoint¶

A checkpoint file is created using the DumbCheckpoint constructor. We pass a filename argument, and an access mode. Available modes are:

FILE_READ

Open the checkpoint file for reading. Raises OSError if the file does not already exist.

FILE_CREATE

Open the checkpoint file for reading and writing, creating the file if it does not exist, and erasing any existing contents if it does.

FILE_UPDATE

Open the checkpoint file for reading and writing, creating it if it does not exist, without erasing any existing contents.

For example, to open a checkpoint file for writing solution state, truncating any existing contents we use:

chk = DumbCheckpoint("dump", mode=FILE_CREATE)


note how we only provide the base name of the on-disk file, ".h5" is appended automatically.

### Storing data¶

Once a checkpoint file is opened, Function data can be stored in the checkpoint using store(). A Function is referenced in the checkpoint file by its name(), but this may be overridden by explicitly passing an optional name argument. For example, to store a Function using its default name use:

f = Function(V, name="foo")
chk.store(f)


If instead we want to override the name we use:

chk.store(f, name="bar")


Warning

No warning is provided when storing multiple Functions with the same name, existing values are overwritten.

Moreover, attempting to store a Function with a different number of degrees of freedom into an existing name will cause an error.

Once a checkpoint is created, we can use it to load saved state into Functions to resume a simulation. To load data into a Function from a checkpoint, we pass it to load(). As before, the data is looked up by its name(), although once again this may be overridden by optionally specifying the name as an argument.

For example, assume we had previously saved a checkpoint containing two different Functions with names "A" and "B". We can load these as follows:

chk = DumbCheckpoint("dump.h5", mode=FILE_READ)

a = Function(V, name="A")

b = Function(V)

# Use a.name() to look up value

# Look up value by explicitly specifying name="B"


Note

Since Firedrake does not currently support reading data from a checkpoint file on a different number of processes from that it was written with, whenever a Function is stored, an attribute is set recording the number of processes used. When loading data from the checkpoint, this value is validated against the current number of processes and an error is raised if they do not match.

### Closing a checkpoint¶

The on-disk file inside a checkpoint object is automatically closed when the checkpoint object is garbage-collected. However, since this may not happen at a predictable time, it is possible to manually close a checkpoint file using close(). To facilitate this latter usage, checkpoint objects can be used as context managers which ensure that the checkpoint file is closed as soon as the object goes out of scope. To use this approach, we use the python with statement:

# Normal code here
with DumbCheckpoint("dump.h5", mode=FILE_UPDATE) as chk:
# Checkpoint file open for reading and writing
chk.store(...)

# Checkpoint file closed, continue with normal code


### Writing attributes¶

In addition to storing Function data, it is also possible to store metadata in DumbCheckpoint files using HDF5 attributes. This is carried out using h5py to manipulate the file. The interface allows setting attribute values, reading them, and checking if a file has a particular attribute:

write_attribute()

Write an attribute, specifying the object path the attribute should be set on, the name of the attribute and its value.

read_attribute()

Read an attribute with specified name from at a given object path.

has_attribute()

Check if a particular attribute exists. Does not raise an error if the object also does not exist.

### Support for multiple timesteps¶

The checkpoint object supports multiple timesteps in the same on-disk file. The primary interface to this is via set_timestep(). If never called on a checkpoint file, no timestep support is enabled, and storing a Function with the same name as an existing object overwrites it (data is stored in the HDF5 group "/fields"). If one wishes to store multiple timesteps, one should call set_timestep(), providing the timestep value (and optionally a timestep “index”). Storing a Function will now write to the group "/fields/IDX". To store the same function at a different time level, we just call set_timestep() again with a new timestep value.

### Inspecting available time levels¶

The stored time levels in the checkpoint object are available as attributes in the file. They may be inspected by calling get_timesteps(). This returns a list of the timesteps stored in the file, along with the indices they map to. In addition, the timestep value is available as an attribute on the appropriate field group: reading the attribute "/fields/IDX/timestep" returns the timestep value corresponding to IDX.

### Support for multiple on-disk files¶

For large simulations, it may not be expedient to store all timesteps in the same on-disk file. To this end, the DumbCheckpoint object offers the facility to retain the same checkpoint object, but change the on-disk file used to store the data. To switch to a new on-disk file one uses new_file(). There are two method of choosing the new file name. If the DumbCheckpoint object was created passing single_file=False then calling new_file() without any additional arguments will use an internal counter to create file names by appending this counter to the provided base name. This selection can be overridden by explicitly passing the optional name argument.

As an example, consider the following sequence:

with DumbCheckpoint("dump", single_file=False, mode=FILE_CREATE) as chk:
chk.store(a)
chk.store(b)
chk.new_file()
chk.store(c)
chk.new_file(name="special")
chk.store(d)
chk.new_file()
chk.store(e)


Will create four on-disk files:

dump_0.h5

Containing a and b;

dump_1.h5

Containing c;

special.h5

Containing d;

dump_2.h5

Containing e.

## Implementation details¶

The on-disk representation of checkpoints is as HDF5 files. Firedrake uses the PETSc HDF5 Viewer object to write and read state. As such, writing data is collective across processes. h5py is used for attribute manipulation. To this end, h5py must be linked against the same version of the HDF5 library that PETSc was built with. The firedrake-install script automates this, however, if you build PETSc manually, you will need to ensure that h5py is linked correctly following the instructions for custom installation here.

Warning

Calling h5py:File.close() on the h5py representation will likely result in errors inside PETSc (since it is not aware that the file has been closed). So don’t do that!