Checkpointing state¶
In addition to the ability to write field data to vtu files, suitable for visualisation in Paraview, Firedrake has support for checkpointing state to disk. This enables pausing, and subsequently resuming, a simulation at a later time.
Checkpointing with CheckpointFile¶
CheckpointFile
class facilitates saving/loading meshes and
Function
s to/from an HDF5 file.
The implementation is scalable in that Function
s are
saved to and loaded from the file entirely in parallel without needing
to pass through a single process.
It also supports flexible checkpointing, where one can save meshes and
Function
s on \(N\) processes and later load them on
\(P\) processes.
If \(P == N\), the parallel distribution and entity permutation
(reordering) of the saved mesh is recovered on the loaded mesh by default.
The distribution and permutation data are by default stored under names
automatically generated by Firedrake.
Warning
If the mesh has a non-standard distribution, e.g., generated by a
partitioner with some non-standard parameters, it is recommended that
the user set the distribution name explicitly when constructing a mesh;
see, e.g., Mesh()
.
Warning
If \(P != N\) or P == N but distribution_parameters dict and/or reorder parameter are passed when loading a mesh, the saved mesh and the loaded mesh (and thus the saved function and the loaded function) will in general be represented by different global numbering systems and they are merely guaranteed to be analytically the same; as a consequence, it is currently not allowed to save the once loaded mesh or function back to the same file under the same name as this would cause conflict with other data stored using incompatible global numbering system. We plan to remove this restriction in the future.
Saving¶
In the following example we save in “example.h5” file two Function
s,
along with the mesh on which they are defined.
mesh = UnitSquareMesh(10, 10, name="meshA")
V = FunctionSpace(mesh, "CG", 2)
W = FunctionSpace(mesh, "CG", 1)
Z = V * W
f = Function(V, name="f")
g = Function(Z, name="g")
with CheckpointFile("example.h5", 'w') as afile:
afile.save_mesh(mesh) # optional
afile.save_function(f)
afile.save_function(g)
If the mesh name is not provided by the user when constructing the mesh, the
default mesh name, DEFAULT_MESH_NAME
, is assigned, which is then
used when saving in the file. We, however, strongly encourage users to name
each mesh.
Inspecting saved data¶
Now “example.h5” file has been created and the mesh and Function
data have been saved.
One can view the contents of the HDF5 file with “h5dump” utility shipped with
the HDF5 installation; “h5dump -n example.h5”, for instance, shows:
HDF5 "example.h5" {
FILE_CONTENTS {
group /
group /topologies
group /topologies/firedrake_mixed_meshes
group /topologies/firedrake_mixed_meshes/meshA
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/0
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/1
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g/0
group /topologies/firedrake_mixed_meshes/meshA/firedrake_mixed_function_spaces/firedrake_function_space_meshA_CG2(None,None)_meshA_CG1(None,None)/firedrake_functions/g/1
group /topologies/meshA_topology
group /topologies/meshA_topology/distributions
group /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)
dataset /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/chart_sizes
dataset /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/global_point_numbers
dataset /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/owners
group /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations
group /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations/firedrake_default_True
dataset /topologies/meshA_topology/distributions/firedrake_default_1_True_None_(FACET,1)/permutations/firedrake_default_True/permutation
group /topologies/meshA_topology/dms
group /topologies/meshA_topology/dms/coordinateDM
dataset /topologies/meshA_topology/dms/coordinateDM/order
group /topologies/meshA_topology/dms/coordinateDM/section
dataset /topologies/meshA_topology/dms/coordinateDM/section/atlasDof
dataset /topologies/meshA_topology/dms/coordinateDM/section/atlasOff
group /topologies/meshA_topology/dms/coordinateDM/section/field0
dataset /topologies/meshA_topology/dms/coordinateDM/section/field0/atlasDof
dataset /topologies/meshA_topology/dms/coordinateDM/section/field0/atlasOff
group /topologies/meshA_topology/dms/coordinateDM/section/field0/component0
group /topologies/meshA_topology/dms/coordinateDM/section/field0/component1
group /topologies/meshA_topology/dms/coordinateDM/vecs
group /topologies/meshA_topology/dms/coordinateDM/vecs/coordinates
dataset /topologies/meshA_topology/dms/coordinateDM/vecs/coordinates/coordinates
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/order
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section/atlasDof
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/section/atlasOff
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs/g[1]
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_1/vecs/g[1]/g[1]
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/order
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section/atlasDof
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/section/atlasOff
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs
group /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs/meshA_coordinates
dataset /topologies/meshA_topology/dms/firedrake_dm_1_0_0_False_2/vecs/meshA_coordinates/meshA_coordinates
group /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1
dataset /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/order
group /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section
dataset /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section/atlasDof
dataset /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/section/atlasOff
group /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs
group /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/f
dataset /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/f/f
group /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/g[0]
dataset /topologies/meshA_topology/dms/firedrake_dm_1_1_0_False_1/vecs/g[0]/g[0]
group /topologies/meshA_topology/firedrake_meshes
group /topologies/meshA_topology/firedrake_meshes/meshA
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)/firedrake_functions
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG1(None,None)/firedrake_functions/g[1]
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions/f
group /topologies/meshA_topology/firedrake_meshes/meshA/firedrake_function_spaces/firedrake_function_space_meshA_CG2(None,None)/firedrake_functions/g[0]
group /topologies/meshA_topology/labels
group /topologies/meshA_topology/labels/...
...
group /topologies/meshA_topology/topology
dataset /topologies/meshA_topology/topology/cells
dataset /topologies/meshA_topology/topology/cones
dataset /topologies/meshA_topology/topology/order
dataset /topologies/meshA_topology/topology/orientation
}
}
Loading¶
We can load the mesh and Function
s in “example.h5” as in the
following.
with CheckpointFile("example.h5", 'r') as afile:
mesh = afile.load_mesh("meshA")
f = afile.load_function(mesh, "f")
g = afile.load_function(mesh, "g")
Note that one needs to load the mesh before loading the Function
s
that are defined on it. If the default mesh name, DEFAULT_MESH_NAME
,
was used when saving, the mesh name can be ommitted when loading.
Extrusion¶
Extruded meshes can be saved and loaded seamlessly as the following:
mesh = UnitSquareMesh(10, 10, name="meshA")
extm = ExtrudedMesh(mesh, layers=4)
V = FunctionSpace(extm, "CG", 2)
f = Function(V, name="f")
with CheckpointFile("example_extrusion.h5", 'w') as afile:
afile.save_mesh(mesh) # optional
afile.save_function(f)
with CheckpointFile("example_extrusion.h5", 'r') as afile:
extm = afile.load_mesh("meshA_extruded")
f = afile.load_function(extm, "f")
Note that if the name was not directly provided by the user, the base mesh’s name postfixed by “_extruded” is given to the extruded mesh.
Timestepping¶
The following demonstrates how a Function
can be saved and loaded
at each timestep in a time-series simulation by setting the idx parameter:
mesh = UnitSquareMesh(2, 2, name="meshA")
V = FunctionSpace(mesh, "CG", 1)
f = Function(V, name="f")
x, y = SpatialCoordinate(mesh)
with CheckpointFile("example_timestepping.h5", 'w') as afile:
afile.save_mesh(mesh) # optional
for i in range(4):
f.interpolate(x * i)
afile.save_function(f, idx=i)
with CheckpointFile("example_timestepping.h5", 'r') as afile:
mesh = afile.load_mesh("meshA")
for i in range(4):
f = afile.load_function(mesh, "f", idx=i)
Note that each Function
can either be saved in the timestepping mode
with idx parameter always set or in the normal mode (non-timestepping mode)
with idx parameter always unset, and the same Function
can only be
loaded using the same mode.
Using disk checkpointing in adjoint simulations¶
When adjoint annotation is active, the result of every Firedrake operation is
stored in memory. For some simulations, this can result in a very large memory
footprint. As an alternative, it is possible to specify that those intermediate
results in forward evaluations of the tape which have type
Function
be written to disk. This is usually the
bulk of the data stored on the tape so this largely alleviates the memory
problem, at the cost of the time taken to read to and write from disk.
Having imported firedrake.adjoint, there are two steps required to enable disk checkpointing of the forward tape state.
Wrap all mesh constructors in
checkpointable_mesh()
.
See the documentation of those functions for more detail.
Checkpointing with DumbCheckpoint¶
Warning
DumbCheckpoint
will be deprecated after 01/01/2023.
Instead, users are encouraged to use CheckpointFile
,
which is more robust and scalable.
The support for DumbCheckpoint
is somewhat limited. One may
only store Function
s in the checkpoint object. Moreover,
no remapping of data is performed. This means that resuming the
checkpoint is only possible on the same number of processes as used to
create the checkpoint file. Additionally, the same Mesh
must be used: that is a Mesh
constructed identically to the
mesh used to generate the saved checkpoint state.
Opening a checkpoint¶
A checkpoint file is created using the DumbCheckpoint
constructor. We pass a filename argument, and an access mode.
Available modes are:
Open the checkpoint file for reading. Raises
OSError
if the file does not already exist.
Open the checkpoint file for reading and writing, creating the file if it does not exist, and erasing any existing contents if it does.
Open the checkpoint file for reading and writing, creating it if it does not exist, without erasing any existing contents.
For example, to open a checkpoint file for writing solution state, truncating any existing contents we use:
chk = DumbCheckpoint("dump", mode=FILE_CREATE)
note how we only provide the base name of the on-disk file, ".h5"
is
appended automatically.
Storing data¶
Once a checkpoint file is opened, Function
data can be
stored in the checkpoint using store()
.
A Function
is referenced in the checkpoint file by its
Function.name
, but this may be overridden by explicitly
passing an optional name argument. For example, to store a
Function
using its default name use:
f = Function(V, name="foo")
chk.store(f)
If instead we want to override the name we use:
chk.store(f, name="bar")
Loading data¶
Once a checkpoint is created, we can use it to load saved state into
Function
s to resume a simulation. To load data into a
Function
from a checkpoint, we pass it to
load()
. As before, the data is looked up by
its Function.name
, although once again this may be
overridden by optionally specifying the name
as an argument.
For example, assume we had previously saved a checkpoint containing
two different Function
s with names "A"
and
"B"
. We can load these as follows:
chk = DumbCheckpoint("dump.h5", mode=FILE_READ)
a = Function(V, name="A")
b = Function(V)
# Use a.name() to look up value
chk.load(a)
# Look up value by explicitly specifying name="B"
chk.load(b, name="B")
Note
Since Firedrake does not currently support reading data from a
checkpoint file on a different number of processes from that it was
written with, whenever a Function
is stored, an
attribute is set recording the number of processes used. When
loading data from the checkpoint, this value is validated against
the current number of processes and an error is raised if they do
not match.
Closing a checkpoint¶
The on-disk file inside a checkpoint object is automatically closed
when the checkpoint object is garbage-collected. However, since this
may not happen at a predictable time, it is possible to manually close
a checkpoint file using close()
. To facilitate
this latter usage, checkpoint objects can be used as context
managers which ensure that the checkpoint file is closed as soon as
the object goes out of scope. To use this approach, we use the python
with
statement:
# Normal code here
with DumbCheckpoint("dump.h5", mode=FILE_UPDATE) as chk:
# Checkpoint file open for reading and writing
chk.store(...)
chk.load(...)
# Checkpoint file closed, continue with normal code
Writing attributes¶
In addition to storing Function
data, it is also possible
to store metadata in DumbCheckpoint
files using HDF5
attributes. This is carried out using h5py to manipulate the file.
The interface allows setting attribute values, reading them, and
checking if a file has a particular attribute:
Write an attribute, specifying the object path the attribute should be set on, the name of the attribute and its value.
Read an attribute with specified name from at a given object path.
Check if a particular attribute exists. Does not raise an error if the object also does not exist.
Support for multiple timesteps¶
The checkpoint object supports multiple timesteps in the same on-disk
file. The primary interface to this is via
set_timestep()
. If never called on a
checkpoint file, no timestep support is enabled, and storing a
Function
with the same name as an existing object
overwrites it (data is stored in the HDF5 group "/fields"
). If
one wishes to store multiple timesteps, one should call
set_timestep()
, providing the timestep value
(and optionally a timestep “index”). Storing a Function
will now write to the group "/fields/IDX"
. To store the same
function at a different time level, we just call
set_timestep()
again with a new timestep
value.
Inspecting available time levels¶
The stored time levels in the checkpoint object are available as
attributes in the file. They may be inspected by calling
get_timesteps()
. This returns a list of the
timesteps stored in the file, along with the indices they map to. In
addition, the timestep value is available as an attribute on the
appropriate field group: reading the attribute
"/fields/IDX/timestep"
returns the timestep value corresponding to
IDX
.
Support for multiple on-disk files¶
For large simulations, it may not be expedient to store all timesteps
in the same on-disk file. To this end, the DumbCheckpoint
object offers the facility to retain the same checkpoint object, but
change the on-disk file used to store the data. To switch to a new
on-disk file one uses new_file()
. There are
two method of choosing the new file name. If the
DumbCheckpoint
object was created passing
single_file=False
then calling new_file()
without any additional arguments will use an internal counter to
create file names by appending this counter to the provided base
name. This selection can be overridden by explicitly passing the
optional name
argument.
As an example, consider the following sequence:
with DumbCheckpoint("dump", single_file=False, mode=FILE_CREATE) as chk:
chk.store(a)
chk.store(b)
chk.new_file()
chk.store(c)
chk.new_file(name="special")
chk.store(d)
chk.new_file()
chk.store(e)
Will create four on-disk files:
dump_0.h5
Containing
a
andb
;
dump_1.h5
Containing
c
;
special.h5
Containing
d
;
dump_2.h5
Containing
e
.
Implementation details¶
The on-disk representation of checkpoints is as HDF5 files.
Firedrake uses the PETSc HDF5 Viewer object to write and read state.
As such, writing data is collective across processes. h5py is used
for attribute manipulation. To this end, h5py must be linked
against the same version of the HDF5 library that PETSc was built
with. The firedrake-install
script automates this, however, if
you build PETSc manually, you will need to ensure that h5py is linked
correctly following the instructions for custom installation here.
Warning
Calling h5py.File.close()
on the h5py representation will
likely result in errors inside PETSc (since it is not aware that
the file has been closed). So don’t do that!