Ensemble parallelism

Ensemble parallelism means solving multiple copies of a model simultaneously, each with different coefficients, right-hand sides, or initial data, in situations that require communication between the copies. Use cases include ensemble data assimilation, uncertainty quantification, and time parallelism. This manual section assumes some familiarity with parallel programming with MPI.

The Ensemble communicator

In ensemble parallelism, we split the MPI communicator into a number of spatial subcommunicators, each of which we refer to as an ensemble member (shown in blue in the figure below). Within each ensemble member, existing Firedrake functionality allows us to specify finite element problems which use spatial parallelism across the spatial subcommunicator in the usual way. Another set of subcommunicators, the ensemble subcommunicators, then allows communication between ensemble members (shown in grey in the figure below). Together, the spatial and ensemble subcommunicators form a Cartesian product over the original global communicator.

[Figure: ensemble.svg]

Spatial and ensemble parallelism for an ensemble with 5 members, each of which is executed in parallel over 5 processors.
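
Conceptually, this splitting is equivalent to the following mpi4py sketch (illustrative only; the variable names and the assumption of contiguous rank blocks are ours, and this is not how Firedrake implements the split internally):

from mpi4py import MPI

M = 5  # number of processes per ensemble member
global_comm = MPI.COMM_WORLD

# Consecutive blocks of M ranks share a spatial subcommunicator...
spatial_comm = global_comm.Split(color=global_comm.rank // M,
                                 key=global_comm.rank)

# ...while ranks occupying the same position within each block form
# an ensemble subcommunicator connecting the ensemble members.
ensemble_comm = global_comm.Split(color=global_comm.rank % M,
                                  key=global_comm.rank)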

The additional functionality required to support ensemble parallelism is the ability to send instances of Function from one ensemble to another. This is handled by the Ensemble class.

Each ensemble member must have the same spatial parallel domain decomposition, so instantiating an Ensemble requires a communicator to split (usually, but not necessarily, MPI_COMM_WORLD) plus the number of MPI processes to be used in each member of the ensemble (5 in the figure above, and 2 in the example code below). The number of ensemble members is calculated implicitly by dividing the size of the original communicator by the number of processes in each ensemble member. The total number of processes launched by mpiexec must therefore equal the product of the number of ensemble members and the number of processes used for each ensemble member; an exception is raised if this is not the case.

my_ensemble = Ensemble(COMM_WORLD, 2)
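
For instance, if this script were launched with mpiexec -n 6 (an illustrative assumption), the global communicator would be split into 3 ensemble members of 2 processes each:

# Assuming a launch with `mpiexec -n 6 python script.py`.
assert my_ensemble.comm.size == 2           # processes per ensemble member
assert my_ensemble.ensemble_comm.size == 3  # number of ensemble members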

Then, the spatial sub-communicator Ensemble.comm must be passed to Mesh() (possibly via the built-in mesh generators in utility_meshes), so that it is then used by any FunctionSpace() and Function derived from the mesh.

mesh = UnitSquareMesh(
    20, 20, comm=my_ensemble.comm,
    distribution_parameters={"partitioner_type": "simple"})

x, y = SpatialCoordinate(mesh)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)

The ensemble sub-communicator is then available through the attribute Ensemble.ensemble_comm.

q = Constant(my_ensemble.ensemble_comm.rank + 1)
u.interpolate(sin(q*pi*x)*cos(q*pi*y))

MPI communications across the spatial sub-communicator (i.e., within an ensemble member) are handled automatically by Firedrake, whilst MPI communications across the ensemble sub-communicator (i.e., between ensemble members) are handled through methods of Ensemble. Currently send/recv, reductions, and broadcasts are supported, as well as their non-blocking variants. The rank of the ensemble member (my_ensemble.ensemble_comm.rank) and the number of ensemble members (my_ensemble.ensemble_comm.size) can be accessed via the ensemble_rank and ensemble_size attributes.

ensemble_rank = my_ensemble.ensemble_rank
ensemble_size = my_ensemble.ensemble_size

dest = (ensemble_rank + 1) % ensemble_size
source = (ensemble_rank - 1) % ensemble_size
root = 0
usum = Function(V)

my_ensemble.send(u, dest)
my_ensemble.recv(u, source)

my_ensemble.reduce(u, usum, root=root)
my_ensemble.allreduce(u, usum)

my_ensemble.bcast(u, root=root)
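
The non-blocking variants return MPI requests that must be completed before the communicated data is used. Below is a minimal sketch of a non-blocking ring exchange, assuming that isend and irecv return lists of mpi4py requests:

from mpi4py import MPI

urecv = Function(V)

# Start the communications, then wait for all of them to complete
# before reading urecv.
send_requests = my_ensemble.isend(u, dest)
recv_requests = my_ensemble.irecv(urecv, source)
MPI.Request.Waitall(send_requests + recv_requests)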

Warning

In the Ensemble communication methods, each rank sends data only across the ensemble_comm that it is a part of. This assumes not only that the total mesh is identical on each ensemble member, but also that the ensemble_comm connects identical parts of the mesh on each ensemble member. Because of this, the spatial partitioning of the mesh on each Ensemble.comm must be identical, which is why the deterministic "simple" partitioner is specified in the mesh example above.

EnsembleFunction and EnsembleFunctionSpace

A Function is logically collective over a single spatial communicator Ensemble.comm. However, for some applications we want to treat multiple Function instances on different ensemble members as a single collective object over the entire global communicator Ensemble.global_comm. For example, in time-parallel methods we may have a Function for each timestep in a timeseries, and each timestep may live on a separate ensemble member. In this case we want to treat the entire timeseries as a single object.

Firedrake implements this using EnsembleFunctionSpace and EnsembleFunction (along with the dual objects EnsembleDualSpace and EnsembleCofunction). The EnsembleFunctionSpace can be thought of as a mixed function space which is parallelised across the components, as opposed to just being parallelised in space, as would usually be the case with FunctionSpace(). Each component of an EnsembleFunctionSpace is a Firedrake FunctionSpace() on a single spatial communicator.

To create an EnsembleFunctionSpace you must provide an Ensemble and, on each spatial communicator, a list of FunctionSpace() instances for the components on the local Ensemble.comm. The number of local FunctionSpace() instances may differ from one Ensemble.comm to another. In the example below we create an EnsembleFunctionSpace with two components on the first ensemble member, and three components on every other ensemble member. Note that, unlike a FunctionSpace(), a component of an EnsembleFunctionSpace may itself be a MixedFunctionSpace().

V = FunctionSpace(mesh, "CG", 1)
U = FunctionSpace(mesh, "DG", 0)
W = U*V

if ensemble_rank == 0:
    local_spaces = [V, U]
else:
    local_spaces = [V, U, W]

efs = EnsembleFunctionSpace(local_spaces, my_ensemble)
eds = efs.dual()

Analogously to accessing the components of a MixedFunctionSpace() using subspaces, the FunctionSpace() for each local component of an EnsembleFunctionSpace can be accessed via EnsembleFunctionSpace.local_spaces. Various other methods and properties such as dual (to create an EnsembleDualSpace) and nglobal_spaces (total number of components across all ranks) are also available.
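
For example, a small sketch inspecting the components (illustrative only):

# Each local component is an ordinary FunctionSpace on Ensemble.comm.
for i, Vi in enumerate(efs.local_spaces):
    print(f"Component {i}: {Vi.ufl_element()}")

# Total number of components summed over all ensemble members:
# here 2 on the first member plus 3 on each of the others.
print(efs.nglobal_spaces)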

An EnsembleFunction and EnsembleCofunction can be created from the EnsembleFunctionSpace. These have a subfunctions property that can be used to access the components on the local ensemble member. Each element in EnsembleFunction.subfunctions is itself just a normal Firedrake Function. If a component of the EnsembleFunctionSpace is a MixedFunctionSpace, then the corresponding component in EnsembleFunction.subfunctions will be a mixed Function in that MixedFunctionSpace.

efunc = EnsembleFunction(efs)
ecofunc = EnsembleCofunction(eds)

v = Function(V).assign(6)
efunc.subfunctions[0].project(v)

ustar = Cofunction(eds.local_spaces[1])
efunc.subfunctions[1].assign(ustar.riesz_representation())

EnsembleFunction and EnsembleCofunction have a range of methods equivalent to those of Function and Cofunction, such as assign, zero, riesz_representation, and arithmetic operators such as + and +=. These act component-wise across the local components.
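
A brief sketch of these component-wise operations (efunc2 and ecofunc2 are illustrative names):

# Each operation acts on every local component at once.
efunc2 = EnsembleFunction(efs)
efunc2.assign(efunc)
efunc2 += efunc
efunc2.zero()

# riesz_representation maps each component into its dual space.
ecofunc2 = efunc.riesz_representation()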

Because the components in EnsembleFunction.subfunctions (EnsembleCofunction.subfunctions) are just Function (Cofunction) instances, they can be used directly with variational forms and solvers. In the example code below, we create a LinearVariationalSolver where the right-hand side is a component of an EnsembleCofunction, and the solution is written into a component of an EnsembleFunction. Using the subfunctions directly like this can simplify ensemble code and reduce unnecessary copies. Note that the options_prefix is set using both the local ensemble rank and the index of the local space, which means that separate PETSc parameters can be passed from the command line to the solver on each ensemble member.

u = TrialFunction(efs.local_spaces[0])
v = TestFunction(efs.local_spaces[0])

a = inner(u, v)*dx + inner(grad(u), grad(v))*dx
L = ecofunc.subfunctions[0]

prefix = f"lvs_{ensemble_rank}_0"
lvp = LinearVariationalProblem(a, L, efunc.subfunctions[0])
lvs = LinearVariationalSolver(lvp, options_prefix=prefix)

ecofunc.subfunctions[0].assign(1)
lvs.solve()
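
For example, with the prefix above, the solver on the first ensemble member could be configured from the command line with options such as (illustrative choices):

mpiexec -n 6 python script.py -lvs_0_0_ksp_type cg -lvs_0_0_pc_type jacobi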

Warning

Although the Function (Cofunction) instances in EnsembleFunction.subfunctions (EnsembleCofunction.subfunctions) can be used in UFL expressions, EnsembleFunction and EnsembleCofunction themselves do not carry any symbolic information so cannot be used in UFL expressions.

Internally, the EnsembleFunction creates a PETSc.Vec on the Ensemble.global_comm which contains the data for all local components on all ensemble members. This Vec can be accessed with a context manager, similarly to the Function.dat.vec context managers used to access Function data. There are also analogous vec_ro and vec_wo context managers for read-only and write-only access. However, note that unlike the Function.dat.vec context managers, the EnsembleFunction.vec context managers are methods and must be called, i.e. vec() not vec.

with efunc.vec_ro() as vec:
    PETSc.Sys.Print(f"{vec.norm()=}")
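
Similarly, write-only access uses vec_wo; a short sketch using the standard petsc4py Vec interface:

with efunc.vec_wo() as vec:
    vec.set(1.0)  # overwrite the data of every component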