Firedrake uses MPI for distributed memory parallelism. This is
carried out transparently as long as your usage of Firedrake is only
through the public API. To run your code in parallel you need to use
the MPI job launcher available on your system. Often this program is
mpiexec. For example, to run a simulation in a file named
simulation.py on 16 processes we might use:
mpiexec -n 16 python simulation.py
By default, Firedrake makes use of an MPICH library that is
downloaded, configured, and installed in the virtual environment as
part of the PETSc installation procedure. If you do not intend to use
parallelism, or only use it in a limited way, this will be sufficient
for your needs. The default MPICH installation uses
nemesis as the
MPI channel, which is reasonably fast, but imposes a hard limit on the
maximum number of concurrent MPI processes equal to the number of cores
on your machine. If you would like to be able to oversubscribe your
machine, and run more processes than cores, you need to change the MPICH
device at install time to
sock, by setting an environment variable
before you run firedrake-install.
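As a hedged sketch, assuming the relevant configure flag is PETSc's --download-mpich-device option, passed through the PETSC_CONFIGURE_OPTIONS environment variable (check the installation documentation for your release):

# Assumed flag: select the sock device for the MPICH built by PETSc
export PETSC_CONFIGURE_OPTIONS="--download-mpich-device=ch3:sock"
python3 firedrake-install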
If parallel performance is important to you (e.g., for generating reliable timings or using a supercomputer), then you should probably be using an MPICH library tuned for your system. If you have a system-wide install already available, then you can simply tell the firedrake installer to use it, by running:
python3 firedrake-install --mpiexec=mpiexec --mpicc=mpicc --mpicxx=mpicxx --mpif90=mpif90
where mpiexec, mpicc, mpicxx, and mpif90 are the
commands to run an MPI job and to compile C, C++, and Fortran 90 code,
respectively.
The MPI execution model is that of single program, multiple data. As a result, printing output
requires a little bit of care: just using
print() will result
in every process producing output. A sensible approach is to use
PETSc’s printing facilities to handle this, as shown in the sketch below.
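A minimal sketch, assuming petsc4py's PETSc.Sys.Print (which prints once per communicator, from its rank 0 process):

from firedrake.petsc import PETSc

# Every rank reaches this line, but only rank 0 of COMM_WORLD prints;
# an ordinary print() would produce one line per MPI process.
PETSc.Sys.Print("Starting the simulation")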
Without detailed analysis, it is difficult to say precisely how much performance improvement should be expected from running in parallel. As a rule of thumb, it is worthwhile adding more processes as long as the number of degrees of freedom per process is more than around 50000. This is explored in some depth in the main Firedrake paper. Additionally, most of the finite element calculations performed by Firedrake are limited by the memory bandwidth of the machine. You can measure how the achieved memory bandwidth changes depending on the number of processes used on your machine using the STREAM benchmark.
Firedrake objects often contain PETSc objects, but are managed by the Python garbage collector. It is possible that when executing in parallel, code will deadlock as the Python garbage collector is not collective over the MPI communicator that the PETSc objects are collective over. If you find parallel code hanging for inexplicable reasons, it is possible to turn off the Python garbage collector by including these lines in your code:
import gc
gc.disable()
Disabling the garbage collector may cause memory leaks. It is
possible to call the garbage collector manually using
gc.collect() to avoid the issue, but this may in turn
lead to a deadlock.
The garbage collector can be turned back on with this line:

gc.enable()
More information can be found in this issue.
By default, Firedrake parallelises across
MPI_COMM_WORLD. If you
want to perform a simulation in which different subsets of processes
perform different computations (perhaps solving the same PDE for
multiple different initial conditions), this can be achieved by using
sub-communicators. The mechanism to do so is to provide a
communicator when building the
Mesh() you will perform the
simulation on, using the optional
comm keyword argument. All
subsequent operations using that mesh are then only collective over
the supplied communicator, rather than MPI_COMM_WORLD. For
example, to split the global communicator into two and perform two
different simulations on the two halves we would write:
from firedrake import *

comm = COMM_WORLD.Split(COMM_WORLD.rank % 2)

if COMM_WORLD.rank % 2 == 0:
    # Even ranks create a quad mesh
    mesh = UnitSquareMesh(N, N, quadrilateral=True, comm=comm)
else:
    # Odd ranks create a triangular mesh
    mesh = UnitSquareMesh(N, N, comm=comm)

...
To access the communicator a mesh was created on, we can use its
comm property.
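For instance, continuing the split-communicator example above, each half can report the size of its own spatial communicator (a minimal sketch; the message text is illustrative only):

# mesh.comm is the communicator supplied when the mesh was built
if mesh.comm.rank == 0:
    print("This half of the split runs on %d processes" % mesh.comm.size)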
Ensemble parallelism means solving simultaneous copies of a model with different coefficients, RHS or initial data, in situations that require communication between the copies. Use cases include ensemble data assimilation, uncertainty quantification, and time parallelism.
In ensemble parallelism, we split the MPI communicator into a number of subcommunicators, each of which we refer to as an ensemble member. Within each ensemble member, existing Firedrake functionality is used to specify the finite element problem, whose solves use spatial parallelism across that subcommunicator in the usual way. Another set of subcommunicators then allows communication between ensemble members.
The additional functionality required to support ensemble parallelism
is the ability to send instances of
Function from one
ensemble member to another. This is handled by the Ensemble
class. Instantiating an ensemble requires a communicator (usually
MPI_COMM_WORLD) plus the number of MPI processes to be used in
each member of the ensemble (5, in the case of the example
below). Each ensemble member will have the same spatial parallelism
with the number of ensemble members given by dividing the size of the
original communicator by the number of processes in each ensemble
member. The total number of processes launched by mpiexec must
therefore be equal to the product of the number of ensemble members with
the number of processes to be used for each ensemble member.
from firedrake import *

my_ensemble = Ensemble(COMM_WORLD, 5)
mesh = UnitSquareMesh(20, 20, comm=my_ensemble.comm)
x, y = SpatialCoordinate(mesh)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)
The ensemble sub-communicator is then available as the
ensemble_comm attribute:
q = Constant(my_ensemble.ensemble_comm.rank + 1)
u.interpolate(sin(q*pi*x)*cos(q*pi*y))
MPI communications across the spatial sub-communicator (i.e., within
an ensemble member) are handled automatically by Firedrake, whilst MPI
communications across the ensemble sub-communicator (i.e., between ensemble
members) are handled through methods of
Ensemble. Currently only
global reductions are supported.
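As a hedged sketch, assuming the allreduce method of Ensemble (which reduces a Function over the ensemble communicator into a second Function), and continuing the example above:

# Sum the fields computed by all ensemble members; every member
# must make this call, since it is collective over ensemble_comm
u_sum = Function(V)
my_ensemble.allreduce(u, u_sum)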