Performance, cluster environments and reproducibility
If you plan to run OGGM on more than a handful of glaciers, you might be interested in using all processors available to you, whether you are working on your laptop or on a cluster: see Parallel computations for how to do this.
For regional or global computations you will need to run OGGM in Cluster environments. Here we provide a couple of guidelines based on our own experience with operational runs.
In Reproducibility with OGGM, we discuss certain aspects of scientific reproducibility with OGGM, and how we try to ensure that our results are reproducible (it’s not easy).
OGGM is designed to use the available resources as well as possible. For single-node machines with more than one processor (e.g. personal computers), OGGM ships with a multiprocessing approach which is fairly simple to use. For cluster environments with more than one machine, you can use MPI.
Most OGGM computations are embarrassingly parallel: they are standalone operations performed on a single glacier entity and are therefore independent of each other (they are called entity tasks, as opposed to the non-parallelizable global tasks).
When given a list of Glacier directories on which to apply a given task,
workflow.execute_entity_task() will distribute the operations on
the available processors using Python’s multiprocessing module.
You can control this behavior with the use_multiprocessing
parameter and the number of processors with mp_processes.
The default in OGGM is:
In : from oggm import cfg
In : cfg.initialize()
In : cfg.PARAMS['use_multiprocessing']  # whether to use multiprocessing
Out: True
In : cfg.PARAMS['mp_processes']  # number of processors to use
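To illustrate why entity tasks parallelize so easily, here is a minimal sketch of the general pattern using only Python's standard library. It is not OGGM's actual implementation: `toy_entity_task` and `run_entity_task` are hypothetical names invented for this example, and the two keyword arguments only mimic the roles of the `use_multiprocessing` and `mp_processes` settings described above.

```python
import multiprocessing


def toy_entity_task(glacier_id):
    """Hypothetical per-glacier task: it depends only on its own input."""
    return f"{glacier_id}: done"


def run_entity_task(task, entities, use_multiprocessing=True, processes=None):
    """Apply `task` to every entity, optionally in parallel (illustrative only).

    With processes=None, multiprocessing.Pool uses all CPUs it detects.
    """
    if not use_multiprocessing:
        # Serial fallback: same results, one entity at a time.
        return [task(entity) for entity in entities]
    # Entities are independent of each other, so the list can simply be
    # split across worker processes; results come back in input order.
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(task, entities)
```

When trying this out as a script, call the parallel path from under an `if __name__ == "__main__":` guard, for example `run_entity_task(toy_entity_task, ["glacier_A", "glacier_B"], processes=2)`, since worker processes may re-import the main module on some platforms. The serial fallback is convenient for debugging because errors are raised directly in the calling process.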