Single Program Multiple Data (SPMD) is a common program design
for parallel computing. For more details, the Wikipedia page on
SPMD gives a clear comparison of the different designs.
Here I demonstrate only a few examples, using Rscript with the
R package Rmpi.
Why SPMD?
- Eases programming and debugging.
- Scales well for parallel computing.
- Supports distributed systems for ultra-large/unlimited datasets.
- Reduces communication traffic.
Advantages of SPMD over Master/Worker
- It is very close to the serial code, i.e., SPMD code is easy to adapt from serial code.
- It is much shorter than the original Master/Worker version, i.e., SPMD is traceable for debugging.
- It makes the master one of the workers, i.e., SPMD fully utilizes resources.
- It can run in Master/Worker mode, i.e., SPMD can support interaction as well.
- It can also reduce communication among processors, i.e., SPMD can deal with ultra-large/unlimited data.
- It makes it easy to process large numbers of independent jobs automatically, i.e., SPMD can parallelize by jobs.
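To illustrate how close SPMD code stays to serial code, here is a minimal sketch in base R that splits independent jobs by rank. The helper names `jobs_for_rank` and `local_sum` are hypothetical and not part of any package. The ranks are simulated one after another so the logic can be checked without MPI; in a real run each process would obtain its own rank and the number of processes from the MPI library and compute only its own share.

```r
# Sketch: the same program runs on every rank; only `rank` differs.
# `rank` and `size` are passed in as arguments here (an assumption made so
# the example runs serially); under MPI they would come from the library.

# Indices of the jobs owned by one rank (block partition; hypothetical helper).
jobs_for_rank <- function(n_jobs, rank, size) {
  # Rank r owns the contiguous block of jobs between these two cut points.
  first <- floor(rank * n_jobs / size) + 1
  last <- floor((rank + 1) * n_jobs / size)
  if (last < first) integer(0) else first:last
}

# Work done by one rank: the serial computation, restricted to its own block.
local_sum <- function(n_jobs, rank, size) {
  id <- jobs_for_rank(n_jobs, rank, size)
  sum(id^2)
}

n_jobs <- 10
size <- 4

# Serial reference result.
serial_total <- sum(seq_len(n_jobs)^2)

# Simulate ranks 0, ..., size - 1 in turn. Under MPI each rank would compute
# only its own partial sum, and a reduction would add the partial sums.
partial <- vapply(0:(size - 1),
                  function(r) local_sum(n_jobs, r, size),
                  numeric(1))
spmd_total <- sum(partial)

print(partial)
print(spmd_total == serial_total)
```

The only change from the serial version is that each rank works on `jobs_for_rank(...)` instead of `seq_len(n_jobs)`, which is why SPMD code tends to stay short and easy to trace.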
pbdMPI
pbdMPI provides an efficient interface to MPI by
utilizing S4 classes and methods, with a focus on the Single
Program/Multiple Data (SPMD) parallel programming style, which is
intended for batch parallel execution. pbdMPI is an SPMD
extension rewritten from Rmpi with many
performance improvements.
- Extends easily to general R objects via S4 methods.
- Avoids the potential deadlocks of an interactive model.
- Gains performance from direct calls to MPI functions.
- Reduces the cost of compression in communication.
- Eases programming and debugging.
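As a rough illustration of the batch SPMD style described above, the following sketch sums one value per process with pbdMPI. It assumes pbdMPI and an MPI installation are available; the script name in the launch command is hypothetical, and the exact launcher (`mpiexec`, `mpirun`, or a scheduler wrapper) depends on the system.

```r
# Launch in batch mode, for example:
#   mpiexec -np 4 Rscript spmd_sum.r    (file name is a placeholder)
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

rank <- comm.rank()   # 0-based rank of this process
size <- comm.size()   # total number of processes

# Every rank contributes one value; allreduce() returns the sum on all ranks.
x <- rank + 1
total <- allreduce(x, op = "sum")

# comm.print() prints from a single rank by default, avoiding repeated output.
comm.print(total)

finalize()
```

Every process runs this same script, and there is no separate master code path: the only rank-dependent value is `x`.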
Examples
- A quick example for pbdMPI with possible extension to Rmpi can be found at Example.
- More examples for statistical computing can be found at Cookbook.