Single Program Multiple Data (SPMD) is a common program design
for parallel computing. For more details, the Wikipedia page on SPMD
gives a clear explanation of how it compares with other designs.
Here I demonstrate only a few examples, using Rscript with the
R package Rmpi.
---
#### Why SPMD?
- Eases programming and debugging.
- Scales well for parallel computing.
- Suits distributed systems with ultra-large/unlimited datasets.
- Reduces communication traffic.
---
#### Advantages of SPMD over Master/Worker
1. It is very close to the serial code,
   i.e. SPMD code is easy to derive from a serial version.
2. It is much shorter than the original Master/Worker version,
   i.e. SPMD is traceable for debugging.
3. It makes the master one of the workers,
   i.e. SPMD fully utilizes resources.
4. It can run in Master/Worker mode,
   i.e. SPMD can also work interactively.
5. It can reduce intercommunication among processors,
   i.e. SPMD can deal with ultra-large/unlimited data.
6. It makes it easy to process large numbers of independent jobs automatically,
   i.e. SPMD can parallelize by jobs.
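
To illustrate points 1, 3, and 6, here is a minimal sketch of an SPMD script. It uses pbdMPI (introduced in the next section) rather than Rmpi, and the workload, the job count `n_jobs`, and the file name in the launch command are assumptions made up for illustration. Every rank runs the same script, takes its own share of the job ids, and joins in the final reduction, so no rank is reserved as a master.

```r
# SPMD sketch with pbdMPI: every rank runs this same script.
# Assumed launch command: mpiexec -np 4 Rscript spmd_sum.R
library(pbdMPI)
init()

n_jobs <- 100                      # hypothetical number of independent jobs

# The serial version would be: total <- sum(sqrt(1:n_jobs))
# SPMD version: each rank handles only its own share of the job ids.
my_ids <- get.jid(n_jobs)          # job ids assigned to this rank
local_sum <- sum(sqrt(my_ids))     # same computation as serial, on a subset

# Combine the partial results; every rank receives the full total.
total <- allreduce(local_sum, op = "sum")

# comm.cat() prints from rank 0 only by default.
comm.cat("ranks:", comm.size(), " total:", total, "\n")
finalize()
```

The only lines that differ from the serial code are the call to `get.jid()` and the `allreduce()`, which is what makes the script easy to trace when debugging.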
---
#### pbdMPI
pbdMPI provides an efficient interface to MPI by
utilizing S4 classes and methods, with a focus on the Single
Program/Multiple Data (SPMD) parallel programming style, which is
intended for batch parallel execution. pbdMPI is an SPMD
extension rewritten from Rmpi with many
performance improvements.
- Extends readily to general R objects via S4 methods.
- Avoids the potential deadlocks of an interactive model.
- Gains performance from direct calls to MPI functions.
- Reduces the cost of compression in communication.
- Eases programming and debugging.
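
As a rough sketch of the first point, the script below passes ordinary R lists through pbdMPI collectives without packing them into buffers by hand. The object contents (`info`, `settings`) and the file name in the launch command are hypothetical and chosen only for illustration.

```r
# Assumed launch command: mpiexec -np 2 Rscript spmd_objects.R
library(pbdMPI)
init()

# Each rank builds an ordinary R list holding its own values.
info <- list(rank = comm.rank(), squares = (1:3 + comm.rank())^2)

# allgather() on a general object returns a list with one element per rank.
everyone <- allgather(info)

# bcast() sends the object held by rank 0 to all ranks.
settings <- bcast(list(tol = 1e-6, label = "demo"), rank.source = 0)

comm.print(length(everyone))       # one entry per rank; printed by rank 0
comm.print(settings$label)
finalize()
```

Because every rank executes the same collective calls in the same order, there is no separate master loop waiting on worker messages.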
---
#### Examples
- A quick example for pbdMPI, with a possible extension to Rmpi,
  can be found at Example.
- More examples for statistical computing can be found in the
  Cookbook.
---