This page demonstrates a parallel computing tool implemented in R. You need an MPI environment and the Rmpi package to run these examples. It also shows a comparison with other methods. In general, not every computing problem can be easily parallelized (MCMC, for example), and parallel computing is not always more efficient than serial computing. Its performance also depends heavily on the programming and the algorithm.
- MPI -- Message Passing Interface
With LAM/MPI and the R package Rmpi by Dr. Hao Yu.
- Download
For Mandrake Linux system:
- Master & Slave
The basic idea is that the master runs as MPI rank 0 and the slaves run as MPI ranks 1 to n, where n is the universe size of your MPI environment.
Steps:
0. Initialize.
1. Master sends to slaves. (bcast, send)
2. Slaves receive from master. (bcast, recv)
3. Compute.
4. Slaves send to master. (send)
5. Master receives from slaves. (recv)
6. Complete and quit.
Here, create a file "rmpi_ms.r" as follows,
# File name: rmpi_ms.r
call.mpi.master <- function(){
  library(Rmpi)
  mpi.spawn.Rslaves(needlog = FALSE)

  # Ship the slave function to all slaves and start it there.
  mpi.bcast.Robj2slave(call.mpi.slave)
  mpi.bcast.cmd(call.mpi.slave())

  # Broadcast x to all slaves.
  x <- 100
  mpi.bcast(as.integer(x), type = 1)

  # Send y to each slave individually.
  mysize <- mpi.universe.size()
  y <- 200
  for(i in 1 : mysize){
    mpi.send(as.integer(y), type = 1, dest = i, tag = 1)
  }

  # Collect one result row from each slave.
  ret <- NULL
  for(i in 1 : mysize){
    ret.slave <- mpi.recv.Robj(source = i, tag = 2)
    ret <- rbind(ret, ret.slave)
  }
  ret
}

call.mpi.slave <- function(){
  # Receive the broadcast x and the point-to-point y from the master.
  x <- mpi.bcast(integer(1), type = 1)
  y <- mpi.recv(integer(1), type = 1, source = 0, tag = 1)

  # Compute and send the result back to the master (rank 0).
  myrank <- mpi.comm.rank()
  ret.slave <- c(myrank, x, y, myrank, x * myrank + y)
  mpi.send.Robj(ret.slave, dest = 0, tag = 2)
}

call.mpi.master()
The output will look like this:
          [,1] [,2] [,3] [,4] [,5]
ret.slave    1  100  200    1  300
ret.slave    2  100  200    2  400
ret.slave    3  100  200    3  500
ret.slave    4  100  200    4  600
- Sum by Rmpi
As on the Loop page, MPI can split the for loop and send the pieces to the slaves to compute, reducing the computing time. Here are examples "rmpi_for_1.r", "rmpi_for_2.r", "rmpi_apply.r" and "rmpi_rowSums.r".
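The splitting idea can be sketched as follows. This is a minimal sketch, not the original "rmpi_for_1.r"; it assumes a working LAM/MPI environment, and the function names `partial.sum`, `n.slaves`, and `N` are illustrative choices, not from the original files:

```r
# Sketch: sum 1..N by splitting the indices across slaves.
# Assumes a running MPI environment; NOT the original rmpi_for_1.r.
library(Rmpi)

mpi.spawn.Rslaves(needlog = FALSE)
n.slaves <- mpi.comm.size() - 1          # rank 0 is the master

# Each slave sums an interleaved chunk of the indices 1..N.
partial.sum <- function(N, n.slaves) {
  myrank <- mpi.comm.rank()              # 1..n.slaves on the slaves
  sum(seq(myrank, N, by = n.slaves))
}

mpi.bcast.Robj2slave(partial.sum)
N <- 1000000
parts <- mpi.remote.exec(partial.sum, N, n.slaves)  # run on every slave
total <- sum(unlist(parts))              # equals N * (N + 1) / 2

mpi.close.Rslaves()
```

The interleaved split (`seq(myrank, N, by = n.slaves)`) is one simple way to balance the work; contiguous chunks would work equally well here.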
- Computing time
For a PIII-1.4G PC cluster with 4 nodes, the test computing times are as follows,
Sum by Rmpi    for 1   for 2   apply   rowSums
Time (secs)       86      79      41         5

Sum by Loop    for 1   for 2   apply   rowSums   dyn
Time (secs)      331     307     117         2    19
- Conclusion
The conclusion is the same as on the Loop page.
Use MPI to split independent jobs.
Does the computing time shrink by the number of CPUs?
What about the communication time?
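As a quick check on these questions, the speedup of the Rmpi versions over the serial Loop versions can be computed directly from the timing tables above (plain R, no MPI needed):

```r
# Speedup = serial time / parallel time, timings from the tables above.
loop <- c(for1 = 331, for2 = 307, apply = 117, rowSums = 2)
rmpi <- c(for1 =  86, for2 =  79, apply =  41, rowSums = 5)
round(loop / rmpi, 2)
# for1 3.85, for2 3.89, apply 2.85: close to the 4 CPUs for the loops,
# while rowSums (0.4) is already so fast that MPI overhead dominates.
```

So the loop-based versions scale almost linearly with the 4 nodes, but for the already-fast `rowSums` the communication cost outweighs the parallel gain.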