This page demonstrates a parallel computing tool implemented in R. You need an MPI environment and the Rmpi package to run these examples. It also shows a comparison with other methods. In general, not every computing problem can be easily parallelized (MCMC, for example), and parallel computing is not always more efficient than serial computing. Its performance also depends heavily on the programming and the algorithm.
- MPI -- Message Passing Interface
With LAM/MPI and the R package Rmpi by Dr. Hao Yu.
- Download
For Mandrake Linux system:
- Master & Slave
The basic idea is that the master runs as MPI rank 0 and the slaves run as MPI ranks 1 to n, where n is the universe size of your MPI environment.
Steps:
0. Initialize.
1. Master sends to slaves. (bcast, send)
2. Slaves receive from master. (bcast, recv)
3. Compute.
4. Slaves send to master. (send)
5. Master receives from slaves. (recv)
6. Complete and quit.
Here, create a file "rmpi_ms.r" as follows,
# File name: rmpi_ms.r
call.mpi.master <- function(){
  library(Rmpi)
  mpi.spawn.Rslaves(needlog = FALSE)

  # Ship the slave function to all slaves and start it there.
  mpi.bcast.Robj2slave(call.mpi.slave)
  mpi.bcast.cmd(call.mpi.slave())

  # Broadcast x to all slaves.
  x <- 100
  mpi.bcast(as.integer(x), type = 1)

  # Send y to each slave individually.
  mysize <- mpi.universe.size()
  y <- 200
  for(i in 1 : mysize){
    mpi.send(as.integer(y), type = 1, dest = i, tag = 1)
  }

  # Collect one result row from each slave.
  ret <- NULL
  for(i in 1 : mysize){
    ret.slave <- mpi.recv.Robj(source = i, tag = 2)
    ret <- rbind(ret, ret.slave)
  }
  ret
}

call.mpi.slave <- function(){
  # Receive the broadcast x and the point-to-point y from the master.
  x <- mpi.bcast(integer(1), type = 1)
  y <- mpi.recv(integer(1), type = 1, source = 0, tag = 1)

  # Compute and send the result back to the master (rank 0).
  myrank <- mpi.comm.rank()
  ret.slave <- c(myrank, x, y, myrank, x * myrank + y)
  mpi.send.Robj(ret.slave, dest = 0, tag = 2)
}

call.mpi.master()
The output will look like this:
          [,1] [,2] [,3] [,4] [,5]
ret.slave    1  100  200    1  300
ret.slave    2  100  200    2  400
ret.slave    3  100  200    3  500
ret.slave    4  100  200    4  600
- Sum by Rmpi
As on the Loop page, MPI can split the for loop and send the pieces to the slaves to compute, reducing the computing time. Here are examples "rmpi_for_1.r", "rmpi_for_2.r", "rmpi_apply.r" and "rmpi_rowSums.r".
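The splitting idea can be sketched as follows. This is a minimal sketch, not the original "rmpi_for_1.r"; it assumes a working LAM/MPI environment, and the function names `partial.sum`, `n.slaves`, and `N` are illustrative choices, not from the original files:

```r
# Sketch: sum 1..N by splitting the indices across slaves.
# Assumes a running MPI environment; NOT the original rmpi_for_1.r.
library(Rmpi)

mpi.spawn.Rslaves(needlog = FALSE)
n.slaves <- mpi.comm.size() - 1          # rank 0 is the master

# Each slave sums an interleaved chunk of the indices 1..N.
partial.sum <- function(N, n.slaves) {
  myrank <- mpi.comm.rank()              # 1..n.slaves on the slaves
  sum(seq(myrank, N, by = n.slaves))
}

mpi.bcast.Robj2slave(partial.sum)
N <- 1000000
parts <- mpi.remote.exec(partial.sum, N, n.slaves)  # run on every slave
total <- sum(unlist(parts))              # equals N * (N + 1) / 2

mpi.close.Rslaves()
```

The interleaved split (`seq(myrank, N, by = n.slaves)`) is one simple way to balance the work; contiguous chunks would work equally well here.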
- Computing time
For a PIII-1.4G PC cluster with 4 nodes, the test computing times are as follows,
Sum by Rmpi    for 1   for 2   apply   rowSums
Time (secs)       86      79      41         5

Sum by Loop    for 1   for 2   apply   rowSums   dyn
Time (secs)      331     307     117         2    19
- Conclusion
The conclusion is the same as on the Loop page.
Use MPI to split independent jobs.
Does the computing time shrink by the number of CPUs?
What about the communication time?
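As a quick check on these questions, the speedup of the Rmpi versions over the serial Loop versions can be computed directly from the timing tables above (plain R, no MPI needed):

```r
# Speedup = serial time / parallel time, timings from the tables above.
loop <- c(for1 = 331, for2 = 307, apply = 117, rowSums = 2)
rmpi <- c(for1 =  86, for2 =  79, apply =  41, rowSums = 5)
round(loop / rmpi, 2)
# for1 3.85, for2 3.89, apply 2.85: close to the 4 CPUs for the loops,
# while rowSums (0.4) is already so fast that MPI overhead dominates.
```

So the loop-based versions scale almost linearly with the 4 nodes, but for the already-fast `rowSums` the communication cost outweighs the parallel gain.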