A Note for building OpenMPI cluster with multiple nodes

This file contains personal experiments that install two VMs, OpenMPI, and
pbdMPI for two nodes (vb1, vb2) sharing same account (pbdr) and local file
(/home) via NFS. The linux system, software, network setup, and configuration
are all in next. Two nodes form a cluster with minimum requirements. The
cluster is tested with SPMD code (hello.r) with collective call via pbdMI
across two nodes.

Both nodes are firewall disabled, using ssh public key authenticity,
communicating with second network adapter (eth1) in 192.168.1.* range. The
account 'pbdr' can freely login/ssh without problems between two nodes. By
cloning from the vb1, potentially I can build vb3, vb4, ... with identical
user environment.

In the example, I rebuild new OpenMPI and install R packages locally
(/home/pbdr/work-my/local/R_libs) which is shared by all nodes.

One may also use Ubuntu's default packages, "openmpi-bin" and "libopenmpi-dev",
to run with pbdMPI. However, it could have network routing problem if eth0 is
for NAT/host and eth1 is for internal MPI communication. It will be easy to
bring eth0 down by `sudo ip link set eth0 down' on all machines.

Wei-Chen Chen


-----------------------------------------------------------------
1. Install VM

### Install vb1 with two network cards
### i) Network Adapter 1 (eth0) and Attached to "NAT", and
### ii) Network Adapter 2 (eth1) and Attached to "Host-only Adapter".

### Install Xubuntu 10.04
### Set user id/pw: pbdr/pbdr


-----------------------------------------------------------------
2. Change configurations

### Boot vb1

sudo apt-get install ssh r-base git nfs-kernel-server nfs-common
# sudo apt-get --purge remove openmpi-bin libopenmpi-dev
sudo ufw disable

sudo vi /etc/network/interfaces
### Append next to the file
auto eth1
iface eth1 inet static
address 192.168.1.1
netmask 255.255.255.0
geteway 192.168.1.1
broadcast 192.168.1.255
### End and save the file

sudo vi /etc/hosts
### Append next to the file
192.168.1.1	vb1
192.168.1.2	vb2
### End and save the file

sudo vi /etc/exports
### Append next to the file
/home 192.168.1.0/24(rw,sync,no_root_squash,no_subtree_check)
### End and save the file
sudo exportfs -a
sudo /etc/init.d/nfs-kernel-server restart

ssh-keygen -t rsa -f /home/pbdr/.ssh/id_rsa
cat /home/pbdr/.ssh/id_rsa.pub >> /home/pbdr/.ssh/authorized_keys


-----------------------------------------------------------------
3. Build OpenMPI and pbdMPI

### Login to vb1

cd /home/pbdr
mkdir work-my
mkdir work-my/source
mkdir work-my/local
mkdir work-my/local/R_libs
cd /home/pbdr/work-my

vi 00-set_path.sh
### Append next to the file
export R_LIBS_USER=/home/pbdr/work-my/local/R_libs
export PATH=/home/pbdr/work-my/local/openmpi-1.8.4/bin:$PATH
export LD_LIBRARY_PATH=/home/pbdr/work-my/local/openmpi-1.8.4/lib
### End and save the file

source 00-set_path.sh

vi /home/pbdr/.bashrc
### Append next to the file
source /home/pbdr/work-my/00-set_path.sh
### End and save the file

cd /home/pbdr/work-my/source
wget http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.4.tar.gz
wget http://cran.r-project.org/src/contrib/rlecuyer_0.3-3.tar.gz
git clone https://github.com/snoweye/pbdMPI

tar zxvf openmpi-1.8.4.tar.gz
cd /home/pbdr/work-my/source/openmpi-1.8.4
./configure \
  --prefix=/home/pbdr/work-my/local/openmpi-1.8.4 \
  --enable-orterun-prefix-by-default
make -j 4
make install

cd /home/pbdr/work-my/source
R CMD INSTALL rlecuyer_0.3-3.tar.gz
chmod u+x pbdMPI/configure
R CMD INSTALL pbdMPI

### Shutdown vb1


------------------------------------------------------------------------
4. Clone VM

### Linked clone vb1 to vb2 with reinitialize the MAC address of all network
### cards.


-----------------------------------------------------------------
5. Modify new clones

### Boot vb2

sudo vi /etc/hosts
### Replace "127.0.0.1	vb1" by "127.0.0.1	vb2"
### Save the file

sudo vi /etc/hostname
### Replace "vb1" by "vb2"
### Save the file

sudo vi /etc/network/interfaces
### Replace "address 192.168.1.1" by "address 192.168.1.2"
### Save the file

sudo vi /etc/fstab
### Append next to the file
192.168.1.1:/home /home nfs rw,hard,intr 0 0
### End and save the file

### Shutdown vb2


-----------------------------------------------------------------
6. Test all VMs

### Sequentially boot vb1 then boot vb2 

### Login in to vb2

df
### This should show /home on vb2 is mounted from 192.168.1.1:/home
ssh vb1
ssh vb2
### This should no need password in both directions


-----------------------------------------------------------------
7. Test OpenMPI and pbdMPI

### Login in to vb1

env | grep ^PATH
env | grep LD_LIBRARY_PATH
env | grep R_LIBS_USER
### This should point to right path

cd /home/pbdr/work-my 
vi hostfile
### Add next to the file
192.168.1.1
192.168.1.2
### End and save the file

vi hello.r
### Add next to the file
library(pbdMPI, quietly = TRUE)
init()
x.gbd <- 1:10
x <- allreduce(x.gbd)
comm.cat(Sys.info()["nodename"], "\n", all.rank = TRUE)
comm.print(x, all.rank = TRUE)
finalize()
### End and save the file

mpiexec -np 2 Rscript hello.r
### Test locally

mpiexec -x R_LIBS_USER --hostfile hostfile -np 2 Rscript hello.r
### Since this mpiexec did neither pass R_LIBRS_USER nor get from environment
### correctly, I have to pass it with -x manually. PATH and LD_LIBRARY_PATH
### are fine.