Build Multiple Nodes for pbdMPI
This section demonstrates how to install OpenMPI and pbdMPI on multiple nodes and form a cluster to run SPMD codes across nodes. I use a VM to build the first template machine (vb1) and clone it to a second machine (vb2). With a few modifications on vb2 to avoid conflicts with vb1, I have the same account, local files, and environment on both machines at the same time, and can log in via ssh from and to both machines without a password. Then, I can use the two machines freely to perform SPMD computing from vb1 alone. Unlike the AWS EC2 setup, I only configure the minimum requirements manually for this task.
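The password-free login can be set up in several ways; the exact steps used for this image are in multiple_nodes.txt. The following is only a minimal sketch of one common approach. It assumes the account's `~/.ssh` directory is identical on both machines (as it is after cloning), so authorizing the account's own public key is enough. The function name `setup_passwordless_ssh` is hypothetical and introduced here for illustration.

```shell
# Sketch: create a passphrase-free key pair and authorize it for the same account.
# $1 is the ssh directory to use (normally "$HOME/.ssh").
setup_passwordless_ssh() {
  key_dir="$1"
  mkdir -p "$key_dir" && chmod 700 "$key_dir"

  # Generate an RSA key pair with an empty passphrase unless one already exists.
  [ -f "$key_dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$key_dir/id_rsa"

  # Append the public key to authorized_keys only if it is not there yet.
  pub=$(cat "$key_dir/id_rsa.pub")
  grep -qxF "$pub" "$key_dir/authorized_keys" 2>/dev/null ||
    echo "$pub" >> "$key_dir/authorized_keys"
  chmod 600 "$key_dir/authorized_keys"
}
```

Calling `setup_passwordless_ssh "$HOME/.ssh"` on vb1 before cloning should let the clone inherit both the key and its authorization. A first manual `ssh 192.168.1.2` from vb1 is still needed to accept the host key.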
See Install VirtualBox to learn how to install and create a VM.
Download the multiple_nodes image (2.7GB), which contains two machines, vb1 and vb2. Import this image into VirtualBox in the same way as described in Install pbdR Image.
This image contains
- Xubuntu 14.04 without firewall
- vb1 at 192.168.1.1 and vb2 at 192.168.1.2
- ssh, NFS, git, r-base from Ubuntu default
- locally built OpenMPI-1.8.4 and pbdMPI 2.6
The detailed steps are in the file multiple_nodes.txt.
I tested an SPMD code with a collective call, and it works across the two machines. In the same way, one can clone vb1 (as a linked clone) to further machines to form a larger cluster easily. In the example, I rebuilt OpenMPI and installed R packages locally (/home/pbdr/work-my/local/R_libs), and this directory is shared by all nodes.
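As an illustration of such a test, here is a minimal sketch and not necessarily the code used for this image. It writes a small pbdMPI script that calls `allreduce` and a hostfile with the two addresses listed above, then prints a launch command. The directory name `spmd_demo` and the file names are assumptions. The sketch also assumes that the directory is visible at the same path on both nodes and that pbdMPI is on R's library path.

```shell
# Directory for the demo files; it must exist at the same path on every node
# (assumed here to sit under the shared or cloned home directory).
DEMO_DIR="${DEMO_DIR:-$PWD/spmd_demo}"
mkdir -p "$DEMO_DIR"

# SPMD script: every rank runs the same code on its own data.
cat > "$DEMO_DIR/hello_allreduce.r" <<'EOF'
suppressMessages(library(pbdMPI, quietly = TRUE))
init()
x <- comm.rank() + 1                 # rank 0 holds 1, rank 1 holds 2, ...
total <- allreduce(x, op = "sum")    # collective call: every rank receives the sum
comm.print(total)                    # printed by rank 0 only (the default)
finalize()
EOF

# One slot per machine, using the addresses given for vb1 and vb2.
printf '%s slots=1\n' 192.168.1.1 192.168.1.2 > "$DEMO_DIR/hosts.txt"

# Print the launch command instead of running it, so the sketch is safe to try anywhere.
echo "From vb1, run:"
echo "  mpiexec --hostfile $DEMO_DIR/hosts.txt -np 2 Rscript $DEMO_DIR/hello_allreduce.r"
```

With two ranks, the values being summed are 1 and 2. To use more machines, add their addresses to `hosts.txt` and raise `-np` accordingly.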
One may also use Ubuntu's default packages, "openmpi-bin" and "libopenmpi-dev", to run pbdMPI. However, this could cause network routing problems if eth0 is for NAT/host and eth1 is for internal MPI communication. An easy fix is to bring eth0 down with `sudo ip link set eth0 down` on all machines.
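If taking eth0 down is undesirable, a possible alternative is to tell Open MPI which interface to use through its MCA parameters `btl_tcp_if_include` and `oob_tcp_if_include`. The sketch below only builds and prints such a command line. The helper name `mpi_iface_opts` and the script name `my_spmd.r` are hypothetical, and this approach has not been verified on this image.

```shell
# Build mpiexec options that restrict Open MPI's TCP traffic to one interface.
# $1 is the interface name; eth1 is the internal interface named in the text.
mpi_iface_opts() {
  iface="${1:-eth1}"
  # btl_tcp_if_include: interface for MPI point-to-point traffic over TCP
  # oob_tcp_if_include: interface for the runtime's out-of-band control channel
  echo "--mca btl_tcp_if_include $iface --mca oob_tcp_if_include $iface"
}

# my_spmd.r is a placeholder script name.
echo "mpiexec $(mpi_iface_opts eth1) -np 2 -host 192.168.1.1,192.168.1.2 Rscript my_spmd.r"
```

This keeps the NAT/host connection on eth0 available while MPI traffic stays on the internal network.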
Potential extensions include installing NIS and rsh, dropping ssh, and installing all other pbdR packages. The setup can also be simplified by moving /etc and /home to an external disk, which reduces management effort.