Build Multiple Nodes for pbdMPI

This section demonstrates how to install OpenMPI and pbdMPI on multiple nodes and form a cluster to run SPMD code across nodes. I use a VM to build the first template machine (vb1) and clone it to a second machine (vb2). With a few modifications on vb2 to avoid conflicts with vb1, I have the same account, local files, and environment on both machines at the same time, and can log in via ssh from and to both machines without a password. Then I can use the two machines freely to perform SPMD computing from vb1 alone. Unlike the AWS EC2 setup, I only do the minimum required steps manually for this task.

  1. See Install VirtualBox to learn how to install and create a VM.

  2. Download the multiple_nodes image (2.7GB), which contains two machines, vb1 and vb2. Import this image into VirtualBox in the same way as in Install pbdR Image.

  3. This image contains

    • Xubuntu 14.04 without firewall
    • vb1 at 192.168.1.1 and vb2 at 192.168.1.2
    • ssh, NFS, git, r-base from Ubuntu default
    • locally built OpenMPI-1.8.4, pbdMPI 2.6
  4. The detailed steps are in the file multiple_nodes.txt.
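
As a rough illustration of how the two addresses listed above could be handed to OpenMPI, the sketch below writes a hostfile with one line per node. The file name hostfile.txt and the value slots=1 are assumptions made for this example; they are not taken from multiple_nodes.txt.

```shell
#!/bin/sh
# Sketch: write an OpenMPI hostfile for the two nodes in the image.
# HOSTFILE name and "slots=1" are assumptions for illustration only.
HOSTFILE="${HOSTFILE:-./hostfile.txt}"

cat > "$HOSTFILE" <<'EOF'
192.168.1.1 slots=1
192.168.1.2 slots=1
EOF

# Count the nodes that were written (one per line).
NODES=$(wc -l < "$HOSTFILE" | tr -d ' ')
echo "wrote $NODES nodes to $HOSTFILE"
```

Such a file can be passed to mpirun with its --hostfile option. When more machines are cloned from vb1, one line per new node would be appended.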

I tested the SPMD code with a collective call, and it works across the two machines. In the same way, one can (linked) clone vb1 to other machines to form a larger cluster easily. In this example, I rebuilt a new OpenMPI and installed R packages locally (/home/pbdr/work-my/local/R_libs), which is shared by all nodes.
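
A test of this kind might look like the following sketch. It is not the code from multiple_nodes.txt: the script name spmd_allreduce.r and the RUN switch are hypothetical, and the script is assumed to be visible at the same path on both nodes (for example in the NFS-shared home). The sketch writes a small pbdMPI script that performs one collective call (allreduce), prints the mpirun command, and only launches it when RUN=1 is set and mpirun is found.

```shell
#!/bin/sh
# Sketch: a minimal SPMD test with one collective call on two nodes.
# SCRIPT and RUN are hypothetical names used only in this example.
SCRIPT="${SCRIPT:-./spmd_allreduce.r}"

# Write the R script; every rank runs the same code (SPMD).
cat > "$SCRIPT" <<'EOF'
suppressMessages(library(pbdMPI, quietly = TRUE))
init()
x <- comm.rank() + 1                 # each rank contributes rank + 1
total <- allreduce(x, op = "sum")    # collective call across all ranks
comm.cat("ranks:", comm.size(), "sum:", total, "\n")
finalize()
EOF

# -H lists the hosts to use; -np 2 starts two ranks in total.
CMD="mpirun -np 2 -H 192.168.1.1,192.168.1.2 Rscript $SCRIPT"
echo "$CMD"

# Launch only when explicitly requested and mpirun is installed.
if [ "${RUN:-0}" = "1" ] && command -v mpirun >/dev/null 2>&1; then
  $CMD
fi
```

Run from vb1 with RUN=1 to start the job. With two ranks, rank 0 contributes 1 and rank 1 contributes 2, so the reduced sum should be 3 if both machines take part.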

One may also use Ubuntu's default packages, "openmpi-bin" and "libopenmpi-dev", with pbdMPI. However, this may cause a network routing problem if eth0 is used for NAT/host and eth1 is used for internal MPI communication. An easy workaround is to bring eth0 down with `sudo ip link set eth0 down' on all machines.
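
The workaround can be applied to every node from vb1 with a small loop, sketched below. It assumes that the 192.168.1.x addresses are reached through eth1, so the ssh sessions survive when eth0 goes down, and that the account may run sudo on each node. The DRY_RUN switch is a hypothetical safety guard for this example: by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: bring eth0 down on all nodes over ssh.
# Assumes the 192.168.1.x addresses are on eth1, not on eth0.
NODES="192.168.1.1 192.168.1.2"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical guard: 1 = print only, 0 = execute
PLAN=""

for node in $NODES; do
  # -t allocates a terminal so that sudo can prompt for a password.
  cmd="ssh -t $node sudo ip link set eth0 down"
  PLAN="$PLAN$cmd
"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $cmd"
  else
    $cmd
  fi
done
```

Set DRY_RUN=0 to actually run the commands. As an untested alternative, OpenMPI has an MCA parameter, btl_tcp_if_include, that restricts its TCP traffic to a named interface such as eth1; whether that alone resolves the routing problem described here is an assumption.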

Potential extensions include installing NIS and rsh, dropping ssh, and installing all other pbdR packages. The setup can also be simplified by moving /etc and /home to an external disk, which reduces management effort.