Amazon Elastic Compute Cloud

This is a summary of how to set up a cluster on AWS EC2. In short, the key points are


Very Long Steps:

  1. For example, I have two t2.micro instances running in us-west-2a. Both run the Ubuntu image provided by AWS EC2. They have public IPs 54.68.48.164 and 54.68.36.71, and private DNS names ip-172-31-42-235 and ip-172-31-40-207, respectively. All instances are in the default security group. The next figures show what they should be.


  2. All instances are in the default security group (under NETWORK & SECURITY in the left-hand menu), where I set both Inbound and Outbound rules to

    • Type: All traffic
    • Protocol: All
    • Port Range: 0 - 65535
    • Source/Destination: Anywhere
      This is a risky change. It is safer to allow private IPs only; please consult a network professional.
      This makes sure MPI can communicate among the instances over whatever ports it needs. The next figures show what they should be.

    • Disable all other network interfaces to keep the routing path as simple as possible. For example, in Ubuntu/Xubuntu one may run

      sudo ifconfig enp0s3 down
      sudo ifconfig docker0 down

      where enp0s3 and docker0 are the network interface names.

    • Disable firewalls if possible. For example, in Ubuntu/Xubuntu one may run
      SHELL> sudo ufw disable
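The security group rules above can also be applied from the command line. This is a sketch assuming the AWS CLI is configured and that SG_ID holds the ID of the default security group (the ID below is hypothetical; the default group already allows all outbound traffic). For a safer setup, replace 0.0.0.0/0 with your VPC's private range (e.g. 172.31.0.0/16).

```shell
# Hypothetical security group ID; replace with the one from your EC2 console.
SG_ID=sg-0123456789abcdef0
# Open all inbound traffic (protocol -1 = all) from anywhere, mirroring
# the console settings described above. Restrict the CIDR if you can.
aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol -1 --cidr 0.0.0.0/0
```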
  3. All instances should have the files

    • ~/.ssh/authorized_keys
    • ~/.ssh/id_rsa
    • ~/.ssh/known_hosts

      These allow both instances to ssh into each other without being prompted for a user name or password. Note that MPI runs in batch mode, which does not allow any interruption. The next figure shows what it should be.

      Note that known_hosts can be generated via
      SHELL> ssh ip-172-31-42-235
      SHELL> ssh ip-172-31-40-207
      Add more instances if you have more. Note that all instances should have access to all other instances. You may need some shell scripts to help with this setup, or you may change /etc/ssh/ssh_config (e.g. set StrictHostKeyChecking no) to skip host-key checking and this step.
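As an alternative to logging in to each host once by hand, the host keys can be collected non-interactively with ssh-keyscan. This is a sketch using the two example private DNS names from step 1; replace them with your own, and run it on every instance.

```shell
#!/bin/sh
# Pre-populate ~/.ssh/known_hosts so that ssh to every other instance
# is non-interactive. Host names are the example instances from step 1.
HOSTS="ip-172-31-42-235 ip-172-31-40-207"
for h in $HOSTS; do
  # -H hashes the host names, matching OpenSSH's HashKnownHosts default
  ssh-keyscan -H "$h" >> ~/.ssh/known_hosts
done
```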

  4. (Optional) All instances should have the file ~/work-my/00_set_devel_R containing

    export MAKE="/usr/bin/make -j 4"
    export R_DEVEL=/home/ubuntu/work-my/local/R-devel
    export OMPI=/home/ubuntu/work-my/local/ompi
    
    export PATH=$R_DEVEL/bin:$OMPI/bin:$PATH
    export LD_LIBRARY_PATH=$R_DEVEL/lib/R/lib:$OMPI/lib:$LD_LIBRARY_PATH
    
    alias mpiexec=/home/ubuntu/work-my/local/ompi/bin/mpiexec
    alias mpirun=/home/ubuntu/work-my/local/ompi/bin/mpirun
    alias Rscript=/home/ubuntu/work-my/local/R-devel/bin/Rscript

    You may adjust the paths according to your system. This sets the executable and library paths for R and OpenMPI. The aliases avoid typing the full paths to mpiexec, mpirun, and Rscript.
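After sourcing the file, a quick sanity check confirms that the customized binaries are picked up first. This is a sketch assuming the layout under /home/ubuntu/work-my/local described above.

```shell
# Source the environment file and verify which mpiexec and Rscript resolve;
# they should point under ~/work-my/local if the paths above are in effect.
. ~/work-my/00_set_devel_R
command -v mpiexec
command -v Rscript
echo "$LD_LIBRARY_PATH"
```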

  5. (Optional) All instances should have the following line in the file ~/.bashrc

    . /home/ubuntu/work-my/00_set_devel_R

    This is to make sure the working environment is consistent on every instance after ssh login.

  6. All instances should install OpenMPI, R, pbdMPI, and their dependencies as follows.

    SHELL> sudo apt-get update
    SHELL> sudo apt-get install libopenmpi-dev openmpi-bin
    SHELL> sudo apt-get install r-base
    SHELL> sudo R CMD INSTALL rlecuyer_0.3-3.tar.gz
    SHELL> sudo R CMD INSTALL pbdMPI_0.2-5.tar.gz
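If the instances have internet access, installing from CRAN is an alternative to the local tarballs above. This is a sketch; CRAN may carry newer versions than the 0.3-3 / 0.2-5 tarballs mentioned.

```shell
# Install rlecuyer and pbdMPI (and their R dependencies) from CRAN
# instead of local source tarballs.
sudo Rscript -e 'install.packages(c("rlecuyer", "pbdMPI"), repos = "https://cloud.r-project.org")'
```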
  7. The login instance, say ip-172-31-42-235, should have the following lines in the file ~/hostfile to let MPI know which machines are available to launch applications in SPMD mode.

    ip-172-31-42-235
    ip-172-31-40-207

    Add more instances if you have more and see OpenMPI website for more examples of this file.
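OpenMPI's hostfile format also accepts an explicit slots count per host. A sketch generating the file above, with one slot per instance since a t2.micro has a single vCPU:

```shell
# Write ~/hostfile with an explicit slots count per host.
# Without slots=, Open MPI infers the slot count on its own.
cat > ~/hostfile <<'EOF'
ip-172-31-42-235 slots=1
ip-172-31-40-207 slots=1
EOF
```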

  8. On the login instance, you may test hostname, R, and pbdMPI with 4 processes as follows.

    SHELL> mpiexec -hostfile ~/hostfile -np 4 hostname
    SHELL> mpiexec -hostfile ~/hostfile -np 4 \
           Rscript -e "Sys.info()['nodename']" 
    SHELL> mpiexec -hostfile ~/hostfile -np 4 \
           Rscript -e "library(pbdMPI,quietly=T);init();comm.rank();finalize()"

    Note: Full paths to the mpiexec and Rscript may be needed.

    If all setups are correct, the outputs should be as follows.
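The same pbdMPI check can also be kept as a small script file, which is easier to extend than the -e one-liner. This is a sketch; the file name is arbitrary.

```shell
# Save a minimal pbdMPI test script on the login instance, then launch it
# across the hostfile. comm.print with all.rank = TRUE prints every rank.
cat > ~/test_pbdMPI.r <<'EOF'
library(pbdMPI, quietly = TRUE)
init()
comm.print(comm.rank(), all.rank = TRUE)
finalize()
EOF
mpiexec -hostfile ~/hostfile -np 4 Rscript ~/test_pbdMPI.r
```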