This page provides the basic knowledge needed to install and use MPI on the INFNGrid with gLite update 14. This update complies with the guidelines provided by the EGEE MPI Working Group.
The examples are based on openmpi in a Torque/Maui environment.
The most important news is:
This section shows the installation and configuration steps. The following operations must be executed on all Worker Nodes.
We assume that the middleware gLite version 3.1 update 14/15/16 or higher is installed and configured properly.
If it is not already installed, install the glite-MPI_utils package:
yum install glite-MPI_utils
MPI packages available:
To install from the Yum repository (the binaries are installed in /usr/bin/):
yum install yum-conf-epel
yum install mpich2 mpich2-devel --enablerepo=epel
The following command is an example of how to recompile the source rpm downloaded from openmpi:
rpmbuild -bb --define 'install_in_opt 1' --define 'mflags -j2' --define 'build_all_in_one_rpm 1' --define 'enable-mpi-threads 1' --define 'cflags -g' --define 'use_mpi_selector 1' --define 'shell_scripts_basename mpivars' --define 'install_shell_scripts 1' /usr/src/redhat/SPECS/openmpi-1.4.1.spec
--define 'install_in_opt 1' -> install in /opt
--define 'mflags -jN' -> speedup the compilation process - N is the number of cores
--define 'build_all_in_one_rpm 1' -> build a single package
--define 'enable-mpi-threads 1' -> enable thread support
--define 'use_mpi_selector 1' --define 'shell_scripts_basename mpivars' --define 'install_shell_scripts 1' -> mpi-selector support
--define 'configure_options --with-openib=/usr --with-openib-libdir=/usr/lib64' -> infiniband support
--define 'configure_options F77=pgf77 FC=pgf90 CXX=pgCC F77=pgf77 FFLAGS=-fastsse FCFLAGS=-fastsse' -> Portland Compilers
--define 'cflags -g'
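As a small illustration of the -jN option, the sketch below prints (without running it) an rpmbuild command in which N is taken from the number of online cores. The function names detect_cores and print_rpmbuild_cmd are our own, and only a subset of the --define options listed above is included; add the others as needed.

```shell
#!/bin/sh
# Sketch: print the rpmbuild command with -jN set to the number of
# online cores. Nothing is built; the command is only echoed.
# detect_cores and print_rpmbuild_cmd are illustrative names.

# Number of online cores; falls back to 1 if getconf cannot report it.
detect_cores() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# $1 = spec file, $2 = optional core count (defaults to detect_cores).
print_rpmbuild_cmd() {
    spec=$1
    cores=${2:-$(detect_cores)}
    echo "rpmbuild -bb" \
         "--define 'install_in_opt 1'" \
         "--define 'mflags -j${cores}'" \
         "--define 'build_all_in_one_rpm 1'" \
         "$spec"
}
```

For example, print_rpmbuild_cmd /usr/src/redhat/SPECS/openmpi-1.4.1.spec shows the command that would be run on the current machine.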
The torque-devel package must be installed in order to enable openmpi/torque integration (machinefile and processor number autodiscovery). This package is not included in the official repository.
Special directives should be customized in my-site-info.def (or in mpi-related files glite-mpi, glite-mpi_ce, glite-mpi_wn) as reported in the example files in /opt/glite/yaim/examples/siteinfo/services/
For example, to install openmpi 1.4 and mpich2 1.1.1 with shared-home, the configuration settings are:
MPI_MPICH_ENABLE="no"
MPI_MPICH2_ENABLE="yes"
MPI_OPENMPI_ENABLE="yes"
MPI_LAM_ENABLE="no"
MPI_OPENMPI_PATH="/opt/openmpi/1.4/"
MPI_OPENMPI_VERSION="1.4"
MPI_MPICH2_PATH="/usr/"
MPI_MPICH2_VERSION="1.1.1"
MPI_SHARED_HOME="yes"
MPI_SSH_HOST_BASED_AUTH="yes"
MPI_OPENMPI_MPIEXEC="/opt/openmpi/1.4/bin/mpiexec"
MPI_MPICH2_MPIEXEC="/usr/bin/mpiexec"
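Before running YAIM it can be useful to check that every enabled flavour also has its path and version set. The following is only a sketch of such a check: check_mpi_siteinfo is a hypothetical helper of ours, not part of YAIM, and it assumes the settings file is plain shell that can be sourced.

```shell
#!/bin/sh
# check_mpi_siteinfo FILE   (hypothetical helper, not part of YAIM)
# Sources FILE in a subshell and, for each flavour whose
# MPI_<FLAVOUR>_ENABLE is "yes", verifies that MPI_<FLAVOUR>_PATH and
# MPI_<FLAVOUR>_VERSION are non-empty. Returns 0 if all checks pass.
check_mpi_siteinfo() {
    (
        . "$1" || exit 2
        status=0
        for flavour in MPICH MPICH2 OPENMPI LAM; do
            eval "enabled=\${MPI_${flavour}_ENABLE:-no}"
            [ "$enabled" = "yes" ] || continue
            for suffix in PATH VERSION; do
                eval "value=\${MPI_${flavour}_${suffix}:-}"
                if [ -z "$value" ]; then
                    echo "MPI_${flavour}_${suffix} is not set"
                    status=1
                fi
            done
        done
        exit $status
    )
}
```

Call it with a path containing a slash, for example check_mpi_siteinfo ./my-site-info.def, so that the file is read from the current directory rather than searched for in PATH.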
/opt/glite/yaim/bin/ig_yaim -c -s my-site-info.def -n ig_WN_torque_noafs
Among other things, this command generates a dummy mpirun script (it only executes its argument without distributing it to the nodes). The script /opt/glite/bin/mpirun is dynamically generated by the YAIM function config_mpi_wn.
It is therefore essential that /opt/glite/bin/ precedes any other MPI path in the PATH environment variable. When an MPI JobType is executed, the middleware runs the mpirun command (the first one found in PATH), whereas mpi-start needs mpirun to be called from within a wrapper script.
The script /opt/glite/bin/mpirun should be deleted from the WNs.
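To see which mpirun a Worker Node would actually pick up, one can walk PATH in order. The helper below is a minimal sketch; first_in_path is our own name and it does nothing beyond what "which" does, except that the path list can be passed explicitly for testing.

```shell
#!/bin/sh
# first_in_path NAME [PATHLIST]   (illustrative helper)
# Prints the first directory of PATHLIST (default: $PATH) that contains
# an executable file called NAME; returns 1 if none does.
first_in_path() {
    name=$1
    pathlist=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $pathlist; do
        IFS=$old_ifs
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Running first_in_path mpirun on a Worker Node shows whether /opt/glite/bin or another MPI installation comes first.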
To have a more flexible environment, execution of MPI Jobs requires a wrapper script.
#!/bin/bash
#
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2
# Convert flavor to lowercase in order to pass it to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX
# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# (while it shouldn't do that).
touch $MY_EXECUTABLE
# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
# Invoke mpi-start.
$I2G_MPI_START
The script defines the environment variables that mpi-start requires for the selected flavour, sets the hook scripts, and finally executes mpi-start.
One can modify this script in order to meet specific user requirements. For example, to use this script on a cluster without mpi-start the following lines should be added at the beginning of the script:
if [ "x$I2G_MPI_START" = "x" ]; then
# untar mpi-start and set up variables
tar xzf mpi-start-*.tar.gz
export I2G_MPI_START=bin/mpi-start
MPIRUN=`which mpirun`
export MPI_MPICH_PATH=`dirname $MPIRUN`
fi
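The eval/printenv indirection used by the wrapper can be hard to read, so here is a stand-alone sketch of what it produces for the OPENMPI flavour. The value assigned to MPI_OPENMPI_PATH is an example only; on a real Worker Node it comes from the site configuration.

```shell
#!/bin/sh
# Example value only: on a real Worker Node MPI_OPENMPI_PATH is set by
# the site configuration.
MPI_OPENMPI_PATH=/opt/openmpi/1.4
export MPI_OPENMPI_PATH

MPI_FLAVOR=OPENMPI
MPI_FLAVOR_LOWER=$(echo "$MPI_FLAVOR" | tr '[:upper:]' '[:lower:]')

# Same indirection as the wrapper: read MPI_<FLAVOR>_PATH ...
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# ... and copy it into I2G_<FLAVOR>_PREFIX.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

echo "I2G_MPI_TYPE would be: $MPI_FLAVOR_LOWER"
echo "I2G_OPENMPI_PREFIX is: $I2G_OPENMPI_PREFIX"
```

If the second line prints an empty value on a Worker Node, the corresponding MPI_<FLAVOR>_PATH variable is not exported in the job environment.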
The following is an example hook script. It performs pre-run and post-run operations: for example, it can compile the program or copy the final data to a storage element. It must be a separate file from the wrapper script above.
#!/bin/sh
# This function will be called before the execution of MPI executable.
# You can, for example, compile the executable itself.
#
pre_run_hook () {
# Compile the program.
echo "Compiling ${I2G_MPI_APPLICATION}"
# Actually compile the program.
cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
echo $cmd
$cmd
if [ ! $? -eq 0 ]; then
echo "Error compiling program. Exiting..."
exit 1
fi
# Everything's OK.
echo "Successfully compiled ${I2G_MPI_APPLICATION}"
return 0
}
# This function will be called after the execution of MPI executable.
# A typical case for this is to upload the results to a storage element.
post_run_hook () {
echo "Executing post hook."
echo "Finished the post hook."
return 0
}
The functions pre_run_hook and post_run_hook must be defined by the time mpi-start is executed; they may live in different files.
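As an illustration of a post-run hook kept in its own file, the sketch below collects result files into a single archive, which could then be uploaded to a storage element. RESULT_PATTERN and RESULT_ARCHIVE are names we made up for this example, not mpi-start variables, and the upload step itself is left out.

```shell
#!/bin/sh
# Hypothetical post-run hook kept in its own file (e.g. post-hook.sh).
# RESULT_PATTERN and RESULT_ARCHIVE are illustrative names, not
# mpi-start variables.
post_run_hook () {
    pattern=${RESULT_PATTERN:-*.dat}
    archive=${RESULT_ARCHIVE:-results.tar.gz}
    echo "Executing post hook."
    # Expand the pattern; if nothing matches, the pattern stays literal.
    set -- $pattern
    if [ ! -e "$1" ]; then
        echo "No files matching $pattern, nothing to archive."
        return 0
    fi
    tar czf "$archive" "$@" || return 1
    echo "Archived $# file(s) into $archive"
    # An upload to a storage element would go here.
    return 0
}
```

To use it, point I2G_MPI_POST_RUN_HOOK at this file and add both the file and the archive name to the sandboxes of the job.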
The same procedure also works with mpich.
Defining the job is not significantly different from a standard definition.
mpi-start-wrapper.jdl
JobType = "Normal";
#before GliteWMS update: JobType = "MPICH";
CPUNumber = 8;
#before GliteWMS update: NodeNumber = 8;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test OPENMPI";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh","mpi-hooks.sh","mpi-test.c"};
OutputSandbox = {"mpi-test.err","mpi-test.out"};
Requirements =
Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
&& Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
;
JobType must be MPICH even if the chosen flavour is different; with the new GliteWMS, Normal should be used instead.
CPUNumber must be defined; it is the number of requested CPUs.
Arguments must contain the name of the program and the selected MPI flavour.
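Since only the program name, the flavour and the number of CPUs change from job to job, the JDL can be generated. The function below is a sketch under that assumption: make_mpi_jdl is a hypothetical helper, and it assumes the source file is called PROGRAM.c and that the wrapper and hook files keep the names used above.

```shell
#!/bin/sh
# make_mpi_jdl PROGRAM FLAVOUR NCPU   (hypothetical helper)
# Prints a JDL like the one above for PROGRAM (source file PROGRAM.c).
make_mpi_jdl() {
    prog=$1
    flavour=$2
    ncpu=$3
    cat <<EOF
JobType = "Normal";
CPUNumber = ${ncpu};
Executable = "mpi-start-wrapper.sh";
Arguments = "${prog} ${flavour}";
StdOutput = "${prog}.out";
StdError = "${prog}.err";
InputSandbox = {"mpi-start-wrapper.sh","mpi-hooks.sh","${prog}.c"};
OutputSandbox = {"${prog}.err","${prog}.out"};
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("${flavour}", other.GlueHostApplicationSoftwareRunTimeEnvironment);
EOF
}
```

For example, make_mpi_jdl mpi-test OPENMPI 8 > mpi-start-wrapper.jdl reproduces the job definition shown above.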
Mpi-start can also easily be used for local execution:
#!/bin/bash
#
# Pull in the arguments.
WORK_DIR=$WD
MY_EXECUTABLE=$WORK_DIR/$EXE
MPI_FLAVOR=$FLAVOR
# Convert flavor to lowercase in order to pass it to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX
# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# (while it shouldn't do that).
touch $MY_EXECUTABLE
# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=$WORK_DIR/mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=$WORK_DIR/mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
# Invoke mpi-start.
$I2G_MPI_START
The execution command is:
qsub -l nodes=4 -q albert -v EXE=cpi_mpi,FLAVOR=OPENMPI,WD=$PWD mpi-start-wrapper-torque.sh
Since arguments cannot be passed to the wrapper script, the environment variables have to be exported using the -v option. The current working directory must also be exported, because the files are not in the globus home directory as they are with remote access through gLite. If the PWD variable is not set, `pwd` can be used instead, or the path can be written manually.
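The fallback from PWD to `pwd` can be written once in a small helper. This is only a sketch: print_qsub_cmd is our own name, and it echoes the qsub line instead of submitting, so it can be tried on a machine without Torque.

```shell
#!/bin/sh
# print_qsub_cmd EXE FLAVOUR NODES QUEUE   (illustrative helper)
# Echoes the qsub line; WD falls back to `pwd` when PWD is unset or empty.
print_qsub_cmd() {
    wd=${PWD:-$(pwd)}
    echo "qsub -l nodes=$3 -q $4 -v EXE=$1,FLAVOR=$2,WD=$wd mpi-start-wrapper-torque.sh"
}
```

For example, print_qsub_cmd cpi_mpi OPENMPI 4 albert prints the command shown above with WD set to the current directory.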
The same procedure also works with mpich.
The version of mpi-start deployed by gLite 3.1 is i2g-mpi-start-0.0.52-1. This package has an issue concerning openmpi: there is a syntax error in /opt/i2g/etc/mpi-start/openmpi.mpi (MPI_SPECIFIC_PARAMS+=".."). This bug can easily be fixed by upgrading to a newer version of mpi-start. At the moment the latest release is 0.0.58:
rpm -Uvh http://grid-it.cnaf.infn.it/mrepo/ig_sl4-x86_64/RPMS.all/i2g-mpi-start-0.0.58-1.noarch.rpm
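To check whether an installed copy of openmpi.mpi still contains the construct quoted above, a plain text search is enough. The helper name uses_append_assignment is ours; the sketch only looks for the literal string and makes no claim about how a fixed version is written.

```shell
#!/bin/sh
# uses_append_assignment FILE   (illustrative helper)
# Returns 0 and prints the matching lines (with line numbers) if FILE
# contains the literal text MPI_SPECIFIC_PARAMS+=
uses_append_assignment() {
    grep -n -F 'MPI_SPECIFIC_PARAMS+=' "$1"
}
```

For example, uses_append_assignment /opt/i2g/etc/mpi-start/openmpi.mpi lists the affected lines, or prints nothing and returns a non-zero status if there are none.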
Unfortunately both mpi-start-0.0.58 and mpi-start-0.0.52 have problems when using LSF in combination with mpiexec. File distribution and job execution don't work if mpiexec is installed under LSF, because mpiexec is called even if there is no PBS environment.
We have developed a patch to fix the bug reported above and some smaller problems. To apply this patch:
cd $(dirname $I2G_MPI_START)/.. && wget http://www.fis.unipr.it/grid/wiki_files/mpi-start-0.0.58-fix.patch && patch -p0 < mpi-start-0.0.58-fix.patch
This patch has been approved by mpi-start developers and will be included in mpi-start-0.0.60.
Roberto Alfieri - Enrico Tagliavini — 2009/05/22