MPI jobs on the INFN GRID - Introduction

This page provides the basic knowledge needed to install and use MPI on the INFN Grid according to gLite update 14. This update complies with the guidelines provided by the EGEE MPI Working Group.

Examples are based on openmpi in a Torque/Maui environment.


The most important changes are:

  • Introduction of mpi-start with:
    • Support for and installation of multiple flavours/versions of MPI: mpich (supported by i2g-mpi-start-0.0.58 or higher), openmpi, lam
    • Introduction of pre- and post-execution scripts (for example, to enable remote compilation)
    • Automatic home management: if the home directory is not shared, data and programs are automatically replicated
    • Automatic machine-file management
  • Introduction of a “dummy” mpirun (no longer needed starting with the GliteWMS 3.1.12 update) replacing other implementations of mpirun. This allows specifying a wrapper script as the executable in the JDL file, in place of the standard MPI executable.

Installation and configuration

This section describes the installation and configuration steps. The following operations must be performed on all Worker Nodes.

We assume that the gLite middleware, version 3.1 update 14/15/16 or higher, is installed and configured properly.

Installing glite-MPI_utils

If it is not already installed, install the glite-MPI_utils package:

yum install glite-MPI_utils
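
To double-check what was installed (a sketch; the exact package set depends on the repository configuration):

 # Verify the metapackage and the mpi-start package it should pull in
 rpm -q glite-MPI_utils
 rpm -qa | grep -i mpi-start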

Installing MPI

MPI packages available:

  • mpich (currently version 1.2.7) is included in the glite-UI_sl4_externals repository, with binaries installed in /opt/mpich-1.2.7p1/bin/

Alternatively, mpich2 can be installed from the EPEL Yum repository; in this case the binaries are installed in /usr/bin/:

 yum install yum-conf-epel
 yum install mpich2 mpich2-devel --enablerepo=epel
  • openmpi (currently version 1.4). In the pre-compiled version of this package the binaries are installed in /usr/bin/. If you want to change the installation path, you must recompile the package using a different installation prefix.

The following command is an example of how to recompile the source rpm downloaded from openmpi:

rpmbuild -bb --define 'install_in_opt 1'  --define 'mflags -j2' --define 'build_all_in_one_rpm 1' --define 'enable-mpi-threads 1'  --define 'cflags -g'  --define 'use_mpi_selector 1'  --define 'shell_scripts_basename mpivars' --define 'install_shell_scripts 1' /usr/src/redhat/SPECS/openmpi-1.4.1.spec

Useful options:

 --define 'install_in_opt 1'   -> install in /opt
 --define 'mflags -jN'   -> speed up the compilation process (N is the number of cores)
 --define 'build_all_in_one_rpm 1'   -> build a single package
 --define 'enable-mpi-threads 1'   -> enable thread support
 --define 'use_mpi_selector 1' --define 'shell_scripts_basename mpivars' --define 'install_shell_scripts 1'   -> mpi-selector support
 --define 'configure_options --with-openib=/usr --with-openib-libdir=/usr/lib64'   -> InfiniBand support
 --define 'configure_options F77=pgf77 FC=pgf90 CXX=pgCC FFLAGS=-fastsse FCFLAGS=-fastsse'   -> Portland compilers
 --define 'cflags -g'   -> include debugging symbols

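If the package was built with mpi-selector support, the default MPI environment on the nodes can be chosen with mpi-selector (a sketch; the name passed to --set is an example and must match one of the entries reported by --list):

 # List the MPI installations registered with mpi-selector
 mpi-selector --list

 # Set the system-wide default (example name; use one reported by --list)
 mpi-selector --set openmpi-1.4 --system
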
The torque-devel package must be installed in order to enable openmpi/torque integration (machine-file and processor-number autodiscovery). This package is not included in the official repositories.
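
A quick way to check that the rebuilt openmpi actually picked up the Torque integration is to look for the tm components (assuming ompi_info from the installed package is in the PATH):

 # The output should list the Torque/PBS (tm) components, e.g. "MCA plm: tm"
 ompi_info | grep tm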

Recompile with Portland Libraries

Yaim configuration

Specific directives should be customized in my-site-info.def (or in the MPI-related files glite-mpi, glite-mpi_ce, glite-mpi_wn), as shown in the example files in /opt/glite/yaim/examples/siteinfo/services/.

For example, to configure openmpi 1.4 and mpich2 1.1.1 with a shared home, the configuration settings are:

glite-mpi

MPI_MPICH_ENABLE="no"
MPI_MPICH2_ENABLE="yes"
MPI_OPENMPI_ENABLE="yes"
MPI_LAM_ENABLE="no"

MPI_OPENMPI_PATH="/opt/openmpi/1.4/" 
MPI_OPENMPI_VERSION="1.4"
MPI_MPICH2_PATH="/usr/"
MPI_MPICH2_VERSION="1.1.1"

MPI_SHARED_HOME="yes"
MPI_SSH_HOST_BASED_AUTH="yes"

MPI_OPENMPI_MPIEXEC="/opt/openmpi/1.4/bin/mpiexec"
MPI_MPICH2_MPIEXEC="/usr/bin/mpiexec"

Invoke Yaim

/opt/glite/yaim/bin/ig_yaim -c -s my-site-info.def -n ig_WN_torque_noafs 

Among other things, this command generates a dummy mpirun script (it only executes its argument, without distributing it to the nodes). The script /opt/glite/bin/mpirun is dynamically generated by the YAIM function config_mpi_wn.
It is therefore essential that /opt/glite/bin/ precedes any other MPI path in the PATH environment variable: when an MPI JobType is executed, the middleware runs the mpirun command (the first one found in the PATH), whereas to use mpi-start we need mpirun to be a wrapper script.
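
A quick sanity check on a WN (a sketch, based on the PATH requirement above):

 # The first mpirun found in the PATH should be the dummy one generated by YAIM
 which mpirun        # expected: /opt/glite/bin/mpirun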

Note that, starting with the GliteWMS 3.1.12 update, the dummy mpirun is no longer needed and the script /opt/glite/bin/mpirun should be deleted from the WNs.

Running Jobs

Wrapper Script for mpi-start

To obtain a more flexible environment, the execution of MPI jobs relies on a wrapper script.

mpi-start-wrapper

#!/bin/bash
#
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2

# Convert flavor to lowercase in order to pass it to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Ensure the prefix is correctly set.  Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable.  It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# (while it shouldn't do that).
touch $MY_EXECUTABLE

# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

The script sets the environment variables for the MPI flavour required by mpi-start and defines the selected hook scripts. Finally, mpi-start is executed.

The script can be modified to meet specific user requirements. For example, to use it on a cluster without mpi-start, the following lines should be added at the beginning of the script:

if [ "x$I2G_MPI_START" = "x" ]; then
    # untar mpi-start and set up variables
    tar xzf mpi-start-*.tar.gz
    export I2G_MPI_START=bin/mpi-start
    MPIRUN=`which mpirun`
    export MPI_MPICH_PATH=`dirname $MPIRUN`
fi

mpi-start hooks

This is a hook example: the script performs pre- and post-run operations, for example compiling the program or copying the final data to a Storage Element. It must be a separate file from the previous wrapper script.

mpi-hooks

#!/bin/sh

# This function will be called before the execution of the MPI executable.
# You can, for example, compile the executable itself.
#
pre_run_hook () {

  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"

  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program.  Exiting..."
    exit 1
  fi

  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"

  return 0
}

# This function will be called after the execution of the MPI executable.
# A typical case for this is to upload the results to a storage element.
post_run_hook () {
  echo "Executing post hook."
  echo "Finished the post hook."

  return 0
}

The functions pre_run_hook and post_run_hook must both be defined (possibly in separate files) at the time mpi-start is executed.

The same procedure also works with mpich.

Create the jdl file

Defining an MPI job is not significantly different from defining a standard job.

mpi-start-wrapper.jdl

JobType = "Normal";
#before GliteWMS update: JobType = "MPICH";
CPUnumber = 8 ;
#before GliteWMS update:  NodeNumber = 8;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test OPENMPI";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh","mpi-hooks.sh","mpi-test.c"};
OutputSandbox = {"mpi-test.err","mpi-test.out"};
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  ;

  • JobType must be "MPICH" even if the chosen flavour is different; with the new GliteWMS, "Normal" should be used instead.
  • CPUNumber must be defined and represents the number of requested CPUs (with the old GliteWMS the attribute was NodeNumber).
  • Arguments must contain the name of the program and the selected MPI flavour.
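
The job can then be submitted and retrieved with the usual gLite WMS commands (a sketch; a valid VOMS proxy and a configured WMS UI are assumed to be in place):

 # Submit the job (automatic proxy delegation, job id saved to a file)
 glite-wms-job-submit -a -o jobid mpi-start-wrapper.jdl

 # Check the status and, when the job is done, retrieve the output sandbox
 glite-wms-job-status -i jobid
 glite-wms-job-output -i jobid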


Local execution

mpi-start can also easily be used for local execution, e.g. by submitting directly to Torque:

mpi-start-wrapper-torque

#!/bin/bash
#
# Pull in the arguments.
WORK_DIR=$WD
MY_EXECUTABLE=$WORK_DIR/$EXE
MPI_FLAVOR=$FLAVOR


# Convert flavor to lowercase in order to pass it to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Ensure the prefix is correctly set.  Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable.  It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# (while it shouldn't do that).
touch $MY_EXECUTABLE

# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=$WORK_DIR/mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=$WORK_DIR/mpi-hooks.sh

# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

The execution command is:

qsub -l nodes=4 -q albert -v EXE=cpi_mpi,FLAVOR=OPENMPI,WD=$PWD mpi-start-wrapper-torque.sh

Since qsub does not allow passing arguments to the wrapper script, we have to pass them as environment variables using the -v option. Moreover, the current working directory must be passed explicitly, because the files are not in the Globus home directory as they are when the job is submitted remotely through gLite. If the PWD variable is not set, the output of

`pwd`

can be used, or the path can be written by hand.
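
After submission, the job can be monitored and its output inspected with the standard Torque tools (a sketch; file names follow the default Torque <script>.o<jobid> convention):

 # Monitor the job in the queue
 qstat -u $USER

 # Once it has finished, look at the wrapper/mpi-start output
 cat mpi-start-wrapper-torque.sh.o*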

The same procedure also works with mpich.

Problems with MPICH

mpi-start

The version of mpi-start deployed by gLite 3.1 is i2g-mpi-start-0.0.52-1. This package has an issue with openmpi: there is a syntax error in /opt/i2g/etc/mpi-start/openmpi.mpi (MPI_SPECIFIC_PARAMS+=”..”). The bug can easily be fixed by installing a newer version of mpi-start; at the moment the latest release is 0.0.58:

rpm -Uvh http://grid-it.cnaf.infn.it/mrepo/ig_sl4-x86_64/RPMS.all/i2g-mpi-start-0.0.58-1.noarch.rpm
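
To confirm the installed version on a WN:

 rpm -q i2g-mpi-start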


Unfortunately, both mpi-start-0.0.58 and mpi-start-0.0.52 have problems when LSF is used in combination with mpiexec: file distribution and job execution do not work if mpiexec is installed under LSF, because mpiexec is called even when there is no PBS environment.

Patch for mpi-start-0.0.58

We have developed a patch that fixes the bug reported above and some smaller problems. To apply the patch:

cd $(dirname $I2G_MPI_START)/.. && wget http://www.fis.unipr.it/grid/wiki_files/mpi-start-0.0.58-fix.patch && patch -p0 < mpi-start-0.0.58-fix.patch

This patch has been approved by the mpi-start developers and will be included in mpi-start-0.0.60.




Roberto Alfieri - Enrico Tagliavini — 2009/05/22
