
"TheoMpi on the Grid" Project

The project aims at setting up an operational Grid infrastructure for parallel jobs for the theoretical physics community of INFN (CSN4): https://web.infn.it/CSN4/

The researchers are registered in the Theophys Virtual Organization.

The research projects are called Iniziative Specifiche (Specific Initiatives).

A common configuration environment will be defined and installed on the parallel resources supporting Theophys.

The purpose of this document is to collect open issues and possible solutions.

WorkerNodes

Software environment
  • SL5.x x86_64
  • openMPI >= 1.3 (1.4 would be better)
  • MPICH2
  • GNU C, C++, f77, f95 ??
  • Support for commercial compilers?
  • Scientific libraries:
    • OpenMP: multithreading library
    • HDF5: data storage and management library
    • BLAS: Basic Linear Algebra Subprograms
    • LAPACK: Linear Algebra PACKage
    • GSL: GNU Scientific Library
    • GMP: GNU Multiple Precision library
    • GLPK: GNU Linear Programming Kit
    • FFTW3: Fast Fourier Transform library
Installation example
 yum install -y yum-conf-epel.noarch
 yum install -y hdf5-devel glpk fftw3
 yum install -y libgomp blas-devel gsl-devel gmp-devel
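
As a quick check that the scientific libraries are usable on the WNs, a minimal test program can be compiled against one of them. The sketch below (file name and test value are arbitrary) uses GSL and can be compiled e.g. with gcc check_gsl.c -o check_gsl -lgsl -lgslcblas -lm:

 #include <stdio.h>
 #include <gsl/gsl_sf_bessel.h>

 /* evaluate a Bessel function: if this compiles, links and runs, gsl-devel is usable */
 int main(void) {
     double x = 5.0;
     printf("J0(%g) = %.6f\n", x, gsl_sf_bessel_J0(x));
     return 0;
 }
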
theophys TAG

The following is a possible TAG to be published by Theophys-compliant sites:

GlueHostApplicationSoftwareRunTimeEnvironment: VO-theophys-gcc41 VO-theompi-gcc41 ??

Cluster

Published TAGs for MPI

Mpi-start is the way to start MPI jobs:

 MPI-START

At least openMPI should be installed:

 MPI_OPENMPI
 MPI_OPENMPI_VERSION="x.y.z"

A shared home directory is recommended, but file distribution is also supported by MPI-start:

 MPI_SHARED_HOME | MPI_NO_SHARED_HOME  

Remote start-up of MPI jobs can be achieved via password-less SSH:

 MPI_SSH_HOST_BASED_AUTH

Infiniband is recommended, but Gbit (or 10Gb) Ethernet can be used:

  MPI-Infiniband | MPI-Ethernet 
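
Sites publishing these tags can then be selected in the JDL through the Requirements expression. A minimal sketch, using the tags listed above:

 Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
             && Member("MPI_OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
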
Open Issues
  • Is it possible to publish the actual number of free CPUs per queue?
  • How is CpuNumber used in the match-making process?
At the moment CpuNumber is not used at all in the match-making.
Temporary solution in the JDL:

 CpuNumber = n;
 Requirements = other.GlueCEInfoTotalCPUs >= CpuNumber;

JDL

Typical parallel JDL

 JobType = "Normal";
 CpuNumber = 8;
 // multithread support
 SMPGranularity = 8;
 WholeNodes = True;

Multithread support is desirable and should be integrated into the middleware as soon as possible.

Open Issues
  • Is it possible to integrate Granularity/WholeNodes directly in InfnGrid?
    • CREAM and BLAH: see https://twiki.cern.ch/twiki/bin/view/EGEE/ParameterPassing ??
    • WMS: included in WMS 3.3

Parallel and sequential jobs

VOMS Roles can be used to restrict access to the parallel queues.

 VOMS Role = "parallel"

The Role is assigned by the VO manager and granted by VOMS only on explicit request. Users should be informed about the usage scope of the Role.

Setup example
site-info.def:
PARALLEL_GROUP_ENABLE="/infngrid/ROLE=parallel"

/opt/glite/yaim/defaults/ig-site.pre:
FQANVOVIEWS=yes

groups.conf:
"/infngrid/ROLE=parallel":::: 

voms-proxy-init -voms infngrid:/infngrid/Role=parallel 
voms-proxy-info -all
>....
>attribute : /infngrid/Role=parallel/Capability=NULL
>attribute : /infngrid/Role=NULL/Capability=NULL 
>...

MPI multi-thread jobs

MPI and multi-threaded programs can be combined to exploit the upcoming multicore architectures. Hybrid multithread/MPI programming leads to a request of N CPUs with a smaller number of MPI processes (N/thread_num). At present this programming model is not supported in EGEE. Possible solution: change the value type of WholeNodes from boolean to integer. Example:

SMPGranularity = 8;
WholeNodes = 4;

This syntax would lead to

qsub -l nodes=4:ppn=GlueHostArchitectureSMPSize

where ppn is a number >= 8. The WholeNodes value should be passed to mpi-start as the number of MPI processes, and mpi-start should be modified accordingly. Since these new attributes rely on the GlueHostArchitectureSMPSize attribute, subclusters for parallel jobs MUST be homogeneous, with SMPSize published correctly.
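
For illustration, a minimal hybrid MPI/OpenMP skeleton matching the request above (4 MPI processes, each one filling its node with OpenMP threads); the file name and compile line are indicative, e.g. mpicc -fopenmp hybrid.c -o hybrid:

 #include <mpi.h>
 #include <omp.h>
 #include <stdio.h>

 int main(int argc, char **argv) {
     int provided, rank;
     /* MPI_THREAD_FUNNELED: only the master thread makes MPI calls */
     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 #pragma omp parallel
     {
         /* one OpenMP thread per core of the node */
         printf("MPI rank %d, thread %d of %d\n",
                rank, omp_get_thread_num(), omp_get_num_threads());
     }
     MPI_Finalize();
     return 0;
 }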

Mixed MPI/multithread programs might require thread-safe MPI implementations. Thread safety can be easily verified:

MPI_Init_thread(&argc, &argv, 3, &prov); 
printf("MPI_Init_thread provided:%d\n", prov);

The third parameter (the number 3) requests full thread safety support (MPI_THREAD_MULTIPLE). If the value returned in prov is 0, thread support is not provided (MPI_THREAD_SINGLE).
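
A self-contained version of the same check (file name arbitrary, compiled with mpicc), requesting MPI_THREAD_MULTIPLE explicitly instead of the literal 3:

 #include <mpi.h>
 #include <stdio.h>

 int main(int argc, char **argv) {
     int provided;
     /* request full thread safety and report the level actually granted */
     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
     printf("requested %d (MPI_THREAD_MULTIPLE), provided %d\n",
            MPI_THREAD_MULTIPLE, provided);
     MPI_Finalize();
     return 0;
 }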

CPU time limit

Most sites have a WallClock time limit similar to the CPU time limit:

 lcg-info --list-ce --attrs MaxWCTime,MaxCPUTime --vo theophys

This setting is fine for sequential jobs, but parallel jobs quickly hit the CPU time limit, since the CPU time of a parallel job is accumulated over all its processes.

Recommended solution: do not set a CPU time limit, or set it very high.

Accounting and Monitoring

Information about MPI jobs, provided by the LRMS logs, should be used to feed a specific accounting and monitoring system for parallel jobs.

Scheduling

Objectives
  • Minimize job starvation
  • Maximize resource exploitation
Possible scenario

MPI sites with at least 2 queues sharing the same pool of WNs:

  • high priority parallel queue
    • accessible only with a special Role (Role=parallel ?)
  • low priority sequential queue
    • preemptable (renice or requeue ?)
    • short WallClockTime (less than 6 hours?)
    • accessible only with a special Role (Role=short ?)
    • core reservation for parallel jobs and backfill of sequential jobs?

CSN4CLUSTER

CREAM CE for parallel jobs: gridce3.pi.infn.it

Subgroups: “/theophys/Cluster_MPI_Pisa”

Shared home directories are not used; instead, a portion of shared disk space is reserved for the execution of the parallel computations.

  • openMPI 1.3.2 and MPICH2 1.1.1

Storage:

lfc-mkdir /grid/theophys/IS/
lfc-mkdir /grid/theophys/IS/CT11
lfc-mkdir /grid/theophys/IS/BO11
# upload the input file to a Theophys Storage Element
lcg-cr -v --vo theophys -l lfn:/grid/theophys/IS/MI11/input.dat file://$(pwd)/input.dat
# upload the input file to a specific SE
lcg-cr -v --vo theophys -d grid-se2.pr.infn.it -l lfn:/grid/theophys/IS/MI11/input.dat file://$(pwd)/input.dat
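# example: retrieve the input file on a worker node (same LFN as above)
lcg-cp -v --vo theophys lfn:/grid/theophys/IS/MI11/input.dat file://$(pwd)/input.dat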

Test granularity

1/10/2010

cd /opt/glite/bin
cp pbs_submit.sh pbs_submit.sh.save
wget http://www.fis.unipr.it/grid/MPI_granularity_scripts/pbs_local_submit_attributes.sh
wget http://www.fis.unipr.it/grid/MPI_granularity_scripts/pbs_submit.sh
chmod 755 *.sh

Revision history

  • 20100225 - R. DePietri, F. DiRenzo - User's required libraries
  • 20100210 - C. Aiftimiei, R.Alfieri, M.Bencivenni, T.Ferrari - First Version