The project aims to set up an operational Grid infrastructure for parallel jobs for the theoretical physics community of INFN (https://web.infn.it/CSN4/).
The researchers are registered in the Theophys Virtual Organization.
The research projects are called Iniziative Specifiche.
A common configuration environment will be defined and installed on the parallel resources supporting Theophys.
The purpose of this document is to collect open issues and possible solutions.
yum install -y yum-conf-epel.noarch
yum install -y hdf5-devel glpk fftw3
yum install -y libgomp blas-devel gsl-devel gmp-devel
The following is a possible TAG to be published by theophys compliant sites:
GlueHostApplicationSoftwareRunTimeEnvironment: VO-theophys-gcc41 VO-theompi-gcc41 ??
Mpi-start is the way to start MPI jobs:
At least openMPI should be installed:
Shared home is recommended, but file distribution is supported by MPI-start:
MPI_SHARED_HOME | MPI_NO_SHARED_HOME
Remote start-up of MPI jobs can be achieved via password-less SSH:
Infiniband is recommended, but Gbit (or 10Gb) Ethernet can be used:
MPI-Infiniband | MPI-Ethernet
At the moment CpuNumber is not used at all for match making. Temporary solution in the JDL: set CPUNumber and add an explicit requirement:
CPUNumber = n;
Requirements = other.GlueCEInfoTotalCPUs >= CPUNumber;
JobType = "Normal";
CpuNumber = 8;
SMPGranularity = 8;
WholeNodes = True;
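Putting the pieces together, a complete JDL for an 8-CPU MPI job might look as follows. This is only a sketch: the wrapper script and executable names in the sandbox are hypothetical, while the software tag in Requirements is the one proposed for theophys-compliant sites above.

```
// hypothetical wrapper and application names; the software TAG follows the
// proposed theophys site publication
JobType        = "Normal";
CpuNumber      = 8;
Executable     = "mpi-start-wrapper.sh";
Arguments      = "my-mpi-app OPENMPI";
StdOutput      = "std.out";
StdError       = "std.err";
InputSandbox   = {"mpi-start-wrapper.sh", "my-mpi-app"};
OutputSandbox  = {"std.out", "std.err"};
Requirements   = other.GlueCEInfoTotalCPUs >= 8 &&
                 Member("VO-theophys-gcc41", other.GlueHostApplicationSoftwareRunTimeEnvironment);
```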
Multithread support is desirable and it should be integrated in the middleware as soon as possible.
CREAM and BLAH: see https://twiki.cern.ch/twiki/bin/view/EGEE/ParameterPassing ??
WMS: included in WMS 3.3
VOMS Roles can be used to limit the access to Parallel queues.
site-info.def:
  PARALLEL_GROUP_ENABLE="/infngrid/ROLE=parallel"

/opt/glite/yaim/defaults/ig-site.pre:
  FQANVOVIEWS=yes

groups.conf:
  "/infngrid/ROLE=parallel"::::

voms-proxy-init -voms infngrid:/infngrid/Role=parallel
voms-proxy-info -all
> ...
> attribute : /infngrid/Role=parallel/Capability=NULL
> attribute : /infngrid/Role=NULL/Capability=NULL
> ...
MPI and multi-thread programs can be combined to exploit the upcoming multicore architectures. Hybrid multithread/MPI programming leads to a request of N CPUs with a smaller number of MPI processes (N/thread_num). Currently this programming model is not supported in EGEE. Possible solution: change the value type of WholeNodes from boolean to integer. Example:
SMPGranularity = 8; WholeNodes = 4;
This syntax would lead to
qsub -l nodes=4:ppn=GlueHostArchitectureSMPSize
where ppn is a number >=8. WholeNodes value should be passed to mpi-start as the number of MPI processes. Mpi-start should be modified accordingly. Since these new TAGs rely on GlueHostArchitectureSMPSize attribute, subclusters for parallel jobs MUST be homogeneous, with SMPSize published correctly.
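The mapping above can be illustrated with plain shell arithmetic. Variable names are hypothetical; this only sketches how a CE could derive the PBS request and the mpi-start process count from the proposed attributes:

```shell
#!/bin/sh
# Hypothetical translation of the proposed JDL attributes into a PBS
# request and an mpi-start process count.
SMP_GRANULARITY=8      # SMPGranularity (= GlueHostArchitectureSMPSize)
WHOLE_NODES=4          # WholeNodes, with the proposed integer meaning
THREADS_PER_PROC=8     # threads spawned by each MPI process

# PBS resource request: WholeNodes full SMP nodes
QSUB_RES="nodes=${WHOLE_NODES}:ppn=${SMP_GRANULARITY}"
echo "qsub -l ${QSUB_RES}"

# MPI processes to be passed to mpi-start: total CPUs / threads per process
TOTAL_CPUS=$(( WHOLE_NODES * SMP_GRANULARITY ))
MPI_PROCS=$(( TOTAL_CPUS / THREADS_PER_PROC ))
echo "MPI processes: ${MPI_PROCS}"
```

With these values the job would be dispatched as `qsub -l nodes=4:ppn=8` and mpi-start would launch 4 MPI processes, one per node, each free to spawn 8 threads.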
Mixed MPI/multithread programs may require thread-safe MPI implementations. Thread safety can be easily verified:
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
printf("MPI_Init_thread provided: %d\n", provided);
The third parameter requests full thread safety support (MPI_THREAD_MULTIPLE, value 3 in common implementations). If the value returned in provided is MPI_THREAD_SINGLE (0), thread support is not available.
Most sites have a WallClock time limit similar to the CPU time limit.
lcg-info --list-ce --attrs MaxWCTime,MaxCPUTime -vo theophys
This setting is adequate for sequential jobs, but a parallel job accumulates CPU time across all its processes and therefore reaches the CPU time limit much earlier than the wall-clock limit.
Recommended solution: do not set a CPU time limit, or set it very high.
Information about MPI jobs, provided by the LRMS logs, should be used to feed a specific accounting and monitoring system for parallel jobs.
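As a sketch of what such accounting could extract, the following parses a synthetic end-of-job record written in the usual Torque/PBS accounting-log style (the record itself is invented; field names follow the common `key=value` layout of those logs):

```shell
#!/bin/sh
# Synthetic end-of-job (E) record in Torque/PBS accounting-log style.
LOG='04/01/2011 10:15:00;E;1234.ce.example.it;user=theo001 group=theophys Resource_List.nodes=4:ppn=8 resources_used.walltime=02:00:00'

# Split on spaces and semicolons, then pull out the user and the requested
# node geometry, the two fields of interest for per-VO parallel accounting.
USER_FIELD=$(echo "$LOG" | tr ' ;' '\n\n' | grep '^user=' | cut -d= -f2)
NODES_FIELD=$(echo "$LOG" | tr ' ;' '\n\n' | grep '^Resource_List.nodes=' | cut -d= -f2-)
echo "user=${USER_FIELD} nodes=${NODES_FIELD}"
```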
MPI sites with at least 2 queues sharing the same pool of WNs:
CREAM CE for parallel jobs: gridce3.pi.infn.it
Do not use shared home directories; instead, reserve a dedicated area of shared disk space for the execution of parallel computations.
* openmpi 1.3.2 and mpich2 1.1.1.
lfc-mkdir /grid/theophys/IS/
lfc-mkdir /grid/theophys/IS/CT11
lfc-mkdir /grid/theophys/IS/BO11
# upload the input file on a Theophys Storage Element
lcg-cr -v --vo theophys -l lfn:/grid/theophys/IS/MI11/input.dat file://$(pwd)/input.dat
# upload the input file on a specific SE
lcg-cr -v --vo theophys -d grid-se2.pr.infn.it -l lfn:/grid/theophys/IS/MI11/input.dat file://$(pwd)/input.dat
cd /opt/glite/bin
cp pbs_submit.sh pbs_submit.sh.save
wget http://www.fis.unipr.it/grid/MPI_granularity_scripts/pbs_local_submit_attributes.sh
wget http://www.fis.unipr.it/grid/MPI_granularity_scripts/pbs_submit.sh
chmod 755 *.sh