===== MPI/theophys Project =====

==== WorkerNodes ====

== Software environment ==

* SL5.x x86_64
* Open MPI >= 1.3 (1.4 would be better)
* MPICH2
* GNU C, C++ and Fortran compilers (gfortran, g77, g90??)
* Support for commercial compilers?
* Scientific libraries:
  * OpenMP: multithreading support (libgomp)
  * HDF5: data storage and management library
  * BLAS: Basic Linear Algebra Subprograms
  * LAPACK: Linear Algebra PACKage
  * GSL: GNU Scientific Library
  * GMP: GNU Multiple Precision arithmetic library
  * GLPK: GNU Linear Programming Kit
  * FFTW3: Fast Fourier Transform library
  * Octave: high-level language for numerical computations

== Installation example ==

  yum install -y yum-conf-epel.noarch
  yum install -y octave hdf5-devel glpk fftw3
  yum install -y libgomp blas-devel gsl-devel gmp-devel

== theophys TAG ==

The following is a possible TAG to be published by theophys-compliant sites:

  GlueHostApplicationSoftwareRunTimeEnvironment: VO-theophys-gcc41 ??

==== Cluster ====

== Published TAGs for MPI ==

Mpi-start is the mechanism used to start MPI jobs:

  MPI-START

At least Open MPI should be installed:

  MPI_OPENMPI
  MPI_OPENMPI_VERSION="x.y.z"

A shared home is recommended, but file distribution is supported by mpi-start:

  MPI_SHARED_HOME | MPI_NO_SHARED_HOME

Remote start-up of the MPI job can be achieved via password-less SSH:

  MPI_SSH_HOST_BASED_AUTH

InfiniBand is recommended, but Gigabit (or 10 Gbit) Ethernet can be used:

  MPI-Infiniband | MPI-Ethernet

== Open Issues ==

* Is it possible to publish the actual number of free CPUs per queue?
* How is CpuNumber used in the match-making process? At the moment CpuNumber is not used at all for match-making. Temporary solution in the JDL:

  CpuNumber = n;
  other.GlueCEInfoTotalCPUs >= CpuNumber

==== JDL ====

== Typical parallel JDL ==

  JobType = "Normal";
  CpuNumber = 8;

== Multithread support ==

  SMPGranularity = 8;
  WholeNodes = True;

Multithread support is desirable and should be integrated in the middleware as soon as possible.

== Open Issues ==

* Is it possible to integrate Granularity/WholeNodes directly in InfnGrid?
  * CREAM and BLAH: see https://twiki.cern.ch/twiki/bin/view/EGEE/ParameterPassing ??
  * WMS: included in WMS 3.3

==== Parallel and sequential jobs ====

VOMS Roles can be used to limit access to parallel queues.

== VOMS Role = "parallel" ==

The [[https://voms.cnaf.infn.it:8443/voms/infngrid/SearchRoles.do | Role]] is assigned by the VO manager and released by [[https://voms.cnaf.infn.it:8443/voms/theophys/Siblings.do | VOMS]] only on explicit request.

== Setup example ==

site-info.def:

  PARALLEL_GROUP_ENABLE="/infngrid/ROLE=parallel"

/opt/glite/yaim/defaults/ig-site.pre:

  FQANVOVIEWS=yes

groups.conf:

  "/infngrid/ROLE=parallel"::::

Request a proxy with the parallel Role and verify its attributes:

  voms-proxy-init -voms infngrid:/infngrid/Role=parallel
  voms-proxy-info -all
  > ...
  > attribute : /infngrid/Role=parallel/Capability=NULL
  > attribute : /infngrid/Role=NULL/Capability=NULL
  > ...

==== MPI multi-thread jobs ====

MPI and multi-thread programs can be combined to exploit the upcoming multicore architectures. The hybrid multithread/MPI programming model leads to a request of N CPUs with a smaller number of MPI processes (N/thread_num). Currently this programming model is not supported in EGEE.

Possible solution: change the value type of WholeNodes from boolean to integer. Example:

  SMPGranularity = 8;
  WholeNodes = 4;

This syntax would lead to

  qsub -l nodes=4:ppn=GlueHostArchitectureSMPSize

where ppn is a number >= 8. The WholeNodes value should be passed to mpi-start as the number of MPI processes; mpi-start should be modified accordingly.
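As an illustration of the hybrid model described above, the following is a minimal, self-contained sketch (not taken from the project specification) of a program in which each MPI process spawns a team of OpenMP threads; the file name hybrid.c and all process/thread counts are only placeholders.

  /* hybrid.c - minimal hybrid MPI/OpenMP sketch (illustrative only).
   * Each MPI process spawns OMP_NUM_THREADS threads, so a request for
   * N CPUs is served by N/thread_num MPI processes. */
  #include <stdio.h>
  #include <mpi.h>
  #include <omp.h>

  int main(int argc, char **argv)
  {
      int provided, rank, nprocs;

      /* Request full thread safety (see the check below). */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      /* Multi-threaded region: one team of threads per MPI process. */
      #pragma omp parallel
      {
          printf("MPI process %d of %d, thread %d of %d\n",
                 rank, nprocs, omp_get_thread_num(), omp_get_num_threads());
      }

      MPI_Finalize();
      return 0;
  }

Outside the grid it could be built and run, for example, as follows (counts are placeholders; on the grid the invocation would be handled by mpi-start):

  mpicc -fopenmp hybrid.c -o hybrid
  OMP_NUM_THREADS=2 mpiexec -np 4 ./hybrid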
Mixed MPI/multithread programs require a thread-safe MPI implementation. Thread safety can easily be verified:

  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  printf("MPI_Init_thread provided: %d\n", provided);

The third parameter (MPI_THREAD_MULTIPLE, numerically 3) requests full thread-safety support. If the value returned in provided is MPI_THREAD_SINGLE (0), thread support is not available.

==== Scheduling ====

== Objectives ==

* Minimize job starvation
* Maximize resource exploitation

== Possible scenario ==

MPI sites with at least 2 queues sharing the same pool of WNs:

* **High priority parallel queue**
  * accessible only with a special Role (Role=parallel?)
* **Low priority sequential queue**
  * preemptable (renice or requeue?)
  * short WallClockTime (less than 6 hours?)
  * accessible only with a special Role (Role=short?)

==== Revision history ====

* 20100225 - R. De Pietri, F. Di Renzo - User's required libraries
* 20100210 - C. Aiftimiei, R. Alfieri, M. Bencivenni, T. Ferrari - First version