CSN4cluster Characteristics

CSN4Cluster is the CSN4 centralized service for serial and parallel computation.

The cluster is installed, configured and maintained by the INFN-Pisa scientific computing center (led by A. Ciampa, with S. Arezzini, D. Fabiani and E. Mazzoni).

Technical documentation

Cluster NEWS

ACCESS

To use the cluster, an INFN certificate is needed, as well as a subscription to the Theophys Virtual Organization (detailed info).

User Interface

Access to the cluster goes through a User Interface (UI) and relies on tools and know-how that are standard in the INFN scientific community.

User Interfaces are available at most INFN sites. See: http://www.italiangrid.org/grid_operations/users/getting_started/UI

A dedicated UI will be installed in Pisa. Besides the classic UI functions, it will include a Web portal and local access (via ssh and bsub) obtained with a special PAM module (INFN-AAI), which lets users log in to the UI with the credentials issued by their own site.

SOFTWARE

PARALLEL AND SERIAL QUEUES

Worker Nodes (WNs) are split into two partitions:

Serial Partition (CEs: gridce0.pi.infn.it, gridce1.pi.infn.it and gridce2.pi.infn.it)

WNs are shared with other VOs, but Theophys has higher priority, as set by the Fairshare policy. The home directory is located on the node's local disk.

  • theophys queue
    • Max RUNTIME (WallClock Time Limit) = 72 h (3 days)

Parallel Partition (CE: gridce3.pi.infn.it)

The home directory is shared among WNs via GPFS (IP over IB). Currently, only Theophys can submit jobs to this partition.

  • theompi queue
    • Only Parallel jobs are accepted, using the role=parallel provided by VOMS.
    • Reservation time = 8h. If a parallel job requires more nodes than are currently available, LSF locks the available processors for a limited time (the Reservation Time). During this time, slots that become free are collected until the requested number is reached.
    • RUNTIME = 72h
  • theoshort queue
    • This queue accepts only short serial jobs (max 4 hours). role=parallel must not be specified.
    • RUNTIME = 4h. This limit ensures that short jobs end in time for any waiting parallel job. The Backfill mechanism lets short serial jobs use reserved processors if the requested run time fits within the reservation time.
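
The interplay between the reservation time and backfill can be illustrated with a small sketch. It is a simplified model of the rule stated above, not LSF's actual scheduling logic: the `Reservation` class and the `can_backfill` function are hypothetical names, and the only values taken from this page are the 8h reservation time and the 4h theoshort limit.

```python
from dataclasses import dataclass

RESERVATION_HOURS = 8.0    # reservation time of the theompi queue (from this page)
THEOSHORT_MAX_HOURS = 4.0  # RUNTIME limit of the theoshort queue (from this page)


@dataclass
class Reservation:
    """Hypothetical model of processors locked for a waiting parallel job."""
    elapsed_hours: float  # time passed since the reservation started

    @property
    def remaining_hours(self) -> float:
        # Time left before the reservation window closes (never negative).
        return max(0.0, RESERVATION_HOURS - self.elapsed_hours)


def can_backfill(requested_hours: float, reservation: Reservation) -> bool:
    """True if a short serial job fits in the remaining reservation window."""
    if requested_hours <= 0 or requested_hours > THEOSHORT_MAX_HOURS:
        return False  # the job would not be accepted by theoshort at all
    return requested_hours <= reservation.remaining_hours


if __name__ == "__main__":
    # A 2h job fits into a reservation that still has 3h left.
    print(can_backfill(2.0, Reservation(elapsed_hours=5.0)))
    # A 4h job does not fit into the same 3h window.
    print(can_backfill(4.0, Reservation(elapsed_hours=5.0)))
```

In this model, declaring a realistic (short) run time makes a serial job more likely to find a reserved slot it can use.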

STORAGE

10 TB split into two partitions: SRM (9 TB) and the shared home for the parallel partition (1 TB).

SRM

Storage Resource Manager (SRM) is a Grid service interacting with the local storage systems and offering a Grid interface to the Grid infrastructure.

Files and directories are registered in a file catalog with a global namespace and are physically mapped on Storage Elements (SE) enabled for the specific VO.

For each IS (Iniziativa Specifica), a directory has been created in the File Catalog with a name like /grid/theophys/IS_<nome_Iniziativa_Specifica> (e.g. /grid/theophys/IS_AD31).
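
As a small illustration of this naming convention, the sketch below builds the catalog directory for a given IS. The function name `is_catalog_dir` and the validation rules are assumptions made for the example; only the /grid/theophys/IS_ prefix and the AD31 example come from this page.

```python
CATALOG_ROOT = "/grid/theophys"  # File Catalog root for the Theophys VO (from this page)


def is_catalog_dir(is_name: str) -> str:
    """Return the File Catalog directory for an IS, e.g. 'AD31'."""
    name = is_name.strip()
    # Assumed sanity checks: reject empty names and names containing '/'.
    if not name or "/" in name:
        raise ValueError(f"invalid IS name: {is_name!r}")
    return f"{CATALOG_ROOT}/IS_{name}"


if __name__ == "__main__":
    print(is_catalog_dir("AD31"))
```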

The SRM server in Pisa available for Theophys is gridsrm.pi.infn.it

From both partitions (serial and parallel), users can write data directly to the SRM storage via POSIX (to be verified).

shared home

The shared home is the working directory for the parallel partition.

SERIAL JOB SUBMISSION

Serial jobs can be:

  • Long serial jobs (72h): serial partition
  • Short serial jobs (4h): free slots of the parallel partition

Details: /strutture/pi/datacenter/cluster_gruppo_iv/csn4cluster/job_sequenziali

PARALLEL JOB SUBMISSION

Parallel jobs can be:

  • multi-thread (needing all the cores of a single node)
  • memory-bound (needing the whole memory of a single node)
  • MPI (pure MPI or hybrid)

Details: /strutture/pi/datacenter/cluster_gruppo_iv/csn4cluster/job_paralleli
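
The routing of serial and parallel jobs to the three queues can be summarized in a short sketch. The function `pick_queue` is a hypothetical helper, not part of any cluster tool; the queue names and run-time limits are those listed on this page, and sending every serial job of 4h or less to theoshort is an assumption made for the example (such a job could presumably also run in theophys).

```python
THEOSHORT_MAX_HOURS = 4.0  # theoshort RUNTIME limit (from this page)
LONG_MAX_HOURS = 72.0      # theophys and theompi RUNTIME limit (from this page)


def pick_queue(parallel: bool, hours: float) -> str:
    """Suggest a queue name for a job with the given requested run time."""
    if hours <= 0:
        raise ValueError("requested run time must be positive")
    if hours > LONG_MAX_HOURS:
        raise ValueError("no queue accepts jobs longer than 72h")
    if parallel:
        return "theompi"    # parallel partition, requires role=parallel
    if hours <= THEOSHORT_MAX_HOURS:
        return "theoshort"  # free slots of the parallel partition
    return "theophys"       # serial partition


if __name__ == "__main__":
    for parallel, hours in [(False, 2), (False, 48), (True, 24)]:
        print(parallel, hours, pick_queue(parallel, hours))
```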

MONITORING

CSN4cluster usage can be monitored in real time using the LSF web interface: http://farmsmon.pi.infn.it/lsfmon/

Real-time state of the queues (theompi, theoshort, theophys): http://www.fis.unipr.it/grid/tutorial/qstat.php

HLRMON data concerning INFN-PISA: per Group/Role, per User

SUPPORT

For general support and questions of general interest, please contact the users mailing list theophys<at>lists.infn.it, which includes all Theophys users.

For support concerning specifically the cluster hardware, please contact the Pisa Computing Service: grid-prod<at>pi.infn.it

For policy or organizational problems please contact the CSN4Cluster Committee.

TRAINING

First CSN4cluster tutorial: Pisa, 7 and 8 April 2011
