Speed Testing

Here I record notes on my activity, and (in this directory tree) all the results I obtain, in characterizing the performance of the Einstein Toolkit on the various machines I have access to. This logbook covers all my activity starting from December 7th, 2012. It contains several parts that help to understand how well Cactus behaves on different platforms. The main purpose of this testing is to find out how to run simulations on "Fermi".

The main directory where I store the results of the Cactus speed tests is "/work/staff/roberto.depietri/OrstedSpeedTest"

General Considerations

I decided to use the November 2012 version, announced as follows: We are pleased to announce the sixth release (code name "Ørsted") of the Einstein Toolkit, an open, community developed software infrastructure for relativistic astrophysics.

The main problems with my previous tests were the strange scaling properties of Carpet when going to 256 or more processors, and the lack of a proper log of what I did. Thanks to Frank Loeffler I realized that the main scaling problem I observed was due to Carpet IOASCII 1D output. He pointed out to me that all the processors write to the 1D files in an ordered sequence, so the writing time scales linearly with the number of MPI processes involved. Lesson learned: do not produce output when testing speed and scaling. Do the I/O testing separately and do not mix the two types of speed testing.

The useful lesson I learned from the previous tests is the need for standardized configurations to compare against and use as a reference. Always do both strong and weak scaling checks.

UNIGRID tests

First check UNIGRID: border at 60,60,60 doing 32 integration steps

  PUGH:   PUGHit32.rpar   generates par files like PUGHdx1.000it32.par
  CARPET: CARPETit32.rpar generates par files like CARPETdx1.000it32.par
  #################################################################################
  ### dx=[1.5 ....... 0.15]; nx=(60./dx *2 +1 +4);vol=nx.^3;[dx ;nx;vol/vol(1.5PUGH)]
  ##################################################################################
  ##  2.00  1.50  1.00  0.75  0.625  0.60  0.50  0.40  0.30  0.25  0.20  0.15  0.125
  ##    65    85   125   165    197   205   245   305   405   485   605   805    965
  ##  0.44  1.00  3.18  7.31  12.45  14.1  23.9  46.2   108   185   361   849   1463
  ##
  ##  dx=1.0 Carpet requires 4312.518 MB
  ##  dx=1.5 Carpet requires 1356.006 MB
  ##  dx=2.0 Carpet requires  606.390 MB
  #################################################################################
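The grid-size bookkeeping in the comment block above can be reproduced with a short script. What follows is a minimal Python sketch, assuming only the formula quoted there (nx = 60/dx * 2 + 1 + 4); the function and constant names are illustrative and are not part of Cactus or of the rpar files. It also divides the three reported Carpet memory figures by the number of grid points, as a rough consistency check.

```python
# Sketch: reproduce the unigrid point counts and relative volumes quoted above.
# All names below are illustrative; nothing here is a Cactus API.

HALF_WIDTH = 60.0   # outer border at 60 in each direction
EXTRA_POINTS = 4    # the "+4" term of the quoted formula


def unigrid_nx(dx):
    """Points per direction: nx = 60/dx * 2 + 1 + 4."""
    # round() guards against floating-point noise such as 120/0.6 = 200.00000000000003
    return int(round(2.0 * HALF_WIDTH / dx)) + 1 + EXTRA_POINTS


def relative_volume(dx, reference_dx=1.5):
    """Total number of grid points relative to the dx = 1.5 unigrid run."""
    return unigrid_nx(dx) ** 3 / unigrid_nx(reference_dx) ** 3


# Memory reported by Carpet (MB), copied from the block above.
CARPET_MEMORY_MB = {1.0: 4312.518, 1.5: 1356.006, 2.0: 606.390}


def memory_per_point_mb(dx):
    """Reported Carpet memory divided by the number of grid points."""
    return CARPET_MEMORY_MB[dx] / unigrid_nx(dx) ** 3


if __name__ == "__main__":
    for dx in (2.0, 1.5, 1.0, 0.75, 0.5, 0.25, 0.125):
        print(f"dx={dx:<6} nx={unigrid_nx(dx):4d} vol={relative_volume(dx):8.2f}")
    for dx in sorted(CARPET_MEMORY_MB):
        print(f"dx={dx}: {memory_per_point_mb(dx):.6f} MB per grid point")
```

For the three resolutions with a reported memory figure, the ratio comes out at about 0.0022 MB per grid point, so the memory requirement follows nx^3 closely. Using that ratio to extrapolate to finer grids would be an assumption, not a measurement.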

CARPET tests

Then check 3 refinement levels: outer borders at 120 and subgrids at 60 and 30. In this case too we do 32 integration steps on the finest grid. The resolution dx refers to the finest grid.

  CARPET: CARPET_RL3_it32.rpar generates par files like CARPET_RL3_dx1.000it32.par
  #################################################################################
  ### dx=[1.5 ....... 0.15]; nx=(120./(4*dx) *2 +1 +4);vol=3*nx.^3;[dx ;nx;vol/vol(1.5PUGH)]
  ##################################################################################
  ##  2.00  1.50  1.00  0.75  0.625  0.60  0.50  0.40  0.30  0.25  0.20  0.15  0.125  0.100
  ##    35    45    65    85    101   105   125   155   205   245   305   405    485    605
  ##  0.21  0.45  1.34  3.00   5.03  5.66  9.54  18.2  42.1  71.8   139   324    557   1081
  ##
  ##  dx=0.75 Carpet requires 6468.078 MB
  ##  dx=1.0 Carpet requires 3318.366 MB
  ##  dx=1.5 Carpet requires 1471.679 MB
  ##  dx=2.0 Carpet requires 1068.414 MB
  ##     Total time for simulation  (np1..t1)= 701 sec (11 minutes)
  ## On Blue Gene Q, assuming perfect scaling, it will require (1024 cores)
  ##  2.00  1.50  1.00  0.75  0.625  0.60  0.50  0.40  0.30  0.25  0.20  0.15  0.125
  ##  0.01  0.03  0.09  0.23  0.37         0.73  1.42  3.38  5.84  11.4  27.1  46.7
  #################################################################################
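The three-level formula in the comment block above can be evaluated the same way. This is a minimal Python sketch, assuming only the quoted expressions nx = 120/(4*dx) * 2 + 1 + 4 and vol = 3*nx^3, normalised to the dx = 1.5 unigrid run (nx = 85); the names are illustrative. The factor 3 follows from the setup: each level halves both the box size (120, 60, 30) and the spacing, so all three levels have the same number of points.

```python
# Sketch: evaluate the 3-refinement-level grid formula quoted above.
# All names below are illustrative; nothing here is a Cactus or Carpet API.

OUTER_HALF_WIDTH = 120.0   # border of the coarsest level
LEVELS = 3                 # boxes at 120, 60 and 30
EXTRA_POINTS = 4           # the "+4" term of the quoted formula
UNIGRID_REFERENCE_NX = 85  # nx of the dx = 1.5 unigrid run used for normalisation


def rl3_nx(dx_finest):
    """Points per direction on each level: nx = 120/(4*dx) * 2 + 1 + 4."""
    coarse_dx = 4.0 * dx_finest  # two factor-2 refinements above the finest level
    return int(round(2.0 * OUTER_HALF_WIDTH / coarse_dx)) + 1 + EXTRA_POINTS


def rl3_relative_volume(dx_finest):
    """Total points on all levels relative to the dx = 1.5 unigrid run."""
    return LEVELS * rl3_nx(dx_finest) ** 3 / UNIGRID_REFERENCE_NX ** 3


if __name__ == "__main__":
    for dx in (2.0, 1.5, 1.0, 0.75, 0.625, 0.6, 0.5):
        print(f"dx={dx:<6} nx={rl3_nx(dx):4d} vol={rl3_relative_volume(dx):7.2f}")
```

For example, dx = 1.0 gives nx = 65 per level and a relative volume of about 1.34.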

General problem with the testing

The first tests showed that the use of

ActiveThorns = "TimerReport"
TimerReport::output_all_timers_readable = "yes"
TimerReport::out_every = 32
TimerReport::out_filename = "TimerReport"
TimerReport::output_schedule_timers = "no"
TimerReport::output_all_timers = "no"

strongly affects the test results. For example, "CARPET_RL3_dx0.400it32.par" gives the following results for the timer "Total time for simulation":

                     Blue Gene Size    Np  Nt  Total time for simulation
With TimerReport                 64    64  16  249 s
                                 64   128   8  251 s
                                 64   256   4  272 s
                                 64   512   2  316 s
                                 64  1024   1  414 s
                                128  2048   1  564 s
Without TimerReport              64  1024   1  273 s
                                128  2048   1  278 s
                                256  4096   1  362 s

All the speed tests will be performed without the activation of "TimerReport".
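To put a number on the effect, the pure-MPI rows (Nt = 1) of the table above can be compared directly. The following is a minimal Python sketch that uses only the timings quoted there; the dictionary and function names are illustrative.

```python
# Sketch: relative overhead of TimerReport for CARPET_RL3_dx0.400it32.par.
# "Total time for simulation" in seconds, copied from the table above (Nt = 1 rows),
# keyed by the number of MPI processes Np.
WITH_TIMER_REPORT = {1024: 414.0, 2048: 564.0}
WITHOUT_TIMER_REPORT = {1024: 273.0, 2048: 278.0}


def timer_report_overhead(n_procs):
    """Fractional increase in run time when TimerReport is active."""
    return WITH_TIMER_REPORT[n_procs] / WITHOUT_TIMER_REPORT[n_procs] - 1.0


if __name__ == "__main__":
    for n_procs in sorted(WITH_TIMER_REPORT):
        print(f"Np={n_procs}: +{100.0 * timer_report_overhead(n_procs):.0f}%")
```

With these numbers the run takes roughly 50% longer on 1024 processes and roughly twice as long on 2048 processes when TimerReport is active.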

Second stage of Testing

The second stage of testing involved only the output of various reductions of "rho", with no other output.

Parfile: CARPET_RL3_dx(…)it32.par

dx        BG Size  # of cores  OMP size  simulation  CCTK_EVOL  WALL Time
0.150         256        4096         1         867        555       2995
0.200         256        4096         1         581        280       2707
0.200(*)      128        2048         1         696        510       1358
0.250         256        4096         1         467        186       2593
0.250         128        2048         1         472        306       1130
0.250          64        1024         1         632        518        827
0.300         256        4096         1         409        131       2536
0.300         128        2048         1         270        220       1313
0.300          64        1024         1         437        340        732
0.400         256        4096         1         362         90       2499
0.400         128        2048         1         279        129       1222
0.400          64        1024         1         273        194        609
0.500         256        4096         1         371         72       2497
0.500         128        2048         1         228         98        885
0.500          64        1024         1         204        132        398

OpenMP vs pure MPI
0.250          64        1024         1         632        518        827
0.250          64        1024         2         594        508        661
0.250          64        1024         4         583        507        614
0.250          64        1024         8         610        537        630
0.250          64        1024        16         677        597        694
0.500          64        1024         1         204        132        398
0.500          64        1024         2         185        134        254
0.500          64        1024         4         172        127        202
0.500          64        1024         8         173        131        194
0.500          64        1024        16         184        143        203

(*) This run was also performed with four times as many integration steps (it=128); the corresponding CCTK_EVOL time changed from 510 to 2100 and the simulation time from 696 to 2524.
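The strong-scaling behaviour can be read off the table above by comparing the evolution timer at fixed dx on 1024, 2048 and 4096 cores. The following is a minimal Python sketch using the pure-MPI rows (OMP size 1) for the resolutions that were run on all three partitions. The efficiency definition, T(1024) * 1024 / (T(N) * N), is the usual one and is my choice here, not something taken from the test setup; the names are illustrative.

```python
# Sketch: strong-scaling efficiency from the evolution timer in the table above.
# Values are copied from the OMP size = 1 rows, keyed by dx and number of cores.
EVOL_TIME = {
    0.250: {1024: 518.0, 2048: 306.0, 4096: 186.0},
    0.300: {1024: 340.0, 2048: 220.0, 4096: 131.0},
    0.400: {1024: 194.0, 2048: 129.0, 4096: 90.0},
    0.500: {1024: 132.0, 2048: 98.0, 4096: 72.0},
}


def strong_scaling_efficiency(times, base_cores=1024):
    """Efficiency relative to base_cores: 1.0 means perfect strong scaling."""
    base_time = times[base_cores]
    return {
        cores: (base_time * base_cores) / (elapsed * cores)
        for cores, elapsed in times.items()
    }


if __name__ == "__main__":
    for dx in sorted(EVOL_TIME):
        eff = strong_scaling_efficiency(EVOL_TIME[dx])
        row = "  ".join(f"{cores}: {eff[cores]:.2f}" for cores in sorted(eff))
        print(f"dx={dx:.3f}  {row}")
```

For dx = 0.250 this gives an efficiency of about 0.85 on 2048 cores and 0.70 on 4096 cores relative to 1024 cores. For the smaller dx = 0.500 problem it drops to about 0.67 and 0.46.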

Evaluation of the time to checkpoints

roberto.depietri/user/speed_testing.txt · Last modified: 15/01/2013 17:11 by roberto.depietri