Difference between revisions of "Atlas:Analysis Challenge ST"

An article from lcgwiki.

Latest revision as of 13:49, 29 January 2009

Site Stress Test --Chollet 12:49, 29 January 2009 (CET)

Procedure

  • Replication of target datasets across the cloud
  • Preparation of the analysis job
  • Generation of n jobs per site (each job processes 1 dataset)
  • Bulk submission to the WMS (1 per site); a Ganga sketch follows this list
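
As an illustration, a minimal Ganga sketch of the per-site preparation and bulk submission is given below. It is meant to be run inside a ganga session, where the GangaAtlas classes (Job, Athena, DQ2Dataset, DQ2OutputDataset, DQ2JobSplitter, LCG) are already in scope; the CE names, dataset name, option file and subjob count are placeholders, and exact attribute names may differ between Ganga versions.

    # Minimal sketch: one stress-test job per site, run inside a ganga session.
    # CE names, dataset name, option file and subjob count are hypothetical.
    sites = ['ce01.example.fr', 'ce02.example.fr']
    for site in sites:
        j = Job()
        j.name = 'ST_' + site
        j.application = Athena()
        j.application.atlas_release = '14.2.20'       # release required by the ST
        j.application.option_file = 'AnalysisSkeleton_topOptions.py'
        j.application.prepare()                       # pack the user analysis code
        j.inputdata = DQ2Dataset()
        j.inputdata.dataset = 'mc08.106051.PythiaWmunu_1Lepton.recon.AOD.e352_s462_r541_tid028338'
        j.inputdata.type = 'DQ2_LOCAL'                # POSIX I/O mode (see access modes below)
        j.outputdata = DQ2OutputDataset()             # output stored on ATLASUSERDISK
        j.splitter = DQ2JobSplitter()
        j.splitter.numsubjobs = 20                    # generation of n (sub)jobs per site
        j.backend = LCG()                             # bulk submission through the WMS
        j.backend.CE = site                           # steer the job to one site
        j.submit()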

Test conditions

  • The testing framework is Ganga-based. It currently uses the LCG backend; it will soon be possible to use the PANDA backend as well. Metrics are collected and displayed at http://gangarobot.cern.ch/st/
  • Both POSIX I/O and "copy mode" may be used, allowing a performance comparison of the two modes.
  • ATLAS software release 14.2.20
  • Input DS patterns used (the first one is preferred for muon analysis; a matching sketch follows this list):
   mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*
   mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*
   mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*
   mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid*
   mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid*
   mc08.*.recon.AOD.e*_s*_r5*tid* (this pattern includes all the previous ones)
  • Input datasets are read from ATLASMCDISK and outputs are stored on ATLASUSERDISK (no special requirements there). Input data access is the main issue; data output poses no problem.
  • Required CPU time : GlueCEPolicyMaxCPUTime >= 1440 minutes (1 day; typical job duration : 5 hours)
  • Jobs run under DN : /O=GermanGrid/OU=LMU/CN=Johannes_Elmsheuser
  • LAN saturation has been observed with a 1 Gb/s network connection between WNs and the SE.
  • It is possible for sites to limit the number of jobs sent at a time.
  • Test duration : 48 hours
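
The DS patterns above are plain shell-style wildcards, so their effect can be illustrated with Python's fnmatch module (a minimal sketch; the dataset names below are invented for the example):

    import fnmatch

    # Two of the input DS patterns listed above, plus the catch-all one.
    patterns = [
        'mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*',
        'mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*',
        'mc08.*.recon.AOD.e*_s*_r5*tid*',
    ]

    # Hypothetical dataset names, for illustration only.
    datasets = [
        'mc08.106051.PythiaWmunu_1Lepton.recon.AOD.e352_s462_r541_tid028338',
        'mc08.105802.JF17_pythia_jet_filter.recon.ESD.e347_s462_r541_tid028442',
    ]

    for ds in datasets:
        hits = [p for p in patterns if fnmatch.fnmatch(ds, p)]
        print('%s -> %s' % (ds, hits if hits else 'no match'))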

File access modes used by Ganga Jobs

Two access modes may be used during ST tests.

  • (j.inputdata.type='DQ2_LOCAL') : when users submit Ganga jobs with the LCG backend, POSIX I/O is currently the default method for accessing input files.
  • (j.inputdata.type='FILE_STAGER') : the alternative access mode used by ST tests is the FileStager mode, which copies input files from the local SE to the worker-node tmp area using lcg-cp in a background thread of the Athena event loop. This mode still needs some improvement to reach full stability, but it has demonstrated good performance during ST at some (but not all) sites.
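
In Ganga terms the two modes differ only in the inputdata type. A minimal sketch, with j an Athena job prepared as in the Procedure section:

    # Default with the LCG backend: Athena reads events directly from the SE
    # through the site's local access protocol (rfio, dcap, ...).
    j.inputdata.type = 'DQ2_LOCAL'

    # Copy mode: the FileStager copies each input file to the worker-node tmp
    # area with lcg-cp in a background thread of the Athena event loop.
    j.inputdata.type = 'FILE_STAGER'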

Target and metrics

  • Nb of jobs : a few hundred, up to 1000 jobs/site
  • Rate (evt/s) : up to 15 Hz
  • Success rate (fraction of successful jobs) > 80 %
  • CPU utilization : CPUtime / Walltime > 50 % (a computation sketch follows this list)
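
As a small illustration of how these metrics can be derived from per-job accounting, a plain Python sketch (all numbers invented):

    # Hypothetical per-job records: (events_read, cputime_s, walltime_s, succeeded).
    jobs = [
        (250000, 9200, 16500, True),
        (250000, 8800, 17900, True),
        (0,      0,    1200,  False),   # one failed job
    ]

    ok = [j for j in jobs if j[3]]
    success_rate = float(len(ok)) / len(jobs)                          # target > 0.80
    event_rate = sum(j[0] for j in ok) / float(sum(j[2] for j in ok))  # evt/s
    cpu_util = float(sum(j[1] for j in ok)) / sum(j[2] for j in ok)    # target > 0.50

    print('success rate : %.0f%%' % (100 * success_rate))   # 67%
    print('event rate   : %.1f Hz' % event_rate)            # ~14.5 Hz
    print('CPU/Walltime : %.0f%%' % (100 * cpu_util))       # ~52%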

Results and Monitoring

  • See http://gangarobot.cern.ch/st/
  • ATLAS Twiki page : https://twiki.cern.ch/twiki/bin/view/Main/GangaSiteTests
  • Results of the analysis challenge performed on the IT Cloud : http://indico.cern.ch/getFile.py/access?contribId=128&sessionId=8&resId=0&materialId=slides&confId=22137
  • Results for the DE Cloud : http://indico.cern.ch/getFile.py/access?contribId=129&sessionId=8&resId=0&materialId=slides&confId=22137