Atlas:Analysis HC beyond STEP09
--Chollet 15:56, 19 October 2009 (CEST)
Distributed Analysis Stress Tests - HammerCloud beyond STEP09
Lessons learnt from STEP09
- Sites may want to identify the amount of analysis activity they can reasonably sustain and set hard limits on the number of running analysis jobs.
- Balancing data across many disk servers is essential.
- Analysis requires very high I/O (5 MB/s per job). Sites should review their LAN architecture to avoid bottlenecks.
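The 5 MB/s per-job figure translates directly into a sizing rule: aggregate bandwidth = number of running analysis jobs × 5 MB/s. The following is a minimal Python sketch of that arithmetic. It assumes decimal units and that all input I/O crosses a single LAN link. The function names are hypothetical, and the job count and link speed in the example are illustrative, not measured site values.

```python
# Back-of-the-envelope LAN sizing, assuming every running analysis job
# reads its input at the STEP09 figure of 5 MB/s over the same LAN.
PER_JOB_MB_S = 5.0  # MB/s per running analysis job


def required_bandwidth_gbit_s(running_jobs: int, per_job_mb_s: float = PER_JOB_MB_S) -> float:
    """Aggregate bandwidth in Gbit/s needed to feed `running_jobs` concurrent jobs."""
    # 1 MB/s = 8 Mbit/s = 0.008 Gbit/s (decimal units)
    return running_jobs * per_job_mb_s * 8 / 1000


def max_running_jobs(link_gbit_s: float, per_job_mb_s: float = PER_JOB_MB_S) -> int:
    """Largest number of concurrent jobs a link of `link_gbit_s` Gbit/s can feed."""
    link_mb_s = link_gbit_s * 1000 / 8
    return int(link_mb_s // per_job_mb_s)


if __name__ == "__main__":
    print(required_bandwidth_gbit_s(400))  # 16.0 (Gbit/s for 400 jobs)
    print(max_running_jobs(10))            # 250 (jobs on one 10 Gbit/s link)
```

Read the other way round, `max_running_jobs` offers one possible basis for a hard limit on running analysis jobs. A real limit would also need to account for disk-server throughput, not only the network.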
Results
- ATLAS HC test results: http://gangarobot.cern.ch/hc/ (new web interface)
- ATLAS STEP09 summary: http://gangarobot.cern.ch/st/step09summary.html
ATLAS Info & Contacts
- Information via the mailing list ATLAS-LCG-OP-L@in2p3.fr
- LPC: Nabil Ghodbane - Nabil.Ghodbane@cern.ch
- LAL: Nicolas Makovec
- LAPP: Stéphane Jézéquel
- CPPM: Emmanuel Le Guirriec
- LPSC: Sabine Crepe
- LPNHE: Tristan Beau
- CC-T2: Catherine, Ghita
- IRFU: Nathalie Besson
HC Tests
Objectives
- Improve cloud readiness by tracking site and ATLAS problems week by week (SL5 migration, site upgrades)
- Identify the best data access method for each site by comparing event rate and CPU time/walltime (see https://twiki.cern.ch/twiki/bin/view/Atla/HammerCloudDataAccess#FR_cloud)
- Exercise analysis with Conditions DB access (to see where Squid caching is needed) and Tag analysis
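Comparing access methods comes down to two ratios per site: the event rate (events per second of walltime) and the CPU efficiency (CPU time / walltime). The following is a minimal Python sketch of that comparison. It assumes per-job records with the hypothetical fields shown below. The ranking rule uses event rate first, with CPU efficiency as a tie-breaker. This is one possible choice and not necessarily the one used on the HammerCloud pages.

```python
from dataclasses import dataclass


@dataclass
class JobSummary:
    """Per-job metrics from a stress test (hypothetical field names)."""
    method: str    # e.g. "copy-to-WN", "DQ2_LOCAL", "FILE_STAGER"
    events: int    # events processed by the job
    cpu_s: float   # CPU time in seconds
    wall_s: float  # walltime in seconds


def summarize(jobs: list[JobSummary]) -> dict[str, tuple[float, float]]:
    """Return {method: (events per wall-second, CPU/walltime)} aggregated over jobs."""
    totals: dict[str, list[float]] = {}
    for job in jobs:
        t = totals.setdefault(job.method, [0.0, 0.0, 0.0])
        t[0] += job.events
        t[1] += job.cpu_s
        t[2] += job.wall_s
    return {m: (ev / wall, cpu / wall) for m, (ev, cpu, wall) in totals.items()}


def best_method(jobs: list[JobSummary]) -> str:
    """Pick the method with the highest event rate; ties broken on CPU efficiency."""
    stats = summarize(jobs)
    # Tuples compare element by element: event rate first, then efficiency.
    return max(stats, key=lambda m: stats[m])
```

Aggregating totals before dividing weights long jobs more heavily than a mean of per-job ratios would. This is a deliberate choice in the sketch, and it can be changed.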
Data Access methods
Multiple data access methods are exercised:
- Via Panda: copy-to-WN (worker node) access mode using rfcp (xrootd at ANALY-LYON)
- Via gLite WMS: two data access modes are available:
  - DQ2_LOCAL mode: direct access using rfio or dcap
  - FILE_STAGER mode: input data are staged in by a dedicated thread running in parallel with Athena
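The FILE_STAGER idea overlaps copying the next input file with processing the current one, which is a producer/consumer pipeline. The following is a minimal Python sketch under stated assumptions. `shutil.copy` stands in for the real grid copy tool, and `process` stands in for the Athena event loop. The function names `stage_in` and `run_analysis` are hypothetical and do not reflect the actual ATLAS FILE_STAGER implementation.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path


def stage_in(sources, scratch: Path, staged: queue.Queue) -> None:
    """Stager thread: copy each input file to local scratch and hand it to the consumer."""
    try:
        for src in sources:
            dest = scratch / Path(src).name
            shutil.copy(src, dest)  # stands in for the real grid copy tool
            staged.put(dest)
    except Exception as exc:
        staged.put(exc)  # forward the failure to the analysis loop
    finally:
        staged.put(None)  # sentinel: no more files


def run_analysis(sources, process) -> list:
    """Process staged files in order while the following ones are copied in parallel."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        # A small bound keeps at most two staged files waiting in the queue.
        staged: queue.Queue = queue.Queue(maxsize=2)
        stager = threading.Thread(
            target=stage_in, args=(sources, Path(tmp), staged), daemon=True
        )
        stager.start()
        while (item := staged.get()) is not None:
            if isinstance(item, Exception):
                raise item
            results.append(process(item))  # placeholder for the Athena event loop
            item.unlink()  # free scratch space once the file is processed
        stager.join()
    return results
```

The bounded queue is the key design choice. It lets copies run ahead of the analysis without filling the worker node's scratch disk.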
Week 40
- Muon analysis (Release 15.3.1)
- Input dataset (STEP09): mc08.*merge.AOD.e*_s*_r6*tid*
- 3 HC tests of 24 hours each:
  - Test 649 via Panda (copy-to-WN mode using dccp/rfcp; xrootd at ANALY-LYON)
Bad efficiency at all sites. Jobs failed with exit code 1137 ("Put error: Error in copying the file from job workdir to localSE"), caused by an LFC ACL problem: write permissions on /grid/atlas/users/pathena for pilot jobs running with /atlas/Role=pilot and /atlas/fr/Role=pilot (newly activated).
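The Week 40 input dataset specification is a wildcard pattern. The following minimal Python sketch shows how such a pattern selects dataset names. It assumes that `*` behaves like a shell-style glob, as implemented by Python's `fnmatch`. The helper name `select_datasets` is hypothetical, and any dataset names used with it here are made up for illustration.

```python
from fnmatch import fnmatchcase

# Input dataset pattern used for the Week 40 muon analysis tests.
STEP09_PATTERN = "mc08.*merge.AOD.e*_s*_r6*tid*"


def select_datasets(names, pattern=STEP09_PATTERN):
    """Return the dataset names matching a shell-style wildcard pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform. For example, a hypothetical name such as `mc08.105001.pythia_minbias.merge.AOD.e349_s462_r635_tid123456` matches, while the same name with an `_r555` tag does not.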
Recent talks
- "ATLAS: from STEP09 towards first beams", Graeme Stewart's talk at Journées Grille France (16 October 2009)
- "Summary of HammerCloud Tests since STEP09", Dan van der Ster's talk at the ATLAS Jamboree T1/T2/T3 (13 October 2009)
- "HammerCloud Plans", Johannes Elmsheuser's talk at the ATLAS S&C Week (2 September 2009)