Atlas:Analysis HC beyond STEP09

An article from lcgwiki.
Revision as of 14:52, 23 October 2009 by Chollet (Week 40)

--Chollet 15:56, 19 October 2009 (CEST)

Distributed Analysis Stress Tests - HammerCloud beyond STEP09

Lessons learnt from STEP09

  • Sites may identify the amount of analysis load they can reasonably handle and set hard limits on the number of running analysis jobs.
  • Balancing data across many disk servers is essential.
  • Analysis requires very high I/O (5 MB/s per job). Sites should review their LAN architecture to avoid bottlenecks.
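
As a rough sizing aid for the I/O figure above, the following sketch multiplies the 5 MB/s per-job rate by a number of concurrent jobs and, conversely, derives a job cap from a link capacity. The function names, the job count and the 1 Gbit/s link are illustrative assumptions, not site measurements; the conversion assumes decimal units (1 MB/s = 8 Mbit/s).

```python
PER_JOB_MB_S = 5.0  # per-job read rate quoted in the STEP09 lessons


def aggregate_mbit_s(n_jobs, per_job_mb_s=PER_JOB_MB_S):
    """Aggregate LAN throughput in Mbit/s for n_jobs concurrent analysis jobs."""
    return n_jobs * per_job_mb_s * 8  # 1 byte = 8 bits


def max_jobs_for_link(link_mbit_s, per_job_mb_s=PER_JOB_MB_S):
    """Largest number of jobs a link of the given capacity can feed at full rate."""
    return int(link_mbit_s // (per_job_mb_s * 8))


if __name__ == "__main__":
    print(aggregate_mbit_s(200))    # 200 hypothetical jobs -> 8000.0 Mbit/s
    print(max_jobs_for_link(1000))  # hypothetical 1 Gbit/s link -> 25 jobs
```

A cap computed this way is only an upper bound: it ignores other traffic on the link and assumes the disk servers themselves can deliver the aggregate rate.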

Results

ATLAS Info & Contacts

  • Information via mailing list ATLAS-LCG-OP-L@in2p3.fr
  • LPC : Nabil Ghodbane - Nabil.Ghodbane@cern.ch
  • LAL : Nicolas Makovec
  • LAPP : Stéphane Jézéquel
  • CPPM : Emmanuel Le Guirriec
  • LPSC : Sabine Crepe
  • LPNHE : Tristan Beau
  • CC-T2 : Catherine, Ghita
  • IRFU : Nathalie Besson

HC Tests

Objectives

  • Improve Cloud readiness by following site and ATLAS problems week by week (SL5 migration, site upgrades)
  • Identify best data access method per site by comparing the event rate and CPU/Walltime

https://twiki.cern.ch/twiki/bin/view/Atla/HammerCloudDataAccess#FR_cloud

  • Exercise analysis with Conditions DB access (to see where squid caching is needed) and Tag analysis
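
The comparison of access methods by event rate and CPU/Walltime can be sketched as below. This is a minimal illustration, not the HammerCloud implementation: the `MethodSummary` layout, the tie-breaking rule and the method names `method_A` / `method_B` are assumptions, and the numbers are placeholders rather than test results.

```python
from dataclasses import dataclass


@dataclass
class MethodSummary:
    """Aggregated numbers for one access method at one site (hypothetical layout)."""
    method: str
    events: int      # events processed
    cpu_s: float     # CPU time in seconds
    wall_s: float    # wall-clock time in seconds

    @property
    def event_rate(self):
        return self.events / self.wall_s  # events per second of walltime

    @property
    def cpu_efficiency(self):
        return self.cpu_s / self.wall_s   # CPU/Walltime ratio


def best_method(summaries):
    """Pick the method with the highest event rate; CPU/Walltime breaks ties."""
    return max(summaries, key=lambda s: (s.event_rate, s.cpu_efficiency))


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not HammerCloud results.
    runs = [
        MethodSummary("method_A", events=10000, cpu_s=600.0, wall_s=1000.0),
        MethodSummary("method_B", events=10000, cpu_s=600.0, wall_s=800.0),
    ]
    for r in runs:
        print(f"{r.method}: {r.event_rate:.1f} evt/s, CPU/Wall = {r.cpu_efficiency:.2f}")
    print("best:", best_method(runs).method)
```

Ranking on event rate first reflects that a method can show a good CPU/Walltime ratio while still processing fewer events per second; a site may prefer a different ordering.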

Data Access methods

Multiple data access methods are exercised

  • via Panda : a copy-to-WN access mode using rfcp (xrootd in ANALY-LYON)
  • via gLite WMS : two data access modes are available
    • DQ2_LOCAL mode : direct access using rfio or dcap
    • FILE_STAGER mode : data is staged in by a dedicated thread running in parallel with Athena
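
The idea behind the FILE_STAGER mode, where a dedicated thread stages data while processing continues, can be illustrated with a generic producer/consumer sketch. This is not the Athena FileStager code: `run_with_stager`, `fetch`, `process` and the `prefetch` depth are hypothetical names chosen for the example.

```python
import queue
import threading


def run_with_stager(file_names, fetch, process, prefetch=2):
    """Process files while a background thread stages the next ones.

    fetch(name) stands in for the copy to local disk and process(local)
    for the event loop; both are hypothetical callables from the caller.
    """
    staged = queue.Queue(maxsize=prefetch)  # bounds how many files wait on local disk
    done = object()                         # sentinel marking the end of input

    def stager():
        try:
            for name in file_names:
                staged.put(fetch(name))     # blocks while the buffer is full
        finally:
            staged.put(done)                # always unblock the consumer

    threading.Thread(target=stager, daemon=True).start()

    results = []
    while True:
        local = staged.get()
        if local is done:
            break
        results.append(process(local))      # overlaps with staging of later files
    return results


if __name__ == "__main__":
    out = run_with_stager(["f1", "f2", "f3"],
                          fetch=lambda n: "/tmp/" + n,
                          process=lambda p: p.upper())
    print(out)  # ['/TMP/F1', '/TMP/F2', '/TMP/F3']
```

The bounded queue keeps the stager from running arbitrarily far ahead of processing, which limits scratch space used on the worker node. In this sketch a failing `fetch` simply ends the input early; a real implementation would need to report the error.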

Week 40

  • Muon Analysis (Release 15.3.1)
  • Input DS (STEP09) : mc08.*merge.AOD.e*_s*_r6*tid*
  • 3 HC tests of 24 hrs each :
    • 29/09/09 Test 649 via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
      Poor efficiency : all sites affected
      Failed jobs with error : exit code 1137
      Put error: Error in copying the file from job workdir to localSE
      Cause : LFC ACL problem affecting write permissions in /grid/atlas/users/pathena
      for pilot jobs with /atlas/Role=pilot and /atlas/fr/Role=pilot (newly activated)
    • 30/09/09 Test 652/656 via WMS (DQ2_LOCAL mode or direct access dcap/rfio)
    • 30/09/09 Test 653/657 via WMS (FILE_STAGER mode)

Recent talks