Atlas:Analysis HC beyond STEP09

Revision as of 18:11, 26 October 2009

--Chollet 15:56, 19 October 2009 (CEST)

Distributed Analysis Stress Tests - HammerCloud beyond STEP09

Lessons learnt from STEP09

  • Sites may identify the amount of analysis load they can reasonably sustain and set hard limits on the number of running analysis jobs.
  • Balancing data across many disk servers is essential.
  • Analysis requires very high I/O (5 MB/s per job). Sites should review their LAN architecture to avoid bottlenecks.
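
The 5 MB/s per job figure translates directly into an aggregate load on storage and LAN. The sketch below works through that arithmetic; the job counts and the link capacity are hypothetical values chosen for illustration, and the conversion assumes 1 Gbit/s = 125 MB/s with no protocol overhead.

```python
# Per-job read rate quoted in the STEP09 lessons (MB/s).
MB_PER_S_PER_JOB = 5.0


def aggregate_io_mb_per_s(running_jobs, per_job_mb_per_s=MB_PER_S_PER_JOB):
    """Total sustained read rate (MB/s) that storage and LAN must deliver."""
    return running_jobs * per_job_mb_per_s


def max_jobs_for_link(link_gbit_per_s, per_job_mb_per_s=MB_PER_S_PER_JOB):
    """Largest number of jobs a link of the given capacity can feed.

    Assumes 1 Gbit/s = 125 MB/s (decimal units) and ignores protocol overhead.
    """
    link_mb_per_s = link_gbit_per_s * 125.0
    return int(link_mb_per_s // per_job_mb_per_s)


if __name__ == "__main__":
    # Hypothetical job counts, for illustration only.
    for jobs in (100, 500, 1000):
        print(jobs, "jobs need", aggregate_io_mb_per_s(jobs), "MB/s")
    print("A 1 Gbit/s link feeds at most", max_jobs_for_link(1), "jobs")
```

Under these assumptions a single 1 Gbit/s disk server link saturates at a few tens of jobs, which is one way to see why balancing data across many disk servers matters.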

Results

ATLAS Info & Contacts

  • Information via mailing list ATLAS-LCG-OP-L@in2p3.fr
  • LPC : Nabil Ghodbane - Nabil.Ghodbane@cern.ch
  • LAL : Nicolas Makovec
  • LAPP : Stéphane Jézéquel
  • CPPM : Emmanuel Le Guirriec
  • LPSC : Sabine Crepe
  • LPNHE : Tristan Beau
  • CC-T2 : Catherine, Ghita
  • IRFU : Nathalie Besson

HC Tests

ATLAS-HC-small.jpg

Objectives

  • Improve Cloud readiness by following site and ATLAS problems week by week (SL5 migration, site upgrades)
  • Identify best data access method per site by comparing the event rate and CPU/Walltime

https://twiki.cern.ch/twiki/bin/view/Atla/HammerCloudDataAccess#FR_cloud

  • Exercise Analysis with Conditions DB access (see where squid caching is needed) and Tag analysis
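
Comparing access methods by event rate and CPU/Walltime can be sketched as follows. This is only an illustration: the `MethodResult` class, the sample numbers, and the rule of ranking by event rate first and CPU/Walltime second are assumptions, not part of HammerCloud.

```python
from dataclasses import dataclass


@dataclass
class MethodResult:
    """Summed job metrics for one data access method at one site (hypothetical)."""
    method: str
    events: int      # events processed
    cpu_s: float     # CPU time in seconds
    wall_s: float    # wall-clock time in seconds

    @property
    def event_rate(self):
        """Events processed per second of wall-clock time."""
        return self.events / self.wall_s

    @property
    def cpu_efficiency(self):
        """CPU/Walltime ratio; low values suggest jobs waiting on I/O."""
        return self.cpu_s / self.wall_s


def best_method(results):
    """Pick the method with the highest event rate, then highest CPU/Walltime."""
    return max(results, key=lambda r: (r.event_rate, r.cpu_efficiency))


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    site_results = [
        MethodResult("copy-to-WN", events=10000, cpu_s=800.0, wall_s=1000.0),
        MethodResult("DQ2_LOCAL", events=10000, cpu_s=500.0, wall_s=2000.0),
        MethodResult("FILE_STAGER", events=10000, cpu_s=700.0, wall_s=1250.0),
    ]
    for r in site_results:
        print(r.method, round(r.event_rate, 2), "evt/s", round(r.cpu_efficiency, 2))
    print("best:", best_method(site_results).method)
```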

Data Access methods

Multiple data access methods are exercised

  • via Panda : A copy-to-WN access mode using rfcp is used (xrootd in ANALY-LYON)
  • via gLite WMS : 2 data access modes available
    • DQ2_LOCAL mode is a direct access mode using rfio or dcap
    • FILE_STAGER mode : data is staged in by a dedicated thread running in parallel with Athena
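
The idea behind a stager thread, copying the next input file while the current one is being processed, can be illustrated with a generic producer/consumer sketch. This is not the actual FILE_STAGER implementation: `stage_in`, `stager`, `run` and the `/tmp` destination are hypothetical names, and the copy itself is only simulated.

```python
import queue
import threading


def stage_in(remote_name):
    """Placeholder for the real copy to local disk; returns the local path.

    A real stager would invoke a copy tool here (assumption); this sketch
    only computes the destination name.
    """
    return "/tmp/" + remote_name.rsplit("/", 1)[-1]


def stager(remote_files, staged):
    """Background thread body: stage files in order, then signal the end."""
    for name in remote_files:
        staged.put(stage_in(name))  # blocks while the prefetch queue is full
    staged.put(None)                # sentinel: nothing left to stage


def run(remote_files, process, prefetch=2):
    """Process files while up to `prefetch` further files are staged ahead."""
    staged = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=stager, args=(remote_files, staged), daemon=True)
    worker.start()
    results = []
    while True:
        local = staged.get()
        if local is None:
            break
        results.append(process(local))
    worker.join()
    return results


if __name__ == "__main__":
    files = ["/grid/atlas/example/f1.root", "/grid/atlas/example/f2.root"]
    print(run(files, process=lambda path: "processed " + path))
```

The bounded queue keeps the stager only a few files ahead, so local disk usage on the worker node stays limited.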

Week 40

29/09/09 Test 649

  Bad efficiency - all sites affected
  Failed jobs with error : exit code 1137
  Put error: Error in copying the file from job workdir to localSE
  due to LFC ACL problem : write permissions in /grid/atlas/users/pathena
  for pilot jobs /atlas/Role=pilot and /atlas/fr/Role=pilot (newly activated)

30/09/09 Test 652, 653, 656, 657

http://lcg.in2p3.fr/wiki/images/ATLAS-HC300909.gif

Week 41

08/10/09 Test 663

  • DPD Analysis (Release 15.5.0)
  • Input DS - DATADISK : data09_cos.*.DPD*
  • Cond DB access to Oracle in Lyon T1
  • via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
  • Sites problems or downtime :
    • LAL : downtime
    • RO : DS unavailable
    • LYON (T2) : release 15.5.0 unavailable
  Poor performance for the foreign sites (Tokyo and Beijing) compared to the French sites

http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-CPU.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-rate.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-CPU.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-rate.png

Week 42

15/10/09 Test 682

  • Muon Analysis (Release 15.3.1)
  • Input DS (STEP09) : mc08.*merge.AOD.e*_s*_r6*tid*
  • via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
  • http://gangarobot.cern.ch/hc/682/test/
  • Sites problems or results to be followed-up :
    • GRIF-IRFU : failures due to the release 15.3.1 installation (ran fine last week but needed to be patched)
      gcc version 4.1.2 used instead of gcc 3.4...
      See GGUS ticket 52483
    • LYON (T2) : both queues ANALY_LYON (xrootd) and ANALY_LYON_DCACHE exercised at the same time
      Limitation due to BQS resource (u_xrootd_lhc)
      ANALY_LYON_DCACHE : limited number of jobs but good performance (effect of the dcache upgrade ?)
      ANALY_LYON : many failures - problems followed by J.Y Nief (root version used by ATLAS ?)

Recent talks