Difference between revisions of "Atlas:Analysis HC beyond STEP09"

Line 39: Line 39:
  === Week 40 ===
+
+ ==== 29/09/09 ''<span style="color:#FF0000;">Test 649'' ====
  * Muon Analysis (Release 15.3.1)
  * Input DS  (STEP09) : mc08.*merge.AOD.e*_s*_r6*tid*
- * 3 HC tests of 24 hrs each :
+ * via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
- ==== 29/09/09 ''<span style="color:#FF0000;">Test 649'' ====
+ * http://gangarobot.cern.ch/hc/649/test/
- * [http://gangarobot.cern.ch/hc/649/test/ HC 649] via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)<br>
     Bad efficiency - all sites affected
     Failed jobs with error : exit code 1137
Line 50: Line 51:
     for pilot jobs /atlas/Role=pilot and /atlas/fr/Role=pilot (newly activated)
  ==== 30/09/09 ''<span style="color:#00FF00;">Test 652, 653, 656, 657'' ====
- * [http://gangarobot.cern.ch/hc/652/test/ HC 652]/[http://gangarobot.cern.ch/hc/656/test/ 656] via WMS (DQ2_LOCAL mode or direct access dcap/rfio)<br>
+ * Muon Analysis (Release 15.3.1)
- * [http://gangarobot.cern.ch/hc/653/test/ HC 653]/[http://gangarobot.cern.ch/hc/657/test/ 657] via WMS (FILE_STAGER mode)<br>
+ * Input DS  (STEP09) : mc08.*merge.AOD.e*_s*_r6*tid*
+ * via WMS
+ * DQ2_LOCAL mode or direct access dcap/rfio : http://gangarobot.cern.ch/hc/652/test/
+ * DQ2_LOCAL mode or direct access dcap/rfio : http://gangarobot.cern.ch/hc/656/test/
+ * FILE_STAGER mode : http://gangarobot.cern.ch/hc/653/test/
+ * FILE_STAGER mode : http://gangarobot.cern.ch/hc/657/test/
  http://lcg.in2p3.fr/wiki/images/ATLAS-HC300909.gif
+
+ === Week 41 ===
+
+ ==== 08/10/09 ''<span style="color:#00FF00;">Test 663'' ====
+ * DPD Analysis (Release 15.5.0)
+ * Input DS - DATADISK : data09_cos.*.DPD*
+ * '''Cond DB access to Oracle in Lyon T1'''
+ * via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
+ * Site problems or downtime :
+ ** LAL : downtime
+ ** RO : DS unavailable
+ ** LYON (T2) : release 15.5.0 unavailable
+   Poor performance at the foreign sites (Tokyo and Beijing) compared to the French sites
+ http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-CPU.png
+ http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-rate.png
+ http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-CPU.png
+ http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-rate.png
 
  == Recent talks ==
  * [http://indico.in2p3.fr/getFile.py/access?contribId=6&sessionId=30&resId=0&materialId=slides&confId=2110 ATLAS : from STEP09 towards first beams] Graeme Stewart's talk@Journées Grille France (16 October 2009)
  * [http://indico.cern.ch/getFile.py/access?contribId=8&sessionId=2&resId=0&materialId=slides&confId=66012 Summary of HammerCloud Tests since STEP09] Dan van der Ster's talk@ATLAS Jamboree T1/T2/T3 (13 October 2009)
  * [http://indico.cern.ch/getFile.py/access?contribId=31&sessionId=16&resId=0&materialId=slides&confId=50976 HammerCloud Plans] Johannes Elmsheuser's talk@ATLAS S&C Week (2 September 2009)

Revision as of 15:33, 23 October 2009

--Chollet 15:56, 19 October 2009 (CEST)

Distributed Analysis Stress Tests - HammerCloud beyond STEP09

Lessons learnt from STEP09

  • Sites may identify the amount of analysis load they can reasonably sustain and set hard limits on the number of running analysis jobs.
  • Balancing data across many disk servers is essential.
  • Analysis requires very high I/O (5 MB/s per job). Sites should review their LAN architecture to avoid bottlenecks.
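The 5 MB/s per job figure gives a quick way to estimate a hard limit on running analysis jobs. The following is only a minimal sketch: the function name, the `headroom` safety factor and the example bandwidth are illustrative assumptions, not values from the STEP09 results.

```python
def max_analysis_jobs(lan_bandwidth_mb_s: float,
                      per_job_mb_s: float = 5.0,
                      headroom: float = 0.8) -> int:
    """Estimate a hard limit on concurrently running analysis jobs.

    lan_bandwidth_mb_s: usable storage-to-worker-node bandwidth (assumed input).
    per_job_mb_s: I/O needed by one analysis job (5 MB/s, as quoted above).
    headroom: fraction of the bandwidth reserved for analysis (hypothetical).
    """
    if lan_bandwidth_mb_s < 0 or per_job_mb_s <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid bandwidth, per-job rate or headroom")
    # Round down: a partially served job would already be I/O starved.
    return int((lan_bandwidth_mb_s * headroom) // per_job_mb_s)


if __name__ == "__main__":
    # Hypothetical site with 1000 MB/s between disk servers and worker nodes,
    # half of it reserved for analysis.
    print(max_analysis_jobs(1000.0, headroom=0.5))
```

A site would replace the bandwidth figure with its own measured value; the result is an upper bound, since unbalanced disk servers can saturate well before the aggregate limit is reached.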

Results

ATLAS Info & Contacts

  • Information via mailing list ATLAS-LCG-OP-L@in2p3.fr
  • LPC : Nabil Ghodbane - Nabil.Ghodbane@cern.ch
  • LAL : Nicolas Makovec
  • LAPP : Stéphane Jézéquel
  • CPPM : Emmanuel Le Guirriec
  • LPSC : Sabine Crepe
  • LPNHE : Tristan Beau
  • CC-T2 : Catherine, Ghita
  • IRFU : Nathalie Besson

HC Tests

ATLAS-HC-small.jpg

Objectives

  • Improve Cloud readiness by following site and ATLAS problems week by week (SL5 migration, site upgrades)
  • Identify best data access method per site by comparing the event rate and CPU/Walltime

https://twiki.cern.ch/twiki/bin/view/Atla/HammerCloudDataAccess#FR_cloud

  • Exercise analysis with Conditions DB access (to see where squid caching is needed) and Tag analysis
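Comparing access methods by event rate and CPU/Walltime can be sketched as below. This is an illustration only: the `AccessResult` type, the ranking rule (event rate first, CPU/Walltime as tie-breaker) and the numbers in the usage example are assumptions, not HammerCloud output.

```python
from typing import Dict, NamedTuple


class AccessResult(NamedTuple):
    """Per-method metrics for one site (hypothetical container)."""
    event_rate_hz: float       # events processed per second
    cpu_over_walltime: float   # CPU time / wall time, between 0 and 1


def best_access_method(results: Dict[str, AccessResult]) -> str:
    """Return the method with the highest event rate.

    Ties are broken by CPU/Walltime; this ranking rule is an assumption.
    """
    if not results:
        raise ValueError("no results to compare")
    return max(
        results,
        key=lambda m: (results[m].event_rate_hz, results[m].cpu_over_walltime),
    )


if __name__ == "__main__":
    # Made-up numbers for a single site, for illustration only.
    site_results = {
        "DQ2_LOCAL": AccessResult(event_rate_hz=8.0, cpu_over_walltime=0.45),
        "FILE_STAGER": AccessResult(event_rate_hz=12.0, cpu_over_walltime=0.70),
        "copy-to-WN": AccessResult(event_rate_hz=12.0, cpu_over_walltime=0.60),
    }
    print(best_access_method(site_results))
```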

Data Access methods

Multiple data access methods are exercised:

  • via Panda : a copy-to-WN access mode using rfcp (xrootd in ANALY-LYON)
  • via gLite WMS : 2 data access modes are available
    • DQ2_LOCAL mode : direct access using rfio or dcap
    • FILE_STAGER mode : data are staged in by a dedicated thread running in parallel with Athena
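The idea behind FILE_STAGER mode, staging the next files in a background thread while the current one is processed, can be sketched as follows. This is not the Athena FileStager code: `process_with_stager`, `stage_in`, `process` and the `prefetch` depth are hypothetical names chosen for the illustration.

```python
import queue
import threading
from typing import Callable, Iterable, List


def process_with_stager(files: Iterable[str],
                        stage_in: Callable[[str], str],
                        process: Callable[[str], object],
                        prefetch: int = 2) -> List[object]:
    """Process files while a dedicated thread stages the next ones in."""
    staged: "queue.Queue[object]" = queue.Queue(maxsize=prefetch)
    done = object()              # sentinel marking the end of the file list
    errors: List[Exception] = []

    def stager() -> None:
        try:
            for name in files:
                # Blocks when `prefetch` files are already waiting.
                staged.put(stage_in(name))
        except Exception as exc:     # report staging failures to the caller
            errors.append(exc)
        finally:
            staged.put(done)

    thread = threading.Thread(target=stager, daemon=True)
    thread.start()

    results: List[object] = []
    while True:
        item = staged.get()
        if item is done:
            break
        results.append(process(item))   # runs while the stager copies ahead
    thread.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    # Stand-ins for a real copy command and a real analysis step.
    local = process_with_stager(
        ["f1.root", "f2.root"],
        stage_in=lambda name: "/tmp/" + name,
        process=lambda path: path.upper(),
    )
    print(local)
```

The bounded queue keeps the stager from filling the local disk: at most `prefetch` files wait on the worker node at any time.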

Week 40

29/09/09 Test 649

  Bad efficiency - all sites affected
  Failed jobs with error : exit code 1137
  Put error: Error in copying the file from job workdir to localSE
  due to an LFC ACL problem : write permissions in /grid/atlas/users/pathena
  for pilot jobs /atlas/Role=pilot and /atlas/fr/Role=pilot (newly activated)

30/09/09 Test 652, 653, 656, 657

http://lcg.in2p3.fr/wiki/images/ATLAS-HC300909.gif

Week 41

08/10/09 Test 663

  • DPD Analysis (Release 15.5.0)
  • Input DS - DATADISK : data09_cos.*.DPD*
  • Cond DB access to Oracle in Lyon T1
  • via Panda (mode copy-to-WN using ddcp/rfcp - xrootd in ANALY-LYON)
  • Site problems or downtime :
    • LAL : downtime
    • RO : DS unavailable
    • LYON (T2) : release 15.5.0 unavailable
  Poor performance at the foreign sites (Tokyo and Beijing) compared to the French sites

http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-CPU.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-GRIF-Irfu-rate.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-CPU.png
http://lcg.in2p3.fr/wiki/images/HC663-081009-Tokyo-rate.png

Recent talks