== ATLAS Analysis challenge ==
21/01/2009 : ''E. Lançon, F. Chollet<br>
Thanks to Johannes Elmsheuser, Dan van der Ster, Cédric Serfon''

==== Information & Contact ====
Mailing list : ATLAS-LCG-OP-L@in2p3.fr
  
 
==== Goals ====
* measure "real" analysis job efficiency and turnaround on several sites of a given cloud
* measure data access performance
* check load balancing between different users and different analysis tools (Ganga vs pAthena)
* check load balancing between analysis and MC production
  
==== Required services @ T1 and GRIF ====
* LFC catalog : lfc-prod.in2p3.fr
* ATLAS disk space : ATLASUSERDISK on the T1 SE (fail-over for outputs in case of problems with T2 disk storage)
* GRIF and CC-IN2P3 top-level BDIIs : should be available, as they are used remotely by some sites
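The availability of a top-level BDII, and what the sites publish through it, can be spot-checked with an LDAP query. A minimal sketch, assuming the python-ldap package and the usual BDII conventions (port 2170, base <tt>o=grid</tt>); the endpoint shown is a hypothetical placeholder, not an official host name:

 import ldap  # assumes the python-ldap package is installed
 
 # Hypothetical top-level BDII endpoint; substitute the real GRIF or
 # CC-IN2P3 top BDII host. Port 2170 and base "o=grid" are the usual
 # BDII conventions.
 bdii = ldap.initialize('ldap://topbdii.example.in2p3.fr:2170')
 results = bdii.search_s(
     'o=grid', ldap.SCOPE_SUBTREE,
     '(objectClass=GlueCE)',
     ['GlueCEUniqueID', 'GlueCEPolicyMaxCPUTime'])
 
 # GlueCEPolicyMaxCPUTime is published in minutes; the stress test asks
 # for >= 1440 (one day), as noted in the Phase 1 requirements below.
 for dn, attrs in results:
     ce = attrs.get('GlueCEUniqueID', [b'?'])[0].decode()
     maxcpu = int(attrs.get('GlueCEPolicyMaxCPUTime', [b'0'])[0])
     print(ce, maxcpu, 'OK' if maxcpu >= 1440 else 'too short')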
==== Target and metrics ====
* Nb of jobs : a few hundred, up to 1000 jobs/site
* Rate (evt/s) : up to 15 Hz
* Efficiency (success/failure rate) : 80 %
* CPU utilization : CPUtime / Walltime > 50 %
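As an illustration of how the last two ratios are evaluated, a minimal sketch over per-job records; the record fields and numbers are hypothetical, not taken from a real test:

 # Hypothetical per-job records: (succeeded?, CPU time in s, wall time in s).
 jobs = [
     (True, 9000, 15000),
     (True, 12000, 18000),
     (False, 300, 4000),
 ]
 
 # Efficiency target: >= 80 % of jobs succeed.
 efficiency = sum(1 for ok, _, _ in jobs if ok) / len(jobs)
 
 # CPU utilization target: CPUtime / Walltime > 50 %, summed over jobs.
 cpu, wall = sum(j[1] for j in jobs), sum(j[2] for j in jobs)
 utilization = cpu / wall
 
 print('efficiency  %.0f %% (target 80 %%)' % (100 * efficiency))
 print('utilization %.0f %% (target > 50 %%)' % (100 * utilization))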
==== FR Cloud ST (2009 plans) ====
* See http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_ST_2009

==== Test report sent to ATLAS (D. van der Ster and J. Elmsheuser), Jan. 2009 ====
* [http://lcg.in2p3.fr/wiki/images/FR-ST82-Feedback.pdf FR-ST82-Feedback.pdf]
==== First exercise on the FR Cloud (December 2008) ====

===== Phase 1 : Site stress test organized by ATLAS and run centrally in a controlled manner (2 days) =====
 
DA challenges were performed on the IT and DE clouds in October 2008, and a proposal was made to extend this cloud-by-cloud challenge to the FR Cloud. See the ATLAS DA challenge coordination meeting (Nov. 20) and the earlier results:
* http://indico.cern.ch/conferenceDisplay.py?confId=45718
* ATLAS Twiki page : https://twiki.cern.ch/twiki/bin/view/Main/GangaSiteTests
* Results of the analysis challenge performed on the [http://indico.cern.ch/getFile.py/access?contribId=128&sessionId=8&resId=0&materialId=slides&confId=22137 IT Cloud]
* Results on the [http://indico.cern.ch/getFile.py/access?contribId=129&sessionId=8&resId=0&materialId=slides&confId=22137 DE Cloud]
The first exercise will help to identify breaking points and bottlenecks. <b>It is limited in time (a few days) and requires careful attention from site administrators during that period, in particular monitoring of the network (internal & external), disk and CPU.</b> This first try (stress tests) can be run centrally in a controlled manner; ATLAS coordination needs to know which sites are to be tested and when. The testing framework is Ganga-based; a minimal job sketch is given below.
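As an indication of what such a Ganga-based stress-test job looks like, a minimal sketch under stated assumptions: the job options file, dataset name and subjob count are hypothetical, and the attribute names follow the public Ganga 5 ATLAS tutorials of that period, so they may differ in detail. This is an illustration, not the actual gangarobot submission code:

 # Run inside a Ganga session (or via `ganga thisscript.py`), where the
 # GPI objects below are available.
 from Ganga.GPI import Job, Athena, DQ2Dataset, DQ2OutputDataset, \
     DQ2JobSplitter, LCG
 
 j = Job()
 j.application = Athena()                 # real AOD analysis code
 # Hypothetical job options file for an AOD muon analysis.
 j.application.option_file = ['AnalysisSkeleton_topOptions.py']
 j.application.prepare()
 
 j.inputdata = DQ2Dataset()
 # Hypothetical dataset name, of the kind matched by the patterns below.
 j.inputdata.dataset = 'mc08.000000.PythiaZmumu.recon.AOD.e000_s000_r500_tid000000'
 j.outputdata = DQ2OutputDataset()        # outputs go to ATLASUSERDISK
 
 j.splitter = DQ2JobSplitter()
 j.splitter.numsubjobs = 10               # hypothetical fan-out
 
 j.backend = LCG()                        # PANDA backend foreseen as well
 j.submit()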
* Any site in the Tiers_of_ATLAS list can participate.
* ATLAS coordination : Dan van der Ster and Johannes Elmsheuser
* [http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_Challenge_ST Details of Site Stress test] : procedure, test conditions and targets
* GlueCEPolicyMaxCPUTime >= 1440 (1 day; typical job duration : 5 hours)
* Jobs run under DN : /O=GermanGrid/OU=LMU/CN=Johannes_Elmsheuser
* <b>Results</b> available here : http://gangarobot.cern.ch/st/
* <b>Test 43 - Nov. 28</b> http://atlas-ganga-storage.cern.ch/test_43/
* <b>Test 61 - December 8-10</b> http://atlas-ganga-storage.cern.ch/test_61/
 Sites: IN2P3-LPC, GRIF-LPNHE, TOKYO-LCG2, IN2P3-CPPM
 Max Jobs Per Site: 300
* <b>Test 62 - December 8-10</b> http://atlas-ganga-storage.cern.ch/test_62/
 Sites: TOKYO
 Max Jobs Per Site: 300
 Input DS Patterns: mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*
                    mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*
                    mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*
                    mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid*
                    mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid*
                    mc08.*.recon.AOD.e*_s*_r5*tid*
* <b>Test 82 - December 15-17</b> http://atlas-ganga-storage.cern.ch/test_82/
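The input DS patterns above are shell-style globs over dataset names. A quick way to check what a pattern selects, as a self-contained sketch (the catalogue entries listed are hypothetical, for illustration only):

 import fnmatch
 
 # Two of the input DS patterns quoted above (shell-style globs).
 patterns = [
     'mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*',
     'mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*',
 ]
 
 # Hypothetical catalogue entries, for illustration only.
 datasets = [
     'mc08.000001.PythiaZmumu.recon.AOD.e000_s000_r500_tid000001',
     'mc08.000002.PythiaWmunu.merge.AOD.e000_s000_r500_tid000002',
 ]
 
 for ds in datasets:
     hits = [p for p in patterns if fnmatch.fnmatch(ds, p)]
     print(ds, '->', hits or 'no match')  # merge.AOD entry is not selected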
  
===== Procedure =====
* Replication of the target datasets across the cloud
* Preparation of the jobs
* Generation of n jobs per site (each job processes one dataset)
* Bulk submission to the WMS (one bulk per site); see the fan-out sketch below
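A minimal sketch of that fan-out logic (the site names are test sites from this page; the dataset names and counts are hypothetical):

 # Fan-out: n jobs per site, one dataset per job, one bulk list per site.
 sites = ['IN2P3-LPC', 'GRIF-LPNHE', 'TOKYO-LCG2', 'IN2P3-CPPM']
 datasets = ['mc08.ds%03d' % i for i in range(8)]   # hypothetical names
 n_jobs_per_site = 2                                # hypothetical
 
 bulks = {}
 it = iter(datasets)
 for site in sites:
     # Each entry becomes one job; the whole list for a site is meant to
     # go out in one bulk WMS submission.
     bulks[site] = [next(it) for _ in range(n_jobs_per_site)]
 
 for site, ds_list in bulks.items():
     print(site, '->', ds_list)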
 
 
 
===== Test conditions =====
The testing framework is Ganga-based. It currently uses the LCG backend, but it will soon be possible to use the PANDA backend as well. Both POSIX I/O and "copy mode" may be used, allowing a performance comparison of the two modes.<br>
It uses real analysis code (typically an AOD muon analysis).<br>
Input datasets are read from ATLASMCDISK and outputs are stored on ATLASUSERDISK (no special requirements there). Input data access is the main issue; data output poses no problem.<br>
Required CPU time : ~1 day (typical job duration 5 h).<br>
LAN saturation has been observed at ~3 Hz with a 1 Gb/s network connection between the WNs and the SE; a back-of-envelope estimate is sketched below.
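A back-of-envelope reading of that figure; the average AOD event size and the concurrent job count are assumptions for illustration, not measurements from the test:

 # Per-job event rate at which a shared 1 Gb/s link saturates.
 # Event size and job count are ASSUMPTIONS, chosen for illustration.
 link_bytes_per_s = 1e9 / 8      # 1 Gb/s link = 125 MB/s
 event_size_bytes = 0.2e6        # assumed ~0.2 MB per AOD event
 concurrent_jobs = 200           # assumed jobs reading from the SE at once
 
 rate_per_job = link_bytes_per_s / (concurrent_jobs * event_size_bytes)
 print('%.1f Hz per job at saturation' % rate_per_job)   # ~3.1 Hz here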
 
http://gangarobot.cern.ch/

<b>Participation is required at cloud and site level. Any site in the Tiers_of_ATLAS list can participate.</b>

It is possible for sites to limit the number of jobs sent at a time. The DA team is ready to take site constraints into account and is open to any metrics.
 
 
 
===== [http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_Challenge-STsummary ST 2008 summary] =====

===== Phase 2 : Pathena Analysis Challenge =====
* Data analysis exercise open to physicists with their favorite application
* Physicists involved : Julien Donini, Arnaud Lucotte, Bertrand Brelier, Eric Lançon, LAL ?, LPNHE ?

===== Dec. 2008 Planning =====
* Dec. 8 : stop of MC production
* Dec. 8-9 : <b>1st round</b> with Tokyo, CPPM, LPC (LAN limited to 1 Gbps), GRIF-LPNHE
* Dec. 14 : stop of MC production
* Dec. 15-16 : <b>2nd round</b> with LAPP, CC-IN2P3-T2 (to be confirmed), Tokyo, CPPM, LPC, possibly GRIF (SACLAY, IRFU, LPNHE), RO-07 and RO-02
* Dec. 17 : <b>restart of MC production</b>
* Dec. 17 : <b>beginning of the Analysis Challenge (Phase 2)</b>
