Atlas:Analysis Challenge
Latest revision as of 17:06, 12 March 2009
21/01/2009 : E.Lançon, F.Chollet
Thanks to Johannes Elmsheuser, Dan van der Ster, Cédric Serfon
Information & Contact
Mailing list ATLAS-LCG-OP-L@in2p3.fr
Goals
- measure "real" analysis job efficiency and turnaround on several sites of a given cloud
- measure data access performance
- check load balancing between different users and different analysis tools (Ganga vs pAthena)
- check load balancing between analysis and MC production
Required services @ T1 and GRIF
- LFC catalog : lfc-prod.in2p3.fr
- ATLAS Disk space : ATLASUSERDISK on T1 SE (fail-over for outputs in case of problems with T2 disk storage)
- GRIF and CC-IN2P3 TOP BDII : should be available as they are used remotely by some sites
Target and metrics
- Number of jobs : a few hundred up to 1000 jobs/site
- Rate (evt/s) : up to 15 Hz
- Efficiency (success/failure rate) : 80 %
- CPU utilization : CPUtime / Walltime > 50 %
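As an illustration of how these metrics could be computed from per-job records, here is a minimal Python sketch. The `JobRecord` fields and the function names are hypothetical and do not come from the Ganga test framework. It assumes that efficiency means the fraction of successful jobs, that the 80 % figure is a minimum, and that the event rate is averaged over the wall time of successful jobs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    """Hypothetical per-job record; field names are illustrative only."""
    succeeded: bool
    events: int
    cpu_seconds: float
    wall_seconds: float


def summarize(jobs: List[JobRecord]) -> Dict[str, float]:
    """Compute efficiency, CPU utilization and event rate for one site."""
    ok = [j for j in jobs if j.succeeded]
    wall = sum(j.wall_seconds for j in ok)
    cpu = sum(j.cpu_seconds for j in ok)
    events = sum(j.events for j in ok)
    return {
        # Assumption: efficiency = successful jobs / all submitted jobs.
        "efficiency": len(ok) / len(jobs) if jobs else 0.0,
        # CPUtime / Walltime, summed over successful jobs.
        "cpu_utilization": cpu / wall if wall else 0.0,
        # Events per second of wall time (Hz), successful jobs only.
        "event_rate_hz": events / wall if wall else 0.0,
    }


def meets_targets(summary: Dict[str, float]) -> bool:
    """Check the two thresholds listed above: 80 % efficiency, > 50 % CPU."""
    return summary["efficiency"] >= 0.80 and summary["cpu_utilization"] > 0.50
```

The rate target ("up to 15 Hz") is an upper bound rather than a pass/fail threshold, so the sketch reports the rate without testing it.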
FR Cloud ST (2009 plans)
- See http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_ST_2009
Test report sent to ATLAS (D. van der Ster and J. Elmsheuser), Jan. 2009
http://lcg.in2p3.fr/wiki/images/FR-ST82-Feedback.pdf
First exercise on the FR Cloud (December 2008)
Phase 1 : Site stress test organized by ATLAS and run centrally in a controlled manner (2 days)
Distributed analysis (DA) challenges were performed on the IT and DE clouds in October 2008. It has been proposed to extend this cloud-by-cloud challenge to the FR Cloud. See the ATLAS coordination DA challenge meeting (Nov. 20):
- http://indico.cern.ch/conferenceDisplay.py?confId=45718
The first exercise will help to identify breaking points and bottlenecks. It is limited in time (a few days) and requires careful attention from site administrators during that period, in particular to network (internal and external), disk, and CPU monitoring. This first attempt (stress tests) can be run centrally in a controlled manner. The testing framework is Ganga-based.
- Any site in the Tiers_of_ATLAS list can participate.
- ATLAS coordination : Dan van der Ster and Johannes Elmsheuser
- Details of the Site Stress test (procedure, test conditions and targets) : http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_Challenge_ST
- GlueCEPolicyMaxCPUTime >= 1440 (in minutes, i.e. 1 day; typical job duration : 5 hours)
- Jobs run under DN : /O=GermanGrid/OU=LMU/CN=Johannes_Elmsheuser
- Results available here : http://gangarobot.cern.ch/st/
- Test 43 - Nov. 28 http://atlas-ganga-storage.cern.ch/test_43/
- Test 61 - December 8-10 http://atlas-ganga-storage.cern.ch/test_61/
  Sites: IN2P3-LPC, GRIF-LPNHE, TOKYO-LCG2, IN2P3-CPPM
  Max Jobs Per Site: 300
- Test 62 - December 8-10 http://atlas-ganga-storage.cern.ch/test_62/
  Sites: TOKYO
  Max Jobs Per Site: 300
  Input DS Patterns:
    mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*
    mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*
    mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*
    mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid*
    mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid*
    mc08.*.recon.AOD.e*_s*_r5*tid*
- Test 82 - December 15-17 http://atlas-ganga-storage.cern.ch/test_82/
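To illustrate how the input dataset patterns listed for Test 62 select datasets, here is a small Python sketch. It assumes the patterns follow shell-style wildcard semantics (`*` matches any run of characters), which this page does not state. The helper `matching_patterns` is hypothetical, and any dataset name passed to it in an example is invented to exercise the patterns, not taken from a real catalog.

```python
from fnmatch import fnmatchcase
from typing import List

# Patterns copied from the Test 62 configuration above.
INPUT_DS_PATTERNS = [
    "mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*",
    "mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*",
    "mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*",
    "mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid*",
    "mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid*",
    "mc08.*.recon.AOD.e*_s*_r5*tid*",
]


def matching_patterns(dataset_name: str) -> List[str]:
    """Return the patterns that match a dataset name.

    Assumption: shell-style, case-sensitive wildcard matching.
    """
    return [p for p in INPUT_DS_PATTERNS if fnmatchcase(dataset_name, p)]
```

Under this assumption the last pattern is a catch-all for any mc08 AOD dataset with an `r5*` reconstruction tag, so every dataset matched by one of the first five patterns is also matched by the last one.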
ST 2008 summary : http://lcg.in2p3.fr/wiki/index.php/Atlas:Analysis_Challenge-STsummary
Phase 2 : Pathena Analysis Challenge
- Data analysis exercise open to physicists using their favorite application
- Physicists involved : Julien Donini, Arnaud Lucotte, Bertrand Brelier, Eric Lançon, LAL ?, LPNHE ?
Dec. 2008 Planning
- Dec 8 : stop of MC production
- Dec. 8-9 : 1st round with Tokyo, CPPM, LPC (LAN limited to 1 Gbps), GRIF-LPNHE
- Dec 17 : restart of MC production
- Dec 14 : stop of MC production
- Dec. 15-16 : 2nd round with LAPP, CC-IN2P3-T2 (to be confirmed), Tokyo, CPPM, LPC, possibly GRIF (SACLAY, IRFU, LPNHE), RO-07 and RO-02
- Dec 17 : restart of MC production
- Dec 17 : Beginning of Analysis Challenge (Phase 2)