Atlas:Analysis Challenge

02/12/08 : E.Lançon, F.Chollet (Thanks to Cédric Serfon)

Sommaire

1 Information & Contact
2 Goals
3 Required services @ T1 and GRIF
4 FR Cloud ST (2009 plans)
5 First exercise on the FR Cloud (December 2008)
6 Planning
7 Target and metrics

Information & Contact

Mailing list ATLAS-LCG-OP-L@in2p3.fr

Goals

measure "real" analysis job efficiency and turn around on several sites of a given cloud
measure data access performance
check load balancing between different users and different analysis tools (Ganga vs pAthena)
check load balancing between analysis and MC production

Required services @ T1 and GRIF

LFC catalog : lfc-prod.in2p3.fr
ATLAS Disk space : ATLASUSERDISK on T1 SE (fail-over for outputs in case of problems with T2 disk storage)
GRIF and CC-IN2P3 TOP BDII : should be available as they are used remotely by some sites

FR Cloud ST (2009 plans)

Weekly ST : 50 jobs per site
Optimization and Site Stress test: On demand and dedicated to sites which are interrested.

GRIF : GRIFOPN test (as soon as it is available) Total of 1000 jobs will
be distributed to ATLAS CEs with following DS available being accessed
from LAL, SACLAY and LPNHE
       - GRIF-SACLAY_MCDISK: 332
       - GRIF-LPNHE_MCDISK: 334
       - GRIF-LAL_MCDISK: 457
LPC : to be planned when new storage arrays installation and network  
      upgrade done
      DS available on IN2P3-LPC_MCDISK: 326
LAPP-CPPM : interrested by a stress test ~130 DS available there
LPSC : ready to be tested but only 16 DS available 
TOKYO : Input DS available on TOKYO-LCG2_MCDISK: 1000
BEIJING : Validation on-going - Available DS on MCDISK 68

First exercise on the FR Cloud (December 2008)

Phase 1 : Site stress test oraganized by ATLAS and run centrally in a controlled manner (2 days)

DA challenges have been performed on IT and DE clouds in october 08. Proposition has been made to extend this cloud-by cloud challenge to the FR Cloud. See ATLAS coordination DA challenge meeting (Nov. 20)

http://indico.cern.ch/conferenceDisplay.py?confId=45718

First exercise will help to identify breaking points and bottlenecks. It is limited in time (a few days) and requires careful attention of site administrators during that period,in particular network (internal & external), disk, cpu monitoring. This first try (Stress tests) can be run centrally in a controlled manner. The testing framework is ganga-based.

Any site in the Tiers_of_ATLAS list can participate.
ATLAS coordination : Dan van der Ster and Johannes Elmsheuser
Details of Site Stress test : procedure, test conditions and targets
GlueCEPolicyMaxCPUTime >= 1440 (1 day , typical duration : 5 hours)
Jobs run under DN : /O=GermanGrid/OU=LMU/CN=Johannes_Elmsheuser
Results available here : http://gangarobot.cern.ch/st/
Test 43 - Nov. 28 http://gangarobot.cern.ch/st/test_43/
Test 61 - December 8-10 http://gangarobot.cern.ch/st/test_61/

Sites: IN2P3-LPC, GRIF-LPNHE, TOKYO-LCG2, IN2P3-CPPM 
Max Jobs Per Site: 300

Test 62 - December 8-10 http://gangarobot.cern.ch/st/test_62/

Sites: TOKYO
Max Jobs Per Site: 300
Input DS Patterns: mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid* 
                   mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid* 
                   mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid* 
                   mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid* 
                   mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid* 
                   mc08.*.recon.AOD.e*_s*_r5*tid*

Test 82 - December 15-17 http://gangarobot.cern.ch/st/test_82/

ST summary

Phase 2 : Pathena Analysis Challenge

Data Analysis exercice open to physicists with their favorite application
Physicists involved : Julien Donini, Arnaud Lucotte, Bertrand Brelier, Eric Lançon, LAL ?, LPNHE ?

Planning

Dec 8 : stop of MC production
Dec. 8-9: 1rst round with Tokyo, CPPM, LPC (LAN limited to 1Gbps), GRIF-LPNHE
Dec 17 : restart of MC production
Dec 14 : stop of MC production
Dec. 15-16 : 2nd round with LAPP, CC-IN2P3-T2 (to be confirmed), Tokyo, CPPM, LPC, possibly GRIF (SACLAY, IRFU, LPNHE), RO-07 and RO-02
Dec 17 : restart of MC production
Dec 17 : Beginning of Analysis Challenge (Phase 2)

Target and metrics

Nb of events : Few hundred up to 1000 jobs/site
Rate (evt/s) : up to 15 Hz
Efficiency (success/failure rate) : 80 %
CPU utilization : CPUtime / Walltime > 50 %

Atlas:Analysis Challenge

Sommaire

Information & Contact

Goals

Required services @ T1 and GRIF

FR Cloud ST (2009 plans)

First exercise on the FR Cloud (December 2008)

Phase 1 : Site stress test oraganized by ATLAS and run centrally in a controlled manner (2 days)

ST summary

Phase 2 : Pathena Analysis Challenge

Planning

Target and metrics

Navigation menu

Outils personnels

Namespaces

Variants

Views

More

Rechercher

LCG Project

Wiki Tools

Tools