Atlas:Analysis Challenge


02/12/08 : E.Lançon, F.Chollet (Thanks to Cédric Serfon)

Information & Contact

Mailing list ATLAS-LCG-OP-L@in2p3.fr

Goals

Measure "real" analysis job efficiency and turnaround time on several sites of a given cloud
Measure data access performance
Check load balancing between different users and different analysis tools (Ganga vs pAthena)
Check load balancing between analysis and MC production

Required services @ T1

LFC catalog : lfc-prod.in2p3.fr
ATLAS disk space : ATLASUSERDISK space token on the T1 SE (failover for job outputs in case of problems with T2 disk storage)

First exercise on the FR Cloud (from December 8th)

Phase 1 : Site stress test run centrally in a controlled manner (2 days)

Distributed Analysis (DA) challenges were performed on the IT and DE clouds in October 2008. It has been proposed to extend this cloud-by-cloud challenge to the FR cloud; see the ATLAS coordination DA challenge meeting (Nov. 20):

http://indico.cern.ch/conferenceDisplay.py?confId=45718

This first exercise will help identify breaking points and bottlenecks. It is limited in time (a few days) and requires close attention from site administrators during that period, in particular monitoring of the network (internal and external), disk and CPU. This first round (stress tests) can be run centrally in a controlled manner. The testing framework is Ganga-based.

Participation is required at cloud and site level. Any site in the Tiers_of_ATLAS list can participate. ATLAS coordination (Dan van der Ster and Johannes Elmsheuser) needs to know which sites are to be tested and when.

Sites can limit the number of jobs sent to them at a time.
The DA team is ready to take site constraints into account.
The DA team is open to any additional metrics proposed by sites.

Details of Site Stress test : procedure, test conditions and targets

GlueCEPolicyMaxCPUTime >= 1440 (minutes, i.e. 1 day)
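A site can check the GlueCEPolicyMaxCPUTime requirement before a test round by looking at what its information system publishes. The sketch below is illustrative only. It assumes the LDIF text has already been retrieved (for instance with ldapsearch against the site BDII), ignores LDIF line folding, and uses hypothetical CE identifiers.

```python
"""Sketch: flag CEs whose published GlueCEPolicyMaxCPUTime is below 1 day.

Assumes the LDIF text was already fetched from a site BDII; LDIF line
folding is not handled, and the CE identifiers in the example are hypothetical.
"""

REQUIRED_MAX_CPU_MINUTES = 1440  # GLUE 1.x publishes this value in minutes (1 day)


def parse_ldif_entries(ldif_text):
    """Split LDIF text into a list of {attribute: value} dicts, one per entry."""
    entries, current = [], {}
    for line in ldif_text.splitlines():
        line = line.strip()
        if not line:  # a blank line terminates an LDIF entry
            if current:
                entries.append(current)
                current = {}
            continue
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")  # split on the first colon only
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def ces_below_threshold(ldif_text, threshold=REQUIRED_MAX_CPU_MINUTES):
    """Return (CE id, published minutes) for every CE below the threshold."""
    too_short = []
    for entry in parse_ldif_entries(ldif_text):
        if "GlueCEPolicyMaxCPUTime" not in entry:
            continue
        minutes = int(entry["GlueCEPolicyMaxCPUTime"])
        if minutes < threshold:
            too_short.append((entry.get("GlueCEUniqueID", "?"), minutes))
    return too_short


if __name__ == "__main__":
    sample = """\
GlueCEUniqueID: ce1.example.org:2119/jobmanager-pbs-atlas
GlueCEPolicyMaxCPUTime: 2880

GlueCEUniqueID: ce2.example.org:2119/jobmanager-pbs-short
GlueCEPolicyMaxCPUTime: 720
"""
    print(ces_below_threshold(sample))
    # -> [('ce2.example.org:2119/jobmanager-pbs-short', 720)]
```

Any queue reported by this check would need its MaxCPUTime raised (or be excluded) before the stress test.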

Results
Results available here : http://gangarobot.cern.ch/st/

Validation
Nov. 28 submission (200 jobs in total on 12 sites): http://gangarobot.cern.ch/st/test_43/

Start Time: 2008-11-28 13:00:00
End Time: 2008-11-28 21:00:00 (test interrupted after 8 hours). Some jobs remained in the "running" state.
Job status : c (Completed) / r (Running) / f (Failed)
o|e : links to stdout (o) and stderr (e)

Problems seen : 
- Missing ATLAS release 14.2.20 @ CPPM 
  02/12/08 [DONE] validate-prod of 14.2.20 at marce01.in2p3.fr (IN2P3-CPPM)
- CE-SE mapping @ GRIF-LAL and GRIF-SACLAY: multiple CloseSEs published
  by LAL and IRFU, cf https://savannah.cern.ch/bugs/index.php?44824
- LFC bulk reading...End of LFC bulk reading...ERROR: Dataset(s)...is/are
  empty at GRIF-SACLAY_MCDISK / LAL_MCDISK
  (possibly related to the same problem)
  Problem tracked
- ERROR @ RO-07-NIPNE: failed jobs with Athena errors
  NON FATAL ERRORS (completed jobs) @ RO-07-NIPNE, RO-02-NIPNE: related to
  the ATLASUSERDISK space token configuration
  ERROR during execution of lcg-cr --vo atlas -s ATLASUSERDISK
  ERROR: file not saved to RO-07-NIPNE_USERDISK in attempt number 3 ...
  ERROR: file not saved to RO-07-NIPNE_USERDISK - using now IN2P3-CC_USERDISK
  Problem tracked
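The output failover visible in the RO-07-NIPNE logs (several attempts on the site USERDISK, then fallback to IN2P3-CC_USERDISK on the T1) can be sketched as follows. This is not the actual job wrapper code. copy_fn is a hypothetical stand-in for the copy-and-register step (for instance a wrapper around lcg-cr), and the number of attempts (3) is taken from the log above.

```python
"""Sketch of output failover: site USERDISK first, then the T1 USERDISK.

copy_fn is a hypothetical callable standing in for the real copy-and-register
step; it takes (local_file, endpoint) and returns True on success.
"""


def store_output(local_file, site_endpoint, copy_fn,
                 fallback_endpoint="IN2P3-CC_USERDISK", max_attempts=3):
    """Return the endpoint where local_file was stored, or None if all failed."""
    for attempt in range(1, max_attempts + 1):
        if copy_fn(local_file, site_endpoint):
            return site_endpoint
        print(f"ERROR: file not saved to {site_endpoint} in attempt number {attempt}")
    # All attempts on the T2 storage failed: fall back to the T1 user disk.
    print(f"ERROR: file not saved to {site_endpoint} - using now {fallback_endpoint}")
    if copy_fn(local_file, fallback_endpoint):
        return fallback_endpoint
    return None


if __name__ == "__main__":
    # Simulated storage where the T2 endpoint always fails.
    def flaky_copy(local_file, endpoint):
        return endpoint == "IN2P3-CC_USERDISK"

    print(store_output("user.output.root", "RO-07-NIPNE_USERDISK", flaky_copy))
```

With this logic, a misconfigured space token at a T2 does not make the job fail, but every output silently ends up on the T1. This is why these cases appear as non-fatal errors on completed jobs.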

Planning

- Dec. 8-9: 1st round with Tokyo, CPPM, LPC (LAN limited to 1 Gbps)
- Dec. 14: stop of MC production
- Dec. 15-16: 2nd round with LAPP, CC-IN2P3-T2 (to be confirmed), Tokyo,
  CPPM, LPC, possibly GRIF, RO-07 and RO-02
- Dec. 17: restart of MC production

Phase 2 : Pathena Analysis Challenge

- Dec. 17-18: Data analysis exercise open to physicists using their favorite application

Physicists involved : Julien Donini, Arnaud Lucotte, Bertrand Brelier, Eric Lançon, LAL ?, LPNHE ?

Target and metrics

Nb of jobs : a few hundred up to 1000 jobs/site
Rate (evt/s) : up to 15 Hz
Efficiency (fraction of successful jobs) : 80 %
CPU utilization : CPUtime / Walltime > 50 %
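Here is a minimal sketch of how these targets could be checked from per-job records. The Job record and its fields are assumptions and do not reflect the format of the gangarobot results pages. Status letters follow the c/r/f convention of the results page, and running jobs are left out of the efficiency. The event rate is total events divided by total wall time of completed jobs, and it is only reported, not tested.

```python
from dataclasses import dataclass

# Targets from the table above.
TARGET_EFFICIENCY = 0.80      # fraction of finished jobs that succeeded
TARGET_CPU_OVER_WALL = 0.50   # CPU time / wall time


@dataclass
class Job:
    """Hypothetical per-job record; field names are illustrative only."""
    status: str          # 'c' completed, 'r' running, 'f' failed
    cpu_s: float = 0.0   # CPU time in seconds
    wall_s: float = 0.0  # wall-clock time in seconds
    events: int = 0      # events processed


def summarize(jobs):
    """Compute challenge metrics; running jobs are left out of the efficiency."""
    completed = [j for j in jobs if j.status == "c"]
    failed = [j for j in jobs if j.status == "f"]
    finished = len(completed) + len(failed)
    efficiency = len(completed) / finished if finished else 0.0

    wall = sum(j.wall_s for j in completed)
    cpu = sum(j.cpu_s for j in completed)
    cpu_over_wall = cpu / wall if wall else 0.0
    # Average per-job processing rate, weighted by wall time.
    events_per_s = sum(j.events for j in completed) / wall if wall else 0.0

    return {
        "efficiency": efficiency,
        "cpu_over_wall": cpu_over_wall,
        "events_per_s": events_per_s,
        "meets_efficiency": efficiency >= TARGET_EFFICIENCY,
        "meets_cpu": cpu_over_wall > TARGET_CPU_OVER_WALL,
    }


if __name__ == "__main__":
    jobs = [Job("c", cpu_s=600, wall_s=1000, events=10000) for _ in range(4)]
    jobs += [Job("f"), Job("r")]
    print(summarize(jobs))
```

Excluding running jobs avoids penalizing a site for jobs that were still running when a test was interrupted, as happened for the Nov. 28 submission.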
