Atlas:Analysis Challenge ST
Site Stress Test --Chollet 12:49, 29 January 2009 (CET)
Procedure
- Replication of target datasets across the cloud
- Preparation of jobs
- Generation of n jobs per site (each job processes one dataset)
- Bulk submission to the WMS (one per site); a driver sketch follows this list
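As a rough illustration of this procedure, the driver loop might look like the sketch below. It is hypothetical: site_datasets and make_job are assumed stand-ins (the job configuration itself is sketched under Test conditions), not part of the actual test framework.

 # Hypothetical driver for the procedure above: n jobs per site,
 # each processing one dataset, bulk-submitted to that site's WMS.
 # site_datasets and make_job are assumed stand-ins, not real framework code.
 for site, datasets in site_datasets.items():
     jobs = []
     for ds in datasets:              # one dataset per job
         j = make_job(ds, site)       # configure as under "Test conditions"
         jobs.append(j)
     for j in jobs:
         j.submit()                   # Ganga submits the jobs to the site's WMS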
Test conditions
- The testing framework is Ganga-based. It currently uses the LCG backend; it will soon be possible to use the PanDA backend as well. Metrics are collected and displayed at http://gangarobot.cern.ch/st/ (a job-configuration sketch follows this list)
- Both POSIX I/O and "copy mode" may be used, allowing a performance comparison of the two modes.
- ATLAS software release 14.2.20
- Input dataset patterns used (the first is the preferred one for muon analysis):
 mc08.*Wmunu*.recon.AOD.e*_s*_r5*tid*
 mc08.*Zprime_mumu*.recon.AOD.e*_s*_r5*tid*
 mc08.*Zmumu*.recon.AOD.e*_s*_r5*tid*
 mc08.*T1_McAtNlo*.recon.AOD.e*_s*_r5*tid*
 mc08.*H*zz4l*.recon.AOD.e*_s*_r5*tid*
 mc08.*.recon.AOD.e*_s*_r5*tid*   (this pattern includes all the previous ones)
- Input datasets are read from ATLASMCDISK and outputs are stored on ATLASUSERDISK (no special requirements there). Input data access is the main issue; data output poses no problem.
- Required CPU time: GlueCEPolicyMaxCPUTime >= 1440 minutes (1 day; typical job duration: 5 hours)
- Jobs run under DN: /O=GermanGrid/OU=LMU/CN=Johannes_Elmsheuser
- LAN saturation has been observed with a 1 Gb/s network connection between the worker nodes and the SE.
- It is possible for sites to limit the number of jobs sent at a time.
- Test duration: 48 hours
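To make these conditions concrete, a single test job might be configured roughly as below, inside the ganga shell where the GangaAtlas classes (Job, Athena, DQ2Dataset, DQ2OutputDataset, LCG) are predefined. This is a hedged sketch: the job-options file and the dataset name are placeholders, and exact attribute names may differ between Ganga versions.

 # Sketch of one stress-test job in Ganga (placeholders hedged; see above).
 j = Job()
 j.application = Athena()
 j.application.atlas_release = '14.2.20'       # release used by the ST tests
 j.application.option_file = 'AnalysisSkeleton_topOptions.py'  # placeholder job options
 j.application.prepare()

 j.inputdata = DQ2Dataset()
 j.inputdata.dataset = 'mc08.<dataset matching the Zmumu pattern above>'  # placeholder name
 j.inputdata.type = 'DQ2_LOCAL'                # POSIX I/O mode; see next section

 j.outputdata = DQ2OutputDataset()             # output ends up on ATLASUSERDISK

 j.backend = LCG()                             # PanDA backend support is planned
 j.submit()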
File access modes used by Ganga jobs

Two access modes may be used during ST tests (see the snippet after this list).

- (j.inputdata.type='DQ2_LOCAL'): currently, if users submit Ganga jobs with the LCG backend, POSIX I/O is the default method for accessing files.
- (j.inputdata.type='FILE_STAGER'): one of the alternative access modes used by the ST tests is the FileStager mode, which copies the input files in a background thread of the Athena event loop using lcg-cp from the local SE to the worker node tmp area. This mode still needs some improvement to reach full stability, but it has demonstrated good performance during ST at some (but not all) sites.
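For comparison runs, switching a job between the two modes is just a change of the inputdata type; j here is the job object from the configuration sketch above:

 # Toggle between the two access modes compared by the ST tests.
 j.inputdata.type = 'DQ2_LOCAL'     # POSIX I/O directly against the local SE
 # ...or copy mode:
 j.inputdata.type = 'FILE_STAGER'   # background lcg-cp staging to the WN tmp area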
Target and metrics
- Number of jobs: a few hundred up to 1000 jobs/site
- Rate (evt/s): up to 15 Hz
- Success rate (fraction of successful jobs) > 80 %
- CPU utilization: CPUtime / Walltime > 50 %

Results and Monitoring

- See http://gangarobot.cern.ch/st/ (a sketch for computing the metrics above follows)
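The gangarobot page aggregates these metrics per site. As a local illustration only, the targets could be checked from per-job accounting records along these lines; the record fields (status, events, cputime, walltime) are assumptions, not the framework's actual schema:

 # Minimal sketch: evaluate the ST targets from per-job records.
 # Field names are assumptions, not the real framework schema.
 def check_targets(jobs):
     done = [j for j in jobs if j['status'] == 'completed']
     if not done:
         return {'success_rate_ok': False, 'cpu_util_ok': False, 'event_rate_hz': 0.0}
     walltime = float(sum(j['walltime'] for j in done))                      # seconds
     return {
         'success_rate_ok': len(done) > 0.80 * len(jobs),                    # > 80 %
         'cpu_util_ok': sum(j['cputime'] for j in done) > 0.50 * walltime,   # CPUtime/Walltime > 50 %
         'event_rate_hz': sum(j['events'] for j in done) / walltime,         # target: up to 15 Hz
     }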