Atlas:CCRC08May
Contents
T0-T1 transfer tests (week1)
T1-T1 transfer tests (week2)
T0-T1-T2 transfer tests (week3)
General remarks
https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsGroup#CCRC08_2_May_2008
The T0 load generator will run at "peak" rate for 3 days ("peak" rate means data from 24h/day of detector data taking at 200Hz are distributed in 24h, while "nominal" rate means data from 14 hours/day of detector data taking at 200Hz are distributed in 24h).
At peak rate, 17,280,000 events/day are produced, corresponding to 27.6 TB/day of RAW, 17.3 TB/day of ESD and 3.5 TB/day of AOD (assuming event sizes of 1.6 MB/event for RAW, 1.0 MB/event for ESD and 0.2 MB/event for AOD).
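The figures above follow directly from the trigger rate and the event sizes. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 10^6 MB); the function and constant names are illustrative only:

```python
# Data volumes derived from the 200 Hz rate and the event sizes quoted above.
TRIGGER_RATE_HZ = 200
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}


def events_per_day(hours_of_data_taking=24):
    """Events recorded per day: 24 h for "peak", 14 h for "nominal"."""
    return TRIGGER_RATE_HZ * hours_of_data_taking * 3600


def volume_tb_per_day(event_size_mb, hours_of_data_taking=24):
    """Daily volume in TB, assuming 1 TB = 1e6 MB (decimal units)."""
    return events_per_day(hours_of_data_taking) * event_size_mb / 1e6


if __name__ == "__main__":
    print("peak events/day:", events_per_day())
    for data_format, size_mb in EVENT_SIZE_MB.items():
        print(data_format, round(volume_tb_per_day(size_mb), 1), "TB/day")
```

Calling the same functions with `hours_of_data_taking=14` gives the corresponding "nominal" figures.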
T0-T1(LYON)
Data are shipped continuously to the T1s according to the computing model. Sites should demonstrate that they can sustain the following export rates for 3 days:
SITE | TAPE | DISK | TOTAL |
IN2P3 | 48.00 MB/s | 100.00 MB/s | 148.00 MB/s |
Metric for success: Sites should be capable of sustaining 90% of the rates listed above (for both disk and tape) for at least 2 days of the test. For sites that would like to test higher throughput, we can oversubscribe (to both disk and tape).
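As a hedged illustration of the success metric, the sketch below computes the 90% thresholds for the IN2P3 rates and checks a pair of averaged rates against them. The helper names and the sample rates are hypothetical; averaging over the required 2 days is assumed to be done elsewhere (for example from the dashboard).

```python
# 90% success thresholds for the IN2P3 export rates (MB/s) from the table above.
NOMINAL_MB_S = {"TAPE": 48.0, "DISK": 100.0}
SUCCESS_FRACTION = 0.90


def thresholds(nominal=NOMINAL_MB_S, fraction=SUCCESS_FRACTION):
    """Minimum sustained rate per endpoint needed to pass the metric."""
    return {endpoint: rate * fraction for endpoint, rate in nominal.items()}


def meets_metric(observed_mb_s, nominal=NOMINAL_MB_S, fraction=SUCCESS_FRACTION):
    """True only if BOTH tape and disk reach the required fraction."""
    return all(observed_mb_s[endpoint] >= nominal[endpoint] * fraction
               for endpoint in nominal)


if __name__ == "__main__":
    print(thresholds())
    # Hypothetical 2-day averages, for illustration only.
    print(meets_metric({"TAPE": 44.0, "DISK": 95.0}))
```

The check is deliberately per endpoint: a high disk rate does not compensate for a tape rate below its threshold.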
As a reminder, here is the table of the space needed at each T1:
SITE | TAPE | DISK |
IN2P3 | 12.4416 TB | 25.92 TB |
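The space figures correspond to the export rates sustained over the 3 days of the test. A short sketch of that calculation, assuming decimal units (1 TB = 10^6 MB); the function name is illustrative:

```python
# Space needed at a T1 = export rate x duration of the test.
TEST_DURATION_DAYS = 3
SECONDS_PER_DAY = 86400


def space_needed_tb(rate_mb_s, days=TEST_DURATION_DAYS):
    """Volume in TB written at a constant rate (MB/s) for the given number of days."""
    return rate_mb_s * days * SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    print("TAPE:", space_needed_tb(48.0), "TB")
    print("DISK:", space_needed_tb(100.0), "TB")
```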
T1-T2
T2s will receive AODs, which should be generated at a rate of 3.5 TB/day. The amount that each site receives depends on its share:
- IN2P3-LAPP_DATADISK : 12%
- IN2P3-CPPM_DATADISK : 5%
- IN2P3-LPSC_DATADISK : 5%
- IN2P3-LPC_DATADISK : 13%
- GRIF-LAL_DATADISK : 30%
- GRIF-LPNHE_DATADISK : 15%
- GRIF-SACLAY_DATADISK : 20%
- BEIJING-LCG2_DATADISK : 20%
- RO-07-NIPNE_DATADISK : 10%
- RO-02-NIPNE_DATADISK : 10%
- TOKYO-LCG2_DATADISK : 50%
as written in https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsGroup#CCRC08_2_May_2008
The shares were decided rather arbitrarily, according to the free space in ATLASDATADISK. These numbers can be raised at a later stage of the test, but at first we would like to be sure that everything goes well at this rate.
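To give an idea of what a share means in practice, the sketch below converts a percentage into a daily volume and an average rate, assuming each share is applied to the full 3.5 TB/day AOD stream and decimal units (1 TB = 10^6 MB). The function names are illustrative.

```python
# Expected daily AOD volume and average rate for a T2, from its share.
AOD_TB_PER_DAY = 3.5
SECONDS_PER_DAY = 86400

SHARES_PERCENT = {
    "IN2P3-LAPP_DATADISK": 12, "IN2P3-CPPM_DATADISK": 5,
    "IN2P3-LPSC_DATADISK": 5, "IN2P3-LPC_DATADISK": 13,
    "GRIF-LAL_DATADISK": 30, "GRIF-LPNHE_DATADISK": 15,
    "GRIF-SACLAY_DATADISK": 20, "BEIJING-LCG2_DATADISK": 20,
    "RO-07-NIPNE_DATADISK": 10, "RO-02-NIPNE_DATADISK": 10,
    "TOKYO-LCG2_DATADISK": 50,
}


def daily_volume_tb(share_percent):
    """TB/day received by a site holding the given share of the AODs."""
    return AOD_TB_PER_DAY * share_percent / 100


def average_rate_mb_s(share_percent):
    """Average incoming rate in MB/s if the volume is spread over 24 h."""
    return daily_volume_tb(share_percent) * 1e6 / SECONDS_PER_DAY


if __name__ == "__main__":
    for site, share in SHARES_PERCENT.items():
        print(f"{site}: {daily_volume_tb(share):.2f} TB/day, "
              f"{average_rate_mb_s(share):.1f} MB/s")
```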
Current Status
T0-T1 (ALL) http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site
- Throughput
- Errors
- http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/OVERVIEW.num_file_xs_error.14400.png
T0-T1 (Lyon) http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput
- Errors
T1-T2 (Lyon) http://dashb-atlas-data.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput
- Errors
Logbook
- 21 May 20h20
dashb shows transfers that seem to be successful, but with many registration errors in the table (not in the graph). Looking into the details, the file states are 'ATTEMPT_DONE' with 'HOLD_FAILED_REGISTRATION'.
- 21 May 19h30
Finally, 19 more subscriptions appeared in dq2.log.
- ccrc08_run2.016731.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
- ccrc08_run2.016730.physics_B.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
- ccrc08_run2.016730.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
and so on.
In dq2.log, the files went through FileTransferring, VALIDATED and FileCopied, but then errors appear:
FileTransferErrorMessage : reason = [FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] globus_gass_copy_register_url_to_url: Connection timed out]
FileRegisterErrorMessage : reason = LFC exception [Cannot connect to LFC [lfc://lfc-prod.in2p3.fr:/grid/atlas]]
- 21 May 18h20
Around 17h30 transfers resumed. Rate for 17h40-18h20: IN2P3-CC_DATADISK: 158 MB/s, IN2P3-CC_DATATAPE: 37 MB/s
No T1-T2 subscriptions/transfers since the very first dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106) to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK. Transfers to Tokyo have finished. LAL is still Inactive.
- 21 May 16h40
T0-T1 transfers have been stopped since around 16h40. There are many errors:
SOURCE error during PREPARATION phase: [GENERAL_FAILURE] Error caught in srm::getSrmUser.Error creating statement, Oracle code: 12537ORA-12537: TNS:connection closed] Source Host [srm-atlas.cern.ch]
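The two FTS messages quoted in this logbook share the form "<SCOPE> error during <PHASE> phase: [<CATEGORY>]". A minimal sketch for pulling those three fields out of such messages when counting errors; the pattern is inferred from these two samples only and may not cover every FTS message, and the names `FTS_ERROR_RE` and `classify` are illustrative.

```python
import re

# Pattern inferred from the two messages quoted in the logbook; an assumption,
# not an official FTS message format.
FTS_ERROR_RE = re.compile(
    r"(?P<scope>[A-Z]+) error during (?P<phase>[A-Z]+) phase: "
    r"\[(?P<category>[A-Z_]+)\]"
)


def classify(message):
    """Return (scope, phase, category), or None if the message does not match."""
    match = FTS_ERROR_RE.search(message)
    if match is None:
        return None
    return match.group("scope"), match.group("phase"), match.group("category")


if __name__ == "__main__":
    samples = [
        "[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during "
        "TRANSFER phase: [TRANSFER_TIMEOUT] globus_gass_copy_register_url_to_url: "
        "Connection timed out]",
        "SOURCE error during PREPARATION phase: [GENERAL_FAILURE] Error caught in "
        "srm::getSrmUser.",
    ]
    for sample in samples:
        print(classify(sample))
```

Messages that do not follow this form, such as the LFC registration exception, return `None` and would need separate handling.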
- 21 May 13h30
T0-T1 Transfers started at around 13h30.
The overall throughput from T0 is over 1000MB/s.
Lyon is receiving its share at 100-200 MB/s (varying with time).
The average rate is about 40 MB/s to IN2P3-CC_DATADISK
and 140 MB/s to IN2P3-CC_DATATAPE according to dashb http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?name=LYON.
T1-T2 Transfers started at around 15h10. http://lcg2.in2p3.fr/wiki/images/20080521-1530-LYONT2.throughput.14400.png
- from dq2.log:
- 2008-05-21 15:00: SubscriptionQueued for dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106) to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK
- 2008-05-21 15:00: FileTransferring: 3 files of the dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (fsize = 3600000000 each) for both TOKYO-LCG2_DATADISK and GRIF-LAL_DATADISK
- 2008-05-21 15:03: VALIDATED: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at srm://lcg-se01.icepp.jp
- 2008-05-21 15:03: FileCopied: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
- 2008-05-21 15:03: FileDone: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
- 2008-05-21 15:05: SubscriptionComplete: vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106 : site = TOKYO-LCG2_DATADISK : dsn = ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 : version = 1
- FTS channels for LAL are 'Inactive' for 'Pb clim LAL' (air-conditioning problem at LAL)
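The dq2.log entries above allow a rough estimate of the transfer rate to TOKYO-LCG2_DATADISK: 3 files were in FileTransferring at 15:00 and FileCopied at 15:03. The sketch below assumes that fsize is given in bytes and uses decimal megabytes; because the timestamps have one-minute resolution, the result is only indicative. The function name is illustrative.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M"  # timestamp format used in the entries above


def approx_rate_mb_s(n_files, file_size_bytes, start, end):
    """Average rate in MB/s between two log timestamps (minute resolution)."""
    elapsed_s = (datetime.strptime(end, LOG_TIME_FORMAT)
                 - datetime.strptime(start, LOG_TIME_FORMAT)).total_seconds()
    if elapsed_s <= 0:
        raise ValueError("end must be later than start")
    return n_files * file_size_bytes / 1e6 / elapsed_s


if __name__ == "__main__":
    # 3 files of 3600000000 bytes each (assumed unit), 15:00 -> 15:03.
    rate = approx_rate_mb_s(3, 3600000000, "2008-05-21 15:00", "2008-05-21 15:03")
    print(f"about {rate:.0f} MB/s")
```

With an uncertainty of up to a minute on each timestamp, the true elapsed time could be anywhere between roughly 2 and 4 minutes, so the estimate should be read as an order of magnitude.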