Atlas:CCRC08May

Un article de lcgwiki.
Revision as of 23:18, 21 mai 2008 by Ueda (talk | contribs) (Logbook)
Jump to: navigation, search

T0-T1 transfer tests (week1)

T1-T1 transfer tests (week2)

T0-T1-T2 transfer tests (week3)

General remarks

https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsGroup#CCRC08_2_May_2008

The T0 load generator will run at "peak" rate for 3 days ("peak" rate means data from 24h/day of detector data taking at 200Hz are distributed in 24h, while "nominal" rate means data from 14 hours/day of detector data taking at 200Hz are distributed in 24h).

At peak rate 17,280,000 events/day are produced, corresponding to 27.6 TB/day of RAW, 17.3 TB/day of ESD and 3.5 TB/day of AOD (considering the sizes of 1.6 MB/event for RAW, 1.0 MB/event for ESD and 0.2 MB/event for AOD)

T0-T1(LYON)

Shipping continuously data to T1s according to computing model, sites should demonstrate to sustain for 3 days the following export rates

SITE TAPE DISK TOTAL
IN2P3 48.00 MB/s 100.00 MB/s 148.00 MB/s

Metric for success: Sites should be capable of sustaining 90% of the mentioned rates (for both disk and tape) for at least 2 days of test. For sites who would like to test higher throughput, we can oversubscribe (both to disk and tape).

As a reminder, here the table of the necessary space needed at each T1

SITE TAPE DISK
IN2P3 12.4416 TB 25.92 TB

T1-T2

T2s will receive AODs, which should be generated at a rate of 3.5TB/day. The amount that each site receives depends on the share

  • IN2P3-LAPP_DATADISK : 12%
  • IN2P3-CPPM_DATADISK : 5%
  • IN2P3-LPSC_DATADISK : 5 %
  • IN2P3-LPC_DATADISK : 13%
  • GRIF-LAL_DATADISK : 30%
  • GRIF-LPNHE_DATADISK : 15%
  • GRIF-SACLAY_DATADISK : 20 %
  • BEIJING-LCG2_DATADISK : 20 %
  • RO-07-NIPNE_DATADISK : 10%
  • RO-02-NIPNE_DATADISK : 10%
  • TOKYO-LCG2_DATADISK : 50%

as written in https://twiki.cern.ch/twiki/bin/view/Atlas/DDMOperationsGroup#CCRC08_2_May_2008

The shares are decided rather arbitrary according to the free space in ATLASDATADISK. These numers can be raised at a later stage of the test, but at first we would like to be sure everythinig goes well with this rate.

Current Status

T0-T1 (ALL) http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site

  • Throughput
http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/OVERVIEW.throughput.14400.png
  • Errors
http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/OVERVIEW.num_file_xs_error.14400.png


T0-T1 (Lyon) http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON

  • Throughput
http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/LYON.throughput.14400.png
  • Errors
http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/LYON.num_file_xs_error.14400.png


T1-T2 (Lyon) http://dashb-atlas-data.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON

  • Throughput
http://dashb-atlas-data.cern.ch/dashboard/templates/plots/LYON.T2.throughput.14400.png
  • Errors
http://dashb-atlas-data.cern.ch/dashboard/templates/plots/LYON.T2.num_file_xs_error.14400.png

Logbook

21 May 22h30

Stephane found a temporary solution for the LFC problem. It does not accept Mario's certificate, but does Kors' (thus no problem in T0-T1 transfers).

the dashb graphs show these transfers as of 19:xx (transfers done at 19:xx = 17:xx UTC, and registration done at around 22:30)


21 May 20h20

dashb shows transfers, which seem to be successful but with many registration errors, in the table (not in the graph). Looking into the details, file states are 'ATTEMPT_DONE' with 'HOLD_FAILED_REGISTRATION'

LYON.T2.throughput.14400.20080521-2028.png

the time on ftsmonitor is UTC.


21 May 19h30

Finally 19 more subscriptions appeared in the dq2.log.

  • ccrc08_run2.016731.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
  • ccrc08_run2.016730.physics_B.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
  • ccrc08_run2.016730.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK

and so on.

In the dq2.log, the files got FileTransferring, VALIDATED, FileCopied, but then, there are errors

FileTransferErrorMessage : reason = [FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] globus_gass_copy_register_url_to_url: Connection timed out]
FileRegisterErrorMessage : reason = LFC exception [Cannot connect to LFC [lfc://lfc-prod.in2p3.fr:/grid/atlas]]
21 May 18h20 

Around 17h30 transfers resumed. Rate for 17h40-18h20: IN2P3-CC_DATADISK: 158 MB/s, IN2P3-CC_DATATAPE: 37 MB/s

no subscriptions/transfers T1-T2 since the very first dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106) to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK. Transfers to Tokyo has finished. LAL is still Inactive.

21 May 16h40
Since around 16:40 T0-T1 transfers stopped. There are many errors.

SOURCE error during PREPARATION phase: [GENERAL_FAILURE] 
Error caught in srm::getSrmUser.Error creating statement, 
Oracle code: 12537ORA-12537: TNS:connection closed] 
Source Host [srm-atlas.cern.ch]
OVERVIEW.throughput.14400.20080521-1300-1700.png
OVERVIEW.num file xs error.14400.20080521-1300-1700.png
LYON.throughput.14400.20080521-1300-1700.png
LYON.num file xs error.14400.20080521-1300-1700.png



21 May 13h30
T0-T1 Transfers started at around 13h30. The overall throughput from T0 is over 1000MB/s. Lyon is receiving its share at 100-200 MB/s (verying with time). the average rate is about 40MB/s to IN2P3-CC_DATADISK and 140MB/s to IN2P3-CC_DATATAPE according to dashb http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?name=LYON.

20080521-1530-OVERVIEW.throughput.14400.png
20080521-1530-LYON.throughput.14400.png

T1-T2 Transfers started at around 15h10. http://lcg2.in2p3.fr/wiki/images/20080521-1530-LYONT2.throughput.14400.png

  • from dq2.log:
    • 2008-05-21 15:00: SubscriptionQueued for dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106) to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK
    • 2008-05-21 15:00: FileTransferring: 3 files of the dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (fsize = 3600000000 each) for both TOKYO-LCG2_DATADISK and GRIF-LAL_DATADISK
    • 2008-05-21 15:03: VALIDATED: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at srm://lcg-se01.icepp.jp
    • 2008-05-21 15:03: FileCopied: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
    • 2008-05-21 15:03: FileDone: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
    • 2008-05-21 15:05: SubscriptionComplete: vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106 : site = TOKYO-LCG2_DATADISK : dsn = ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 : version = 1
  • FTS channels for LAL are 'Inactive' for 'Pb clim LAL'

T0-T1-T2 + T1-T1 transfer tests (week4)