An article from lcgwiki.
Revision as of 08:40, 23 mai 2008 by Ueda (talk | contribs) (Logbook)

T0-T1 transfer tests (week1)

T1-T1 transfer tests (week2)

Some summaries presented at ADC Operations Meeting

T0-T1-T2 transfer tests (week3)

General remarks

The T0 load generator will run at "peak" rate for 3 days. "Peak" rate means that data from 24 h/day of detector data taking at 200 Hz are distributed in 24 h, while "nominal" rate means that data from 14 h/day of detector data taking at 200 Hz are distributed in 24 h.

At peak rate, 17,280,000 events/day are produced, corresponding to 27.6 TB/day of RAW, 17.3 TB/day of ESD and 3.5 TB/day of AOD (assuming sizes of 1.6 MB/event for RAW, 1.0 MB/event for ESD and 0.2 MB/event for AOD).
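The arithmetic behind these figures can be reproduced with a short sketch. It uses only the trigger rate and event sizes quoted above and assumes decimal units (1 TB = 10^6 MB); the function names are illustrative, not part of any ATLAS tool.

```python
# Trigger rate and per-event sizes as quoted in the text.
RATE_HZ = 200
EVENT_SIZE_MB = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}


def events_per_day(hours_of_data_taking):
    """Events recorded in one day for the given hours of data taking."""
    return round(RATE_HZ * hours_of_data_taking * 3600)


def daily_volume_tb(hours_of_data_taking):
    """Daily volume per data type in TB (assuming 1 TB = 1e6 MB)."""
    n_events = events_per_day(hours_of_data_taking)
    return {name: n_events * size / 1e6 for name, size in EVENT_SIZE_MB.items()}


peak = daily_volume_tb(24)      # "peak": 24 h/day of data taking
nominal = daily_volume_tb(14)   # "nominal": 14 h/day of data taking

print(events_per_day(24))
for name, volume in peak.items():
    print(f"{name}: {volume:.1f} TB/day at peak, {nominal[name]:.1f} TB/day nominal")
```

The same function gives the nominal-rate volumes by passing 14 hours instead of 24.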

monitoring page

data replication from CERN to Tier-1s

data replication within clouds

Summaries / Reports


ADC Oper 22 May


Data are shipped continuously to the T1s according to the computing model. Sites should demonstrate that they can sustain the following export rates for 3 days:

IN2P3 48.00 MB/s 100.00 MB/s 148.00 MB/s

Metric for success: Sites should be capable of sustaining 90% of the mentioned rates (for both disk and tape) for at least 2 days of test. For sites who would like to test higher throughput, we can oversubscribe (both to disk and tape).
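As a rough illustration of the success metric, the sketch below computes the 90% thresholds for the three IN2P3 rates listed above and checks a set of daily averages against them. Interpreting "sustaining" as a daily average is an assumption, and the measured values in the usage lines are hypothetical.

```python
SUCCESS_FRACTION = 0.90  # 90% of the target rate, from the success metric
MIN_GOOD_DAYS = 2        # at least 2 days of the test


def passes(daily_avg_mb_s, target_mb_s):
    """True if at least MIN_GOOD_DAYS daily averages reach 90% of the target."""
    threshold = SUCCESS_FRACTION * target_mb_s
    good_days = sum(1 for rate in daily_avg_mb_s if rate >= threshold)
    return good_days >= MIN_GOOD_DAYS


# The three IN2P3 rates from the table, in MB/s.
in2p3_targets = (48.0, 100.0, 148.0)
thresholds = [round(SUCCESS_FRACTION * t, 1) for t in in2p3_targets]
print(thresholds)  # [43.2, 90.0, 133.2]

# Hypothetical daily averages (MB/s) for a 3-day test against the 148 MB/s rate.
print(passes([150.0, 140.0, 100.0], 148.0))  # True: two days above 133.2 MB/s
print(passes([150.0, 100.0, 100.0], 148.0))  # False: only one day above
```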

As a reminder, here is the table of the space needed at each T1:

IN2P3 12.4416 TB 25.92 TB
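The space figures are consistent with the 48 MB/s and 100 MB/s export rates sustained for the 3 days of the test. This relation is inferred from the numbers rather than stated on the page, so the sketch below is only a consistency check, assuming 1 TB = 10^6 MB.

```python
SECONDS_PER_DAY = 86400


def space_needed_tb(rate_mb_s, days=3):
    """Space in TB filled by a constant rate over the given number of days."""
    return rate_mb_s * days * SECONDS_PER_DAY / 1e6


print(space_needed_tb(48.0))   # 12.4416
print(space_needed_tb(100.0))  # 25.92
```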


T2s will receive AODs, which should be generated at a rate of 3.5 TB/day. The amount that each site receives depends on its share:

  • IN2P3-LPC_DATADISK : 13%
  • GRIF-LAL_DATADISK : 30% (grid.admin a
  • BEIJING-LCG2_DATADISK : 20% (yanxf a, Erming.Pei a
  • RO-07-NIPNE_DATADISK : 10% (ciubancan a
  • RO-02-NIPNE_DATADISK : 10% (tpreda a
  • TOKYO-LCG2_DATADISK : 50% (lcg-admin a

as written in

The shares were decided rather arbitrarily, according to the free space in ATLASDATADISK. These numbers can be raised at a later stage of the test, but at first we would like to be sure that everything goes well at this rate.
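To get a feeling for what these shares mean in practice, the sketch below converts each share into a daily volume and an average rate. It assumes that each share is an independent fraction of the 3.5 TB/day AOD stream (the listed shares add up to more than 100%) and that 1 TB = 10^6 MB.

```python
AOD_TB_PER_DAY = 3.5
SECONDS_PER_DAY = 86400

# Shares in percent, as listed above.
SHARES_PERCENT = {
    "IN2P3-LPC_DATADISK": 13,
    "GRIF-LAL_DATADISK": 30,
    "BEIJING-LCG2_DATADISK": 20,
    "RO-07-NIPNE_DATADISK": 10,
    "RO-02-NIPNE_DATADISK": 10,
    "TOKYO-LCG2_DATADISK": 50,
}


def daily_tb(site):
    """AOD volume per day for a site, in TB."""
    return AOD_TB_PER_DAY * SHARES_PERCENT[site] / 100


def average_mb_s(site):
    """Average rate needed to keep up with the daily volume, in MB/s."""
    return daily_tb(site) * 1e6 / SECONDS_PER_DAY


for site in SHARES_PERCENT:
    print(f"{site}: {daily_tb(site):.3f} TB/day, {average_mb_s(site):.1f} MB/s")
```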

Current Status

T0-T1 (ALL)

  • Throughput
  • Errors

T0-T1 (Lyon)

  • Throughput
  • Errors

T1-T2 (Lyon)

  • Throughput
  • Errors


23 May

23 May 08h20

Beijing has been working since 01:44, although with a number of errors until 07:43.

 [FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] Source Host []

After the last error at 07:43, transfers seem to be going well.

RO-02-NIPNE_DATADISK is also working from time to time, but still with many errors.
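The FTS error strings quoted in this logbook share a recognisable layout (error scope, phase, bracketed category, free-text message). When many of them need to be sorted, a small parser can help. The sketch below is based only on the strings that appear on this page; the regular expression and the function name are illustrative assumptions, not an official FTS format definition.

```python
import re

# Pattern derived from the error strings quoted in this logbook (an assumption,
# not a documented FTS grammar).
FTS_REASON = re.compile(
    r"Reason \[(?P<scope>\w+) error during (?P<phase>\w+) phase: "
    r"\[(?P<category>\w+)\] (?P<message>.*?)\]"
    r"(?: Source Host \[.*\])?\s*$"
)


def parse_fts_reason(line):
    """Return scope, phase, category and message, or None if no match."""
    match = FTS_REASON.search(line)
    return match.groupdict() if match else None


example = (
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during "
    "TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] "
    "Source Host []"
)
print(parse_fts_reason(example))
```

Grouping the parsed results by `scope` and `category` separates source, destination and transfer problems at a glance.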

22 May

22 May 22h50

Transfers to RO-02-NIPNE_DATADISK have been failing. GGUS-Ticket 36728 has been created.

Otherwise, transfers are going well except for BEIJING.

22 May 14h50

One file assigned to CPPM has a source problem;

Received error message: SOURCE error during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds

with srm://

GGUS-Ticket 36709 has been created.

22 May 14h20

The T0->LYON export was migrated from the T0 VOBOX to the LYON VOBOX. T0->LYON transfers should now be monitored with the Production dashboard.

22 May 13h05

Titi: Unfortunately there was an unscheduled network breakdown in our institute from about 6:30 to 10 GMT.

22 May 12h40

Stephane switched back the certificate from Kors' certificate to Mario's.

22 May 10h29

Starting at 09:21:35, dashb shows errors in transfers to RO-07-NIPNE_DATADISK.

 [FTS] FTS State [Failed] FTS Retries [1] Reason [DESTINATION error during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [httpg://]. Givin' up after 3 tries] Source Host []

GGUS-Ticket 36698 created

22 May 09h40

According to Alexei, the cron job for subscriptions to T2s does not run frequently during the night. That explains why there were no T1-T2 transfers to DATADISK overnight.

22 May 09h00

T1-T2 transfers resumed at 8h30, reaching 900 MB/s in total.

The numbers of datasets assigned to sites look better now.

Killed a MC data subscription to RO-07-NIPNE_MCDISK.

22 May 08h00

T0-T1 transfers are proceeding. There have been no T1-T2 transfers to DATADISK since last night.

According to dq2.log, new subscriptions today (since 22 May 00:00) are queued only to BEIJING and RO-02-NIPNE, resulting in errors.

The status table does not look right; I will check.

Apparently there are no subscriptions to LAPP, CPPM or LPSC. TOKYO and LAL, which are assigned larger shares, have fewer subscriptions.

21 May

OVERVIEW.throughput.20080521.png LYON.throughput.20080521-1.png

21 May 23h30

There are many errors in transfers to NIPNE02.

21 May 22h30

Stephane found a temporary solution for the LFC problem. The LFC does not accept Mario's certificate, but does accept Kors' (hence no problem in T0-T1 transfers).

The dashb graphs show these transfers as of 19:xx (transfers done at 19:xx = 17:xx UTC, and registration done at around 22:30).

21 May 20h20

The dashb table (not the graph) shows transfers that seem to be successful but have many registration errors. Looking into the details, the file states are 'ATTEMPT_DONE' with 'HOLD_FAILED_REGISTRATION'.


The time on ftsmonitor is UTC.
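Since ftsmonitor reports UTC while the entries in this logbook use local time, comparing the two requires a conversion. The sketch below assumes a fixed offset of +2 hours, inferred from the remark "19:xx = 17:xx UTC" in this logbook; the example timestamp is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Local offset inferred from "19:xx = 17:xx UTC" (an assumption, valid for
# these May 2008 entries only).
LOCAL_TZ = timezone(timedelta(hours=2))


def utc_to_local(utc_time):
    """Convert a timezone-aware UTC datetime to the logbook's local time."""
    return utc_time.astimezone(LOCAL_TZ)


# Hypothetical ftsmonitor timestamp.
fts_time = datetime(2008, 5, 21, 17, 30, tzinfo=timezone.utc)
print(utc_to_local(fts_time).strftime("%d %b %Hh%M"))  # 21 May 19h30
```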

21 May 19h30

Finally 19 more subscriptions appeared in the dq2.log.

  • ccrc08_run2.016731.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
  • ccrc08_run2.016730.physics_B.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK
  • ccrc08_run2.016730.physics_D.merge.AOD.o0_r0_t0 (2 files) to BEIJING-LCG2_DATADISK and GRIF-SACLAY_DATADISK

and so on.

In the dq2.log, the files go through FileTransferring, VALIDATED and FileCopied, but then there are errors:

FileTransferErrorMessage : reason = [FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] globus_gass_copy_register_url_to_url: Connection timed out]
FileRegisterErrorMessage : reason = LFC exception [Cannot connect to LFC [lfc://]]

21 May 18h20

Around 17h30 transfers resumed. Rate for 17h40-18h20: IN2P3-CC_DATADISK: 158 MB/s, IN2P3-CC_DATATAPE: 37 MB/s

There have been no T1-T2 subscriptions or transfers since the very first dataset, ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106), to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK. Transfers to Tokyo have finished. LAL is still Inactive.

21 May 16h40

Since around 16:40 T0-T1 transfers stopped. There are many errors.

Error caught in srm::getSrmUser.Error creating statement, 
Oracle code: 12537ORA-12537: TNS:connection closed] 
Source Host []
OVERVIEW.num file xs error.14400.20080521-1300-1700.png
LYON.num file xs error.14400.20080521-1300-1700.png

21 May 13h30

T0-T1 transfers started at around 13h30. The overall throughput from T0 is over 1000 MB/s. Lyon is receiving its share at 100-200 MB/s (varying with time). The average rate is about 40 MB/s to IN2P3-CC_DATADISK and 140 MB/s to IN2P3-CC_DATATAPE according to dashb.


T1-T2 transfers started at around 15h10.

  • from dq2.log:
    • 2008-05-21 15:00: SubscriptionQueued for dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106) to GRIF-LAL_DATADISK and TOKYO-LCG2_DATADISK
    • 2008-05-21 15:00: FileTransferring: 3 files of the dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 (fsize = 3600000000 each) for both TOKYO-LCG2_DATADISK and GRIF-LAL_DATADISK
    • 2008-05-21 15:03: VALIDATED: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at srm://
    • 2008-05-21 15:03: FileCopied: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
    • 2008-05-21 15:03: FileDone: 3 files of dataset ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 at TOKYO-LCG2_DATADISK
    • 2008-05-21 15:05: SubscriptionComplete: vuid = 6403bd5a-5a71-4732-9a0a-b22b56aef106 : site = TOKYO-LCG2_DATADISK : dsn = ccrc08_run2.016731.physics_A.merge.AOD.o0_r0_t0 : version = 1
  • FTS channels for LAL are 'Inactive' for 'Pb clim LAL' (an air-conditioning problem at LAL)
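The dq2.log entries above allow a rough estimate of the rate achieved for the first dataset to TOKYO-LCG2_DATADISK: 3 files of 3600000000 bytes each, between FileTransferring at 15:00 and FileCopied at 15:03. The timestamps only have minute resolution, so the result is an order-of-magnitude figure; the sketch assumes 1 MB = 10^6 bytes.

```python
from datetime import datetime

# Values taken from the dq2.log summary above.
FILE_SIZE_BYTES = 3_600_000_000
N_FILES = 3
started = datetime(2008, 5, 21, 15, 0)  # FileTransferring
copied = datetime(2008, 5, 21, 15, 3)   # FileCopied


def average_rate_mb_s(total_bytes, start, end):
    """Average rate in MB/s over the interval (assuming 1 MB = 1e6 bytes)."""
    seconds = (end - start).total_seconds()
    return total_bytes / seconds / 1e6


total_bytes = N_FILES * FILE_SIZE_BYTES
print(f"{total_bytes / 1e9:.1f} GB at about "
      f"{average_rate_mb_s(total_bytes, started, copied):.0f} MB/s")
```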

T0-T1-T2 + T1-T1 transfer tests (week4)


ADC Oper 22 May (slide 8 - 10)