Difference between revisions of "Atlas:FDR2"
Revision as of 12:56, 16 June 2008
Monitoring pages/graphs
FTS
http://cctoolsafs.in2p3.fr/fts/monitoring/prod/ftsmonitor.php?vo=atlas
T0-T1 (ALL)
- FDR monitoring page - T1s
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_tier1s.html
http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site
- Throughput
- Errors
- http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/OVERVIEW.num_file_xs_error.14400.png
T0-T1 (Lyon)
http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput T0-LYON
- Errors T0-LYON
- Throughput Others-LYON
- Errors Others-LYON
http://dashb-atlas-data.cern.ch/dashboard/templates/plots/LYON.T1.num_file_xs_error.14400.png
T1-T1
T1-T2 (Lyon)
- FDR monitoring page
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_TiersInfo.html
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_T1-T2_matrix_day.html
- Dashboard
http://dashb-atlas-data.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput
- Errors
Network Graphs
- IN2P3 Weathermap
- cc-in2p3
- lyo-cern (lhcopn-in2p3.cern.ch)
http://netstat.in2p3.fr/weathermap/graphiques/lyo-cern-daily.gif
- lyon-nord
- lyo-nrd
http://netstat.in2p3.fr/weathermap/graphiques/lyo-nrd-daily.gif
- orsay
- orsay (in2p3-orsay.cssi.renater.fr)
http://netstat.in2p3.fr/weathermap/graphiques/orsay-daily.gif
- lal-cc
- LAL - CC link
http://netstat.in2p3.fr/weathermap/graphiques/lal-cc-daily.gif
- lal
- lal
http://netstat.in2p3.fr/weathermap/graphiques/lal-daily.gif
- lpnhe
- lpnhe-nrd (in2p3-jussieu.cssi.renater.fr)
http://netstat.in2p3.fr/weathermap/graphiques/lpnhe-nrd-daily.gif
- lpnhe (Paris-LPNHE.in2p3.fr)
http://netstat.in2p3.fr/weathermap/graphiques/lpnhe-daily.gif
- lapp
- ann-nrd
http://netstat.in2p3.fr/weathermap/graphiques/ann-nrd-daily.gif
- lpc
- lpc-cf
http://netstat.in2p3.fr/weathermap/graphiques/lpc-cf-daily.gif
http://netstat.in2p3.fr/weathermap/graphiques/cppm-daily.gif
- international (tokyo etc.)
- parisnrd
http://netstat.in2p3.fr/weathermap/graphiques/parisnrd-daily.gif
- GEANT - NYC
- MANLAN
http://dc-snmp.wcc.grnoc.iu.edu/manlan/img/sw.newy32aoa.manlan.internet2.edu--te10_1-std5.gif
- NYC - TOKYO
- MANLAN
http://dc-snmp.wcc.grnoc.iu.edu/manlan/img/sw.newy32aoa.manlan.internet2.edu--te11_1-std5.gif
T0-T1
T1-T2
General information
- FDR page
https://twiki.cern.ch/twiki/bin/view/Atlas/FullDressRehearsal
- Replication status (checked ~ every 2h?)
http://pandamon.usatlas.bnl.gov:25880/server/pandamon/query?mode=listFDR
- Contacts
- IN2P3-CPPM_DATADISK :
- IN2P3-LPSC_DATADISK :
- IN2P3-LPC_DATADISK :
- GRIF-LAL_DATADISK : (grid.admin a lal.in2p3.fr)
- GRIF-LPNHE_DATADISK :
- GRIF-SACLAY_DATADISK :
- BEIJING-LCG2_DATADISK : (yanxf a ihep.ac.cn, Erming.Pei a cern.ch)
- RO-07-NIPNE_DATADISK : (ciubancan a nipne.ro)
- RO-02-NIPNE_DATADISK : (tpreda a nipne.ro)
- TOKYO-LCG2_DATADISK : (lcg-admin a icepp.s.u-tokyo.ac.jp)
Only sites with SRMv2 and space tokens will collect data. The list of streams is: Egamma, Bphysics, MinBias, Muon and Jet. If no specific stream name is associated with a share, the site will get any stream. The site in parentheses will collect the complementary datasets. "T2 resources at T1" means that users without the Production role can access CPUs at the T1.
* US :
    o AGLT2_DATADISK : 100%
    o MWT2_DATADISK : 100%
    o NET2_DATADISK : 100%
    o SLACXRD_DATADISK : 100%
    o SWT2_CPB_DATADISK : 100%
    o T2 CPU resources in BNL to access AOD
* CA :
    o SFU-LCG2_DATADISK : 40%
    o VICTORIA-LCG2_DATADISK : 30%
    o ALBERTA-LCG2_DATADISK : 30%
    o TORONTO-LCG2_DATADISK : 50%
* IT :
    o INFN-MILANO_DATADISK : 25%
    o INFN-FRASCATI_DATADISK : 25%
    o INFN-ROMA1_DATADISK : 100%
    o INFN-NAPOLI-ATLAS_DATADISK : 100%
* TW :
    o TW-FTT_DATADISK : 50%
* NL :
    o JINR-LCG2_DATADISK : 50% Jet (RU-PROTVINO-IHEP_DATADISK)
    o JINR-LCG2_DATADISK : 50% MinBias (RU-PROTVINO-IHEP_DATADISK)
    o JINR-LCG2_DATADISK : 50% Muon (RU-PROTVINO-IHEP_DATADISK)
    o JINR-LCG2_DATADISK : 50% Bphysics (RU-PNPI_DATADISK)
    o RU-PROTVINO-IHEP_DATADISK : 50% Jet (JINR-LCG2_DATADISK)
    o RU-PROTVINO-IHEP_DATADISK : 50% Muon (JINR-LCG2_DATADISK)
    o RU-PROTVINO-IHEP_DATADISK : 50% MinBias (JINR-LCG2_DATADISK)
    o RU-PROTVINO-IHEP_DATADISK : 50% Egamma (RU-PNPI_DATADISK)
    o RU-PNPI_DATADISK : 50% Egamma (RU-PROTVINO-IHEP_DATADISK)
    o RU-PNPI_DATADISK : 50% Bphysics (JINR-LCG2_DATADISK)
    o RU-PNPI_DATADISK : 50% Jet (RRC-KI_DATADISK)
    o RU-PNPI_DATADISK : 50% MinBias (RRC-KI_DATADISK)
    o RRC-KI_DATADISK : 50% Jet (RU-PNPI_DATADISK)
    o RRC-KI_DATADISK : 50% MinBias (RU-PNPI_DATADISK)
    o RRC-KI_DATADISK : 50% Egamma
    o RRC-KI_DATADISK : 50% Bphysics
    o CSTCDIE_DATADISK : 20%
* FR :
    o IN2P3-LAPP_DATADISK : 100% Egamma
    o IN2P3-CPPM_DATADISK : 100% Egamma
    o IN2P3-LPSC_DATADISK : 5%
    o IN2P3-LPC_DATADISK : 100% Muon, 50% Egamma
    o BEIJING-LCG2_DATADISK : 100% Egamma, 50% Muon, 35% MinBias
    o RO-07-NIPNE_DATADISK : 10%
    o RO-02-NIPNE_DATADISK : 10%
    o GRIF-LAL_DATADISK : 45%
    o GRIF-LPNHE_DATADISK : 25%
    o GRIF-SACLAY_DATADISK : 30%
    o TOKYO-LCG2_DATADISK : 100%
    o T2 CPU resources in LYON to access AOD
* DE :
    o DESY-HH_DATADISK : 100%
    o CSCS-LCG2_DATADISK : 100%
    o DESY-ZN_DATADISK : 50%
    o GOEGRID_DATADISK : 50%
    o WUPPERTALPROD_DATADISK : 25%
    o CYFRONET-LCG2_DATADISK : 15%
    o PRAGUELCG2_DATADISK : 25%
    o LRZ-LMU_DATADISK : 35%
    o UNI-FREIBURG_DATADISK : 100% Jet, 100% Muon
* UK :
    o UKI-SOUTHGRID-CAM-HEP_DATADISK : 50% Muon (UKI-NORTHGRID-LANCS-HEP_DATADISK)
    o UKI-NORTHGRID-LANCS-HEP_DATADISK : 50% Muon (UKI-SOUTHGRID-CAM-HEP_DATADISK)
    o UKI-NORTHGRID-SHEF-HEP_DATADISK : 50% MinBias (UKI-SCOTGRID-GLASGOW_DATADISK)
    o UKI-SCOTGRID-GLASGOW_DATADISK : 50% MinBias (UKI-NORTHGRID-SHEF-HEP_DATADISK)
    o UKI-LT2-RHUL_DATADISK : 50% Egamma (UKI-NORTHGRID-LIV-HEP_DATADISK)
    o UKI-NORTHGRID-LIV-HEP_DATADISK : 50% Egamma (UKI-LT2-RHUL_DATADISK)
    o UKI-SOUTHGRID-RALPP_DATADISK : 50% Egamma (UKI-LT2-RHUL_DATADISK)
    o UKI-SOUTHGRID-BHAM-HEP_DATADISK : 50% Jet (UKI-SOUTHGRID-OX-HEP_DATADISK)
* ES :
    o IFIC-LCG2_DATADISK : 50%
    o UAM-LCG2_DATADISK : 25%
    o IFAE_DATADISK : 25%
* NDGF :
    o SIGNET_DATADISK : 50%
Logbook
2nd June
- Lyon is in downtime, so there are no transfers from LYON to T2s today.
http://cc.in2p3.fr/cc_accueil.php3?lang=fr
- Free Space
Token                   Total   Guaranteed  Unused
--------------------------------------------------
BEIJING-LCG2_DATADISK   6.366   6.366       6.343
GRIF-LAL_DATADISK       6.0     6.0         5.732
GRIF-LPNHE_DATADISK     3.906   3.906       3.89
GRIF-SACLAY_DATADISK    6.0     6.0         5.994
IN2P3-CPPM_DATADISK     1.0     1.0         0.977
IN2P3-LAPP_DATADISK     4.0     4.0         4.0
IN2P3-LPC_DATADISK      3.0     3.0         3.0
IN2P3-LPSC_DATADISK     0.488   0.488       0.482
RO-02-NIPNE_DATADISK    3.0     3.0         3.0
RO-07-NIPNE_DATADISK    2.0     2.0         2.0
TOKYO-LCG2_DATADISK     15.0    15.0        13.643
3rd June
- Lyon is still in downtime; no transfers from LYON to T2s this morning.
4th June
- 22h02
- Still no transfers, despite all the services in LYON showing OK.
- Throughput(snapshot)
- Services in LYON
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
BEIJING-LCG2_MCDISK     5.457   5.457   5.457
GRIF-LAL_DATADISK       6.0     6.0     5.732
GRIF-LAL_MCDISK         7.0     7.0     3.724
GRIF-LPNHE_DATADISK     3.906   3.906   3.89
GRIF-LPNHE_MCDISK       1.0     1.0     0.194
GRIF-SACLAY_DATADISK    6.0     6.0     5.994
GRIF-SACLAY_MCDISK      2.0     2.0     1.713
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-CPPM_MCDISK       0.781   0.781   0.765
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LAPP_MCDISK       2.0     2.0     0.745
IN2P3-LPC_DATADISK      3.0     3.0     3.0
IN2P3-LPC_MCDISK        2.0     2.0     1.936
IN2P3-LPSC_DATADISK     0.488   0.488   0.482
IN2P3-LPSC_MCDISK       2.0     2.0     1.518
RO-02-NIPNE_DATADISK    3.0     3.0     3.0
RO-02-NIPNE_MCDISK      1.0     1.0     0.995
RO-07-NIPNE_DATADISK    2.0     2.0     2.0
RO-07-NIPNE_MCDISK      1.0     1.0     0.654
TOKYO-LCG2_DATADISK     15.0    15.0    13.643
TOKYO-LCG2_MCDISK       10.0    10.0    10.0
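The free-space listings in this logbook give, for each space token, the total, guaranteed and unused space. As a rough sketch, the used space and used percentage could be derived from such a listing as follows. It assumes whitespace-separated rows of token, total, guaranteed and unused, all in the same unit; the function name `used_fraction` is hypothetical and not part of any DDM tool.

```shell
# used_fraction (hypothetical helper): read "TOKEN TOTAL GUARANTEED UNUSED"
# rows on stdin and print "TOKEN USED PERCENT_USED" for each row.
# Rows with fewer than four fields, a non-numeric total (e.g. a header line)
# or a zero total are skipped.
used_fraction() {
    awk 'NF >= 4 && $2 ~ /^[0-9.]+$/ && $2 > 0 {
        used = $2 - $4
        printf "%s %.3f %.1f\n", $1, used, 100 * used / $2
    }'
}
```

For example, `printf 'GRIF-LAL_MCDISK 7.0 7.0 3.724\n' | used_fraction` prints `GRIF-LAL_MCDISK 3.276 46.8`.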
5th June
- Data produced on Tuesday are supposed to be shipped tonight.
- 14h50
- The RAW data has now begun to be shipped to LYON.
- Throughput(snapshot)(MB/s)
- Number of transfer errors (snapshot)
- 21h30
- IN2P3-CC_MCDISK has a problem staging files, but it should have nothing to do with the T0 export.
- 47 datasets have been exported to T1s. The total is 90.
- Throughput(snapshot)(MB/s)
- Number of transfer errors (snapshot) -- These come from the file-staging problem at the T1 and have nothing to do with FDR transfers.
6th June
- 09h50
- Throughput(snapshot) (MB/s)
- Number of transfer errors (snapshot)
- 10h03
- Now all the RAW datasets have been exported to T1. The total number is 90.
http://atlas.web.cern.ch/Atlas/tzero/prod2/monitoring/tables/ddm_RAW_status.html
7th June
- At CERN, the bulk processing will be restarted using the upcoming AtlasPoint1-14.1.0.12 release cache.
- 11h30
- SRM in LYON is still in bad shape. Experts are working on it.
8th June
- 14h20
- FDR2 AODs/DPDs/TAGs/ESDs are being exported to LYON (IN2P3-CC_DATADISK), but haven't been shipped to T2s.
- Number of datasets transferred so far
$ dq2-list-dataset-site IN2P3-CC_DATADISK|grep fdr08_run2.*|wc -l
68
- Throughput(T0-T1,snapshot)(MB/s)
- Number of transfer errors
- Throughput(T1-T2,snapshot)(MB/s)
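The dataset count in the 14h20 entry was taken by piping `dq2-list-dataset-site` through `grep` and `wc -l`. A small sketch of how the same count could be repeated over several sites is given below. The helper names `count_matching` and `count_per_site` are hypothetical, and the listing command is passed in as an argument, so the only assumption about `dq2-list-dataset-site` is that it prints one dataset name per line.

```shell
# count_matching PATTERN (hypothetical helper): count the lines on stdin
# that match PATTERN.
count_matching() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so the exit status is masked.
    grep -c -- "$1" || true
}

# count_per_site LIST_CMD PATTERN SITE... (hypothetical helper): run
# "LIST_CMD SITE" for each site and print "SITE COUNT".
count_per_site() {
    list_cmd=$1
    pattern=$2
    shift 2
    for site in "$@"; do
        printf '%s %s\n' "$site" "$("$list_cmd" "$site" | count_matching "$pattern")"
    done
}
```

A possible invocation would be `count_per_site dq2-list-dataset-site 'fdr08_run2' IN2P3-CC_DATADISK TOKYO-LCG2_DATADISK`, which prints one "site count" line per site.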
9th June
- 0h20
- FDR datasets have begun to be shipped to T2s.
- Some T2s (e.g. BEIJING) subscribe to dedicated datasets; these subscriptions will be made later this morning.
- Shipped datasets
IN2P3-CC_DATADISK       315
BEIJING-LCG2_DATADISK   0
GRIF-LAL_DATADISK       79
GRIF-LPNHE_DATADISK     39
GRIF-SACLAY_DATADISK    38
IN2P3-CPPM_DATADISK     0
IN2P3-LAPP_DATADISK     12
IN2P3-LPC_DATADISK      12
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    29
RO-07-NIPNE_DATADISK    46
TOKYO-LCG2_DATADISK     134
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
GRIF-LAL_DATADISK       6.0     6.0     5.579
GRIF-LPNHE_DATADISK     3.906   3.906   3.817
GRIF-SACLAY_DATADISK    6.0     6.0     5.927
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LPC_DATADISK      0       0       0
IN2P3-LPSC_DATADISK     0.488   0.488   0.439
RO-02-NIPNE_DATADISK    3.0     3.0     2.986
RO-07-NIPNE_DATADISK    2.0     2.0     1.896
TOKYO-LCG2_DATADISK     15.0    15.0    13.313
- Throughput (snapshot) (MB/s)
- Number of transfer errors (snapshot)
- Note: No errors found in FDR2 transfers. The errors shown in this snapshot come from other production data transfers.
- 09h24
- Throughput (snapshot) (MB/s)
- Number of transfer errors (snapshot)
- Note: No errors found in FDR2 transfers. The errors shown in this snapshot come from other production data transfers.
- Shipped datasets
IN2P3-CC_DATADISK       319
BEIJING-LCG2_DATADISK   0
GRIF-LAL_DATADISK       79
GRIF-LPNHE_DATADISK     39
GRIF-SACLAY_DATADISK    38
IN2P3-CPPM_DATADISK     0
IN2P3-LAPP_DATADISK     12
IN2P3-LPC_DATADISK      12
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    59
RO-07-NIPNE_DATADISK    49
TOKYO-LCG2_DATADISK     134
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
GRIF-LAL_DATADISK       6.0     6.0     5.579
GRIF-LPNHE_DATADISK     3.906   3.906   3.817
GRIF-SACLAY_DATADISK    6.0     6.0     5.927
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LPC_DATADISK      3.0     3.0     3.0
IN2P3-LPSC_DATADISK     0.488   0.488   0.439
RO-02-NIPNE_DATADISK    3.0     3.0     2.945
RO-07-NIPNE_DATADISK    2.0     2.0     1.885
TOKYO-LCG2_DATADISK     15.0    15.0    13.313
- 14h00
- Many transfers to RO-02-NIPNE_DATADISK failed. Failure reasons, each followed by its number of occurrences:
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [PERMISSION] the server sent an error response: 550 550 rfio write failure: Permission denied.] Source Host [ccsrm.in2p3.fr] 16
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] Source Host [ccsrm.in2p3.fr] 9
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [GRIDFTP] the server sent an error response: 426 426 Transfer aborted (Unexpected Exception : java.io.IOException: Broken pipe)] Source Host [ccsrm.in2p3.fr] 2
[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the transfer (status = PREPARING) is no longer active (could not open file /proc/3775/cmdline)] Source Host [ccsrm.in2p3.fr] 1
[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the transfer (status = PREPARING) is no longer active (could not open file /proc/30076/cmdline)] Source Host [ccsrm.in2p3.fr] 1
- 20h10
- FDR datasets have finished transferring to all T2s except BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK.
- Many "Connection timed out" errors were found at BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK.
[FTS] FTS State [Failed] FTS Retries [1] Reason [SOURCE error during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [httpg://ccsrm.in2p3.fr:8443/srm/managerv2]. Givin' up after 3 tries] Source Host [ccsrm.in2p3.fr] 111
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] Source Host [ccsrm.in2p3.fr] 95
[FTS] FTS State [Failed] FTS Retries [1] Reason [SOURCE error during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds] Source Host [ccsrm.in2p3.fr] 3
- Shipped datasets
IN2P3-CC_DATADISK       134
BEIJING-LCG2_DATADISK   79
GRIF-LAL_DATADISK       67
GRIF-LPNHE_DATADISK     32
GRIF-SACLAY_DATADISK    26
IN2P3-CPPM_DATADISK     52
IN2P3-LAPP_DATADISK     52
IN2P3-LPC_DATADISK      52
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    49
RO-07-NIPNE_DATADISK    49
TOKYO-LCG2_DATADISK     126
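The FTS failure summaries in the 14h00 and 20h10 entries group failed transfers by reason and give a count for each. A sketch of how such a tally could be produced is given below. It assumes a hypothetical input with one failed transfer per line, each containing a `Reason [...]` field shaped like the messages quoted in this logbook. The function name `tally_fts_errors` and the input layout are assumptions, not the output format of any FTS tool.

```shell
# tally_fts_errors (hypothetical helper): read failure lines on stdin, keep
# the error category (the first bracketed upper-case tag inside
# "Reason [...]", e.g. TRANSFER_TIMEOUT) and print "COUNT CATEGORY",
# highest count first. Lines without such a tag are skipped.
tally_fts_errors() {
    LC_ALL=C sed -n 's/.*Reason \[[^[]*\[\([A-Z_]*\)\].*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

Messages that carry no category tag, such as "The process serving the transfer ... is no longer active", are dropped by this sketch and would need to be counted separately.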
10th June
- Summary
All T2s except BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK were in good shape and received all the exported datasets during the FDR transfers. The "Connection timed out" problem with transfers to BEIJING will hopefully be solved by Lionel tuning the transfer duration to >1800s. RO-02-NIPNE is reportedly being upgraded from SL3 to SL4.
11th June
- Announcement
All FDR2 datasets will be deleted from all sites, since part of the detector was not included in the reconstruction. This was not detected by Data Quality because it runs on the express stream, which was correct.
Old datasets will be deleted centrally by Stephane, and new datasets, whose names begin with "ccrc08-run.0024*", will be re-generated and re-exported later today.
T2s will receive the same shares as requested for FDR (except that DPDs are not produced).
12th June
- 01h20
- Functional Test datasets (names beginning with "ccrc08_run2.*") are being shipped to T2s.
At the moment, all T2 sites seem to be in good shape to accept datasets, except RO-07-NIPNE and RO-02-NIPNE, which are in downtime.
- Datasets shipped (names beginning with "ccrc08_run2.*")
IN2P3-CC_DATADISK       387
BEIJING-LCG2_DATADISK   46
GRIF-LAL_DATADISK       141
GRIF-LPNHE_DATADISK     79
GRIF-SACLAY_DATADISK    90
IN2P3-CPPM_DATADISK     22
IN2P3-LAPP_DATADISK     22
IN2P3-LPC_DATADISK      36
IN2P3-LPSC_DATADISK     17
RO-02-NIPNE_DATADISK    20
RO-07-NIPNE_DATADISK    33
TOKYO-LCG2_DATADISK     310
- Throughput
- Transfer errors
- Note that the errors shown in this snapshot cover all transfers, including other activity (e.g. MC/Panda), so they may not reflect the real situation of the FT transfers.
13th June
14th June
- The re-produced FDR2 datasets have been re-exported to T1s.
- Now 3 RAW datasets have been exported to IN2P3-CC_DATATAPE.
15th June
- Now 5 RAW datasets have been exported to IN2P3-CC_DATATAPE.
- Something is wrong with the configuration, resulting in a low transfer rate.
16th June
- Configuration problems found in dCache at LYON resulted in failed transfers to T2s.