Difference between revisions of "Atlas:FDR2"
Revision as of 10:39, 11 June 2008
Monitoring pages/graphs
FTS
http://cctoolsafs.in2p3.fr/fts/monitoring/prod/ftsmonitor.php?vo=atlas
T0-T1 (ALL)
- FDR monitoring page - T1s
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_tier1s.html
http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site
- Throughput
- Errors
- http://dashb-atlas-data-tier0.cern.ch/dashboard/templates/plots/OVERVIEW.num_file_xs_error.14400.png
T0-T1 (Lyon)
http://dashb-atlas-data-tier0.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput T0-LYON
- Errors T0-LYON
- Throughput Others-LYON
- Errors Others-LYON
:http://dashb-atlas-data.cern.ch/dashboard/templates/plots/LYON.T1.num_file_xs_error.14400.png
T1-T1
T1-T2 (Lyon)
- FDR monitoring page
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_TiersInfo.html
http://atldq2pro.cern.ch:8000/ft/mon/fdrmon_T1-T2_matrix_day.html
- Dashboard
http://dashb-atlas-data.cern.ch/dashboard/request.py/site?statsInterval=4&name=LYON
- Throughput
- Errors
Network Graphs
- IN2P3 Weathermap
- cc-in2p3
- lyo-cern (lhcopn-in2p3.cern.ch)
http://netstat.in2p3.fr/weathermap/graphiques/lyo-cern-daily.gif
- lyon-nord
- lyo-nrd
http://netstat.in2p3.fr/weathermap/graphiques/lyo-nrd-daily.gif
- orsay
- orsay (in2p3-orsay.cssi.renater.fr)
http://netstat.in2p3.fr/weathermap/graphiques/orsay-daily.gif
- lal-cc
- LAL - CC link
http://netstat.in2p3.fr/weathermap/graphiques/lal-cc-daily.gif
- lal
- lal
http://netstat.in2p3.fr/weathermap/graphiques/lal-daily.gif
- lpnhe
- lpnhe-nrd (in2p3-jussieu.cssi.renater.fr)
http://netstat.in2p3.fr/weathermap/graphiques/lpnhe-nrd-daily.gif
- lpnhe (Paris-LPNHE.in2p3.fr)
http://netstat.in2p3.fr/weathermap/graphiques/lpnhe-daily.gif
- lapp
- ann-nrd
http://netstat.in2p3.fr/weathermap/graphiques/ann-nrd-daily.gif
- lpc
- lpc-cf
http://netstat.in2p3.fr/weathermap/graphiques/lpc-cf-daily.gif
http://netstat.in2p3.fr/weathermap/graphiques/cppm-daily.gif
- international (tokyo etc.)
- parisnrd
http://netstat.in2p3.fr/weathermap/graphiques/parisnrd-daily.gif
- GEANT - NYC
- MANLAN
http://dc-snmp.wcc.grnoc.iu.edu/manlan/img/sw.newy32aoa.manlan.internet2.edu--te10_1-std5.gif
- NYC - TOKYO
- MANLAN
http://dc-snmp.wcc.grnoc.iu.edu/manlan/img/sw.newy32aoa.manlan.internet2.edu--te11_1-std5.gif
T0-T1
T1-T2
General information
- FDR page
https://twiki.cern.ch/twiki/bin/view/Atlas/FullDressRehearsal
- Replication status (checked roughly every 2h?)
http://pandamon.usatlas.bnl.gov:25880/server/pandamon/query?mode=listFDR
- Contacts
- IN2P3-CPPM_DATADISK :
- IN2P3-LPSC_DATADISK :
- IN2P3-LPC_DATADISK :
- GRIF-LAL_DATADISK : (grid.admin a lal.in2p3.fr)
- GRIF-LPNHE_DATADISK :
- GRIF-SACLAY_DATADISK :
- BEIJING-LCG2_DATADISK : (yanxf a ihep.ac.cn, Erming.Pei a cern.ch)
- RO-07-NIPNE_DATADISK : (ciubancan a nipne.ro)
- RO-02-NIPNE_DATADISK : (tpreda a nipne.ro)
- TOKYO-LCG2_DATADISK : (lcg-admin a icepp.s.u-tokyo.ac.jp)
Only sites with srmv2 and space tokens will collect data. The streams are: Egamma, Bphysics, MinBias, Muon and Jet. If no stream name is attached to a share, the site receives any stream. The site in parentheses collects the complementary datasets. "T2 CPU resources" at a T1 means that users can access CPUs at the T1 without the production role.
* US :
  o AGLT2_DATADISK : 100%
  o MWT2_DATADISK : 100%
  o NET2_DATADISK : 100%
  o SLACXRD_DATADISK : 100%
  o SWT2_CPB_DATADISK : 100%
  o T2 CPU resources in BNL to access AOD
* CA :
  o SFU-LCG2_DATADISK : 40%
  o VICTORIA-LCG2_DATADISK : 30%
  o ALBERTA-LCG2_DATADISK : 30%
  o TORONTO-LCG2_DATADISK : 50%
* IT :
  o INFN-MILANO_DATADISK : 25%
  o INFN-FRASCATI_DATADISK : 25%
  o INFN-ROMA1_DATADISK : 100%
  o INFN-NAPOLI-ATLAS_DATADISK : 100%
* TW :
  o TW-FTT_DATADISK : 50%
* NL :
  o JINR-LCG2_DATADISK : 50% Jet (RU-PROTVINO-IHEP_DATADISK)
  o JINR-LCG2_DATADISK : 50% MinBias (RU-PROTVINO-IHEP_DATADISK)
  o JINR-LCG2_DATADISK : 50% Muon (RU-PROTVINO-IHEP_DATADISK)
  o JINR-LCG2_DATADISK : 50% Bphysics (RU-PNPI_DATADISK)
  o RU-PROTVINO-IHEP_DATADISK : 50% Jet (JINR-LCG2_DATADISK)
  o RU-PROTVINO-IHEP_DATADISK : 50% Muon (JINR-LCG2_DATADISK)
  o RU-PROTVINO-IHEP_DATADISK : 50% MinBias (JINR-LCG2_DATADISK)
  o RU-PROTVINO-IHEP_DATADISK : 50% Egamma (RU-PNPI_DATADISK)
  o RU-PNPI_DATADISK : 50% Egamma (RU-PROTVINO-IHEP_DATADISK)
  o RU-PNPI_DATADISK : 50% Bphysics (JINR-LCG2_DATADISK)
  o RU-PNPI_DATADISK : 50% Jet (RRC-KI_DATADISK)
  o RU-PNPI_DATADISK : 50% MinBias (RRC-KI_DATADISK)
  o RRC-KI_DATADISK : 50% Jet (RU-PNPI_DATADISK)
  o RRC-KI_DATADISK : 50% MinBias (RU-PNPI_DATADISK)
  o RRC-KI_DATADISK : 50% Egamma
  o RRC-KI_DATADISK : 50% Bphysics
  o CSTCDIE_DATADISK : 20%
* FR :
  o IN2P3-LAPP_DATADISK : 100% Egamma
  o IN2P3-CPPM_DATADISK : 100% Egamma
  o IN2P3-LPSC_DATADISK : 5%
  o IN2P3-LPC_DATADISK : 100% Muon, 50% Egamma
  o BEIJING-LCG2_DATADISK : 100% Egamma, 50% Muon, 35% MinBias
  o RO-07-NIPNE_DATADISK : 10%
  o RO-02-NIPNE_DATADISK : 10%
  o GRIF-LAL_DATADISK : 45%
  o GRIF-LPNHE_DATADISK : 25%
  o GRIF-SACLAY_DATADISK : 30%
  o TOKYO-LCG2_DATADISK : 100%
  o T2 CPU resources in LYON to access AOD
* DE :
  o DESY-HH_DATADISK : 100%
  o CSCS-LCG2_DATADISK : 100%
  o DESY-ZN_DATADISK : 50%
  o GOEGRID_DATADISK : 50%
  o WUPPERTALPROD_DATADISK : 25%
  o CYFRONET-LCG2_DATADISK : 15%
  o PRAGUELCG2_DATADISK : 25%
  o LRZ-LMU_DATADISK : 35%
  o UNI-FREIBURG_DATADISK : 100% Jet, 100% Muon
* UK :
  o UKI-SOUTHGRID-CAM-HEP_DATADISK : 50% Muon (UKI-NORTHGRID-LANCS-HEP_DATADISK)
  o UKI-NORTHGRID-LANCS-HEP_DATADISK : 50% Muon (UKI-SOUTHGRID-CAM-HEP_DATADISK)
  o UKI-NORTHGRID-SHEF-HEP_DATADISK : 50% MinBias (UKI-SCOTGRID-GLASGOW_DATADISK)
  o UKI-SCOTGRID-GLASGOW_DATADISK : 50% MinBias (UKI-NORTHGRID-SHEF-HEP_DATADISK)
  o UKI-LT2-RHUL_DATADISK : 50% Egamma (UKI-NORTHGRID-LIV-HEP_DATADISK)
  o UKI-NORTHGRID-LIV-HEP_DATADISK : 50% Egamma (UKI-LT2-RHUL_DATADISK)
  o UKI-SOUTHGRID-RALPP_DATADISK : 50% Egamma (UKI-LT2-RHUL_DATADISK)
  o UKI-SOUTHGRID-BHAM-HEP_DATADISK : 50% Jet (UKI-SOUTHGRID-OX-HEP_DATADISK)
* ES :
  o IFIC-LCG2_DATADISK : 50%
  o UAM-LCG2_DATADISK : 25%
  o IFAE_DATADISK : 25%
* NDGF :
  o SIGNET_DATADISK : 50%
Logbook
2nd June
- Lyon is in downtime, so there are no transfers from LYON to T2s today.
http://cc.in2p3.fr/cc_accueil.php3?lang=fr
- Free Space
Token                   Total   Guaranteed  Unused
--------------------------------------------------
BEIJING-LCG2_DATADISK   6.366   6.366       6.343
GRIF-LAL_DATADISK       6.0     6.0         5.732
GRIF-LPNHE_DATADISK     3.906   3.906       3.89
GRIF-SACLAY_DATADISK    6.0     6.0         5.994
IN2P3-CPPM_DATADISK     1.0     1.0         0.977
IN2P3-LAPP_DATADISK     4.0     4.0         4.0
IN2P3-LPC_DATADISK      3.0     3.0         3.0
IN2P3-LPSC_DATADISK     0.488   0.488       0.482
RO-02-NIPNE_DATADISK    3.0     3.0         3.0
RO-07-NIPNE_DATADISK    2.0     2.0         2.0
TOKYO-LCG2_DATADISK     15.0    15.0        13.643
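A free-space snapshot like the one above can be turned into "space already used" per token with a few lines of Python. This is only a sketch: the function names are made up for illustration, the rows are a subset copied from the 2nd June table, and the values are kept in whatever unit the table uses (the page does not state it).

```python
# Subset of the 2nd June free-space table: token, total, guaranteed, unused.
TABLE = """\
Token Total Guaranteed Unused
------------------------------------------
BEIJING-LCG2_DATADISK 6.366 6.366 6.343
GRIF-LAL_DATADISK 6.0 6.0 5.732
TOKYO-LCG2_DATADISK 15.0 15.0 13.643
"""


def parse_free_space(text):
    """Return {token: (total, guaranteed, unused)} from whitespace-separated rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # separator or blank line
        token, *values = parts
        try:
            rows[token] = tuple(float(v) for v in values)
        except ValueError:
            continue  # header row: the columns are not numbers
    return rows


def used_space(rows):
    """Used space per token, computed as total minus unused."""
    return {
        token: round(total - unused, 3)
        for token, (total, _guaranteed, unused) in rows.items()
    }


usage = used_space(parse_free_space(TABLE))
for token, used in sorted(usage.items()):
    print(f"{token:24s} {used:.3f}")
```

Pasting a later snapshot into TABLE gives the same view for that time.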
3rd June
- Lyon is still in downtime; no transfers from LYON to T2s this morning.
4th June
- 22h02
- Still no transfers, even though all the services in LYON show OK.
- Throughput(snapshot)
- Services in LYON
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
BEIJING-LCG2_MCDISK     5.457   5.457   5.457
GRIF-LAL_DATADISK       6.0     6.0     5.732
GRIF-LAL_MCDISK         7.0     7.0     3.724
GRIF-LPNHE_DATADISK     3.906   3.906   3.89
GRIF-LPNHE_MCDISK       1.0     1.0     0.194
GRIF-SACLAY_DATADISK    6.0     6.0     5.994
GRIF-SACLAY_MCDISK      2.0     2.0     1.713
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-CPPM_MCDISK       0.781   0.781   0.765
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LAPP_MCDISK       2.0     2.0     0.745
IN2P3-LPC_DATADISK      3.0     3.0     3.0
IN2P3-LPC_MCDISK        2.0     2.0     1.936
IN2P3-LPSC_DATADISK     0.488   0.488   0.482
IN2P3-LPSC_MCDISK       2.0     2.0     1.518
RO-02-NIPNE_DATADISK    3.0     3.0     3.0
RO-02-NIPNE_MCDISK      1.0     1.0     0.995
RO-07-NIPNE_DATADISK    2.0     2.0     2.0
RO-07-NIPNE_MCDISK      1.0     1.0     0.654
TOKYO-LCG2_DATADISK     15.0    15.0    13.643
TOKYO-LCG2_MCDISK       10.0    10.0    10.0
5th June
- Data produced on Tuesday are supposed to be shipped tonight.
- 14h50
- The RAW data have now begun to be shipped to LYON.
- Throughput(snapshot)(MB/s)
- Number of transfer errors (snapshot)
- 21h30
- IN2P3-CC_MCDISK has some problems staging files, but this should have nothing to do with the T0 export.
- 47 of the 90 datasets have been exported to T1s.
- Throughput(snapshot)(MB/s)
- Number of transfer errors (snapshot) -- These come from the file-staging problem at the T1 and have nothing to do with the FDR transfers.
6th June
- 09h50
- Throughput(snapshot) (MB/s)
- Number of transfer errors (snapshot)
- 10h03
- Now all the RAW datasets have been exported to T1. The total number is 90.
http://atlas.web.cern.ch/Atlas/tzero/prod2/monitoring/tables/ddm_RAW_status.html
7th June
- At CERN, the bulk processing will be restarted using the upcoming AtlasPoint1-14.1.0.12 release cache.
- 11h30
- SRM in LYON is still in bad shape. Experts are working on it.
8th June
- 14h20
- FDR2 AODs/DPDs/TAGs/ESDs are being exported to LYON (IN2P3-CC_DATADISK), but have not yet been shipped to T2s.
- Number of datasets transferred so far
$ dq2-list-dataset-site IN2P3-CC_DATADISK | grep "fdr08_run2.*" | wc -l
68
- Throughput(T0-T1,snapshot)(MB/s)
- Number of transfer errors
- Throughput(T1-T2,snapshot)(MB/s)
9th June
- 0h20
- FDR datasets have begun to be shipped to T2s.
- Some T2s (e.g. BEIJING) have subscribed to dedicated datasets; these subscriptions will be handled later this morning.
- Shipped datasets
IN2P3-CC_DATADISK       315
BEIJING-LCG2_DATADISK   0
GRIF-LAL_DATADISK       79
GRIF-LPNHE_DATADISK     39
GRIF-SACLAY_DATADISK    38
IN2P3-CPPM_DATADISK     0
IN2P3-LAPP_DATADISK     12
IN2P3-LPC_DATADISK      12
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    29
RO-07-NIPNE_DATADISK    46
TOKYO-LCG2_DATADISK     134
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
GRIF-LAL_DATADISK       6.0     6.0     5.579
GRIF-LPNHE_DATADISK     3.906   3.906   3.817
GRIF-SACLAY_DATADISK    6.0     6.0     5.927
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LPC_DATADISK      0       0       0
IN2P3-LPSC_DATADISK     0.488   0.488   0.439
RO-02-NIPNE_DATADISK    3.0     3.0     2.986
RO-07-NIPNE_DATADISK    2.0     2.0     1.896
TOKYO-LCG2_DATADISK     15.0    15.0    13.313
- Throughput (snapshot) (MB/s)
- Number of transfer errors (snapshot)
- Note: No errors found in FDR2 transfers. The errors shown in this snapshot come from other production data transfers.
- 09h24
- Throughput (snapshot) (MB/s)
- Number of transfer errors (snapshot)
- Note: No errors found in FDR2 transfers. The errors shown in this snapshot come from other production data transfers.
- Shipped datasets
IN2P3-CC_DATADISK       319
BEIJING-LCG2_DATADISK   0
GRIF-LAL_DATADISK       79
GRIF-LPNHE_DATADISK     39
GRIF-SACLAY_DATADISK    38
IN2P3-CPPM_DATADISK     0
IN2P3-LAPP_DATADISK     12
IN2P3-LPC_DATADISK      12
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    59
RO-07-NIPNE_DATADISK    49
TOKYO-LCG2_DATADISK     134
- Free space
BEIJING-LCG2_DATADISK   6.366   6.366   6.343
GRIF-LAL_DATADISK       6.0     6.0     5.579
GRIF-LPNHE_DATADISK     3.906   3.906   3.817
GRIF-SACLAY_DATADISK    6.0     6.0     5.927
IN2P3-CPPM_DATADISK     1.0     1.0     0.977
IN2P3-LAPP_DATADISK     4.0     4.0     4.0
IN2P3-LPC_DATADISK      3.0     3.0     3.0
IN2P3-LPSC_DATADISK     0.488   0.488   0.439
RO-02-NIPNE_DATADISK    3.0     3.0     2.945
RO-07-NIPNE_DATADISK    2.0     2.0     1.885
TOKYO-LCG2_DATADISK     15.0    15.0    13.313
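To see which sites actually moved between the two "Shipped datasets" snapshots of 9th June (0h20 and 09h24), the counts can be compared mechanically. The sketch below is illustrative only: the dictionary and function names are invented here, and the numbers are copied by hand from the two lists above.

```python
# Shipped-dataset counts copied from the 0h20 and 09h24 snapshots of 9th June.
SNAPSHOT_0020 = {
    "IN2P3-CC_DATADISK": 315, "BEIJING-LCG2_DATADISK": 0,
    "GRIF-LAL_DATADISK": 79, "GRIF-LPNHE_DATADISK": 39,
    "GRIF-SACLAY_DATADISK": 38, "IN2P3-CPPM_DATADISK": 0,
    "IN2P3-LAPP_DATADISK": 12, "IN2P3-LPC_DATADISK": 12,
    "IN2P3-LPSC_DATADISK": 32, "RO-02-NIPNE_DATADISK": 29,
    "RO-07-NIPNE_DATADISK": 46, "TOKYO-LCG2_DATADISK": 134,
}
SNAPSHOT_0924 = {
    "IN2P3-CC_DATADISK": 319, "BEIJING-LCG2_DATADISK": 0,
    "GRIF-LAL_DATADISK": 79, "GRIF-LPNHE_DATADISK": 39,
    "GRIF-SACLAY_DATADISK": 38, "IN2P3-CPPM_DATADISK": 0,
    "IN2P3-LAPP_DATADISK": 12, "IN2P3-LPC_DATADISK": 12,
    "IN2P3-LPSC_DATADISK": 32, "RO-02-NIPNE_DATADISK": 59,
    "RO-07-NIPNE_DATADISK": 49, "TOKYO-LCG2_DATADISK": 134,
}


def changed_sites(before, after):
    """Return {site: after - before} for every site whose count changed."""
    return {
        site: after[site] - before.get(site, 0)
        for site in after
        if after[site] != before.get(site, 0)
    }


progress = changed_sites(SNAPSHOT_0020, SNAPSHOT_0924)
for site, delta in sorted(progress.items()):
    print(f"{site:24s} {delta:+d}")
```

Sites missing from the result (here BEIJING and CPPM, both still at 0) received nothing new in the interval.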
- 14h00
- Many transfers to RO-02-NIPNE_DATADISK failed.
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [PERMISSION] the server sent an error response: 550 550 rfio write failure: Permission denied.] Source Host [ccsrm.in2p3.fr] 16
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] Source Host [ccsrm.in2p3.fr] 9
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [GRIDFTP] the server sent an error response: 426 426 Transfer aborted (Unexpected Exception : java.io.IOException: Broken pipe)] Source Host [ccsrm.in2p3.fr] 2
[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the transfer (status = PREPARING) is no longer active (could not open file /proc/3775/cmdline)] Source Host [ccsrm.in2p3.fr] 1
[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the transfer (status = PREPARING) is no longer active (could not open file /proc/30076/cmdline)] Source Host [ccsrm.in2p3.fr] 1
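Error summaries in this format can be grouped by the bracketed error class ([PERMISSION], [TRANSFER_TIMEOUT], ...) to see at a glance what dominates. The following is a rough sketch, not a tool used during FDR2: the regular expressions, the function name and the "OTHER" bucket are assumptions made for this example, and the input lines are the 14h00 entries above with the trailing number read as an occurrence count.

```python
import re
from collections import Counter

# Entries copied from the 14h00 summary; the trailing number is the count.
LOG = [
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during "
    "TRANSFER phase: [PERMISSION] the server sent an error response: 550 550 "
    "rfio write failure: Permission denied.] Source Host [ccsrm.in2p3.fr] 16",
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during "
    "TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed "
    "out] Source Host [ccsrm.in2p3.fr] 9",
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during "
    "TRANSFER phase: [GRIDFTP] the server sent an error response: 426 426 "
    "Transfer aborted (Unexpected Exception : java.io.IOException: Broken "
    "pipe)] Source Host [ccsrm.in2p3.fr] 2",
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the "
    "transfer (status = PREPARING) is no longer active (could not open file "
    "/proc/3775/cmdline)] Source Host [ccsrm.in2p3.fr] 1",
    "[FTS] FTS State [Failed] FTS Retries [1] Reason [The process serving the "
    "transfer (status = PREPARING) is no longer active (could not open file "
    "/proc/30076/cmdline)] Source Host [ccsrm.in2p3.fr] 1",
]

# Reason text, source host and trailing count of one summary line.
ENTRY = re.compile(
    r"Reason \[(?P<reason>.*)\] Source Host \[(?P<host>[^\]]+)\] (?P<count>\d+)$"
)
# Upper-case error class in brackets inside the reason, e.g. [PERMISSION].
CATEGORY = re.compile(r"\[([A-Z_]+)\]")


def tally_errors(lines):
    """Sum the occurrence counts per error class; unclassified go to OTHER."""
    totals = Counter()
    for line in lines:
        match = ENTRY.search(line.strip())
        if match is None:
            continue  # not a summary line
        category = CATEGORY.search(match["reason"])
        totals[category.group(1) if category else "OTHER"] += int(match["count"])
    return totals


totals = tally_errors(LOG)
for name, count in totals.most_common():
    print(f"{name:18s} {count}")
```

For the 14h00 entries the permission errors (16) outnumber the timeouts (9).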
- 20h10
- FDR datasets have finished transferring to all T2s except BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK.
- Many "Connection timed out" errors were found at BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK.
[FTS] FTS State [Failed] FTS Retries [1] Reason [SOURCE error during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [httpg://ccsrm.in2p3.fr:8443/srm/managerv2]. Givin' up after 3 tries] Source Host [ccsrm.in2p3.fr] 111
[FTS] FTS State [Failed] FTS Retries [1] Reason [TRANSFER error during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out] Source Host [ccsrm.in2p3.fr] 95
[FTS] FTS State [Failed] FTS Retries [1] Reason [SOURCE error during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds] Source Host [ccsrm.in2p3.fr] 3
- Shipped datasets
IN2P3-CC_DATADISK       134
BEIJING-LCG2_DATADISK   79
GRIF-LAL_DATADISK       67
GRIF-LPNHE_DATADISK     32
GRIF-SACLAY_DATADISK    26
IN2P3-CPPM_DATADISK     52
IN2P3-LAPP_DATADISK     52
IN2P3-LPC_DATADISK      52
IN2P3-LPSC_DATADISK     32
RO-02-NIPNE_DATADISK    49
RO-07-NIPNE_DATADISK    49
TOKYO-LCG2_DATADISK     126
10th June
- Summary
All T2s except BEIJING-LCG2_DATADISK and RO-02-NIPNE_DATADISK were in good shape during the FDR transfers. The "Connection timed out" problem with transfers to BEIJING will hopefully be solved by Lionel tuning the transfer duration to >1800s. RO-02-NIPNE is reportedly being upgraded from SL3 to SL4.
11th June
- Announcement
All FDR2 datasets will be deleted from all sites because part of the detector was not included in the reconstruction. Data Quality did not detect this because it runs on the express stream, which was correct.
Old datasets will be deleted centrally by Stephane. New datasets, whose names will begin with "ccrc08-run.0024*", will be regenerated and re-exported later today.
T2s will receive the same shares as requested for FDR (except that DPDs are not produced).