Difference between revisions of "Atlas"
Revision as of 20:05, 21 September 2018
Welcome to the LCG-France Atlas page
DOMA_FR project
DOMA_FR tests
- Global transfers from/to each site (transfers with the same source and destination site excluded)
Direction | LAPP | LPSC | CC |
---|---|---|---|
From | | | |
To | | | |
- Data access monitoring as seen by the site worker nodes (WN)
Destination of access | LAPP | LPSC | CC |
---|---|---|---|
Production download | | | |
Production upload | | | |
Production input | | | |
Production output | | | |
Analysis download | | | |
Analysis direct access | | | |
- Current conclusion
- Enabling direct access creates much more network usage -> Is it useful for ATLAS? (e.g. to process urgent requests faster?)
- Production job brokering prefers to send input data (Production_input) to IN2P3-CC_DATADISK rather than IN2P3-LAPP_DATADISK, because the single-file connection is expected to be better. Can we check whether there are more temporary files in IN2P3-CC_DATADISK?
- Request to the Panda team: have FTS transfers target read_lan0 instead of the closest/fastest site in network terms
- The 10 Gb/s LAPP-CC connection (used for all LAPP WAN transfers to any site) can get saturated if a large number of jobs start at the same time (no smoothing by Panda)
- Production_output can only be written to a single site
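The saturation concern for the 10 Gb/s LAPP-CC link can be made concrete with a back-of-the-envelope estimate. The sketch below is only illustrative: the function name is hypothetical, and the per-job stage-in rate is an assumed placeholder, since the typical transfer rate per job type has not been measured yet.

```python
import math


def jobs_to_saturate(link_gbps: float, per_job_mbps: float) -> int:
    """Smallest number of simultaneous stage-ins whose combined rate
    reaches the link capacity (both rates in bits per second units)."""
    if link_gbps <= 0 or per_job_mbps <= 0:
        raise ValueError("rates must be positive")
    link_mbps = link_gbps * 1000  # 1 Gb/s = 1000 Mb/s
    return math.ceil(link_mbps / per_job_mbps)


# ASSUMPTION: 200 Mb/s per job is a placeholder, not a measured value.
print(jobs_to_saturate(link_gbps=10, per_job_mbps=200))  # prints 50
```

With such an assumed rate, a few tens of jobs starting together would be enough to fill the link, which is why smoothing of the transfers matters.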
- Issues
- Asynchronous transfers of input files go to the site closest/fastest to the input site instead of the RSE closest/fastest to the WN (the latter would help reduce network occupancy between IN2P3-CC and LAPP/LPSC, since the transfers are smoothed by FTS) -> Change requested on 21st September (Panda level)
- Job brokering should take into account downtimes of remote SEs (issue seen during the IN2P3-CC downtime) -> Request sent by Rod
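The two requests above amount to a simple selection rule: among the candidate storage endpoints, ignore those in downtime and prefer the one closest to the worker node. The following is a minimal sketch of that rule only, not the actual Panda brokering code; the `StorageEndpoint` class, the `pick_rse` function, and the RTT values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StorageEndpoint:
    name: str
    rtt_ms: float      # round-trip time as seen from the WN (assumed metric)
    in_downtime: bool  # True while the SE is in a declared downtime


def pick_rse(candidates: Iterable[StorageEndpoint]) -> Optional[StorageEndpoint]:
    """Return the usable endpoint closest to the WN, or None if all are down."""
    usable = [c for c in candidates if not c.in_downtime]
    if not usable:
        return None
    return min(usable, key=lambda c: c.rtt_ms)


# ASSUMPTION: the RTT values are placeholders, not measurements.
endpoints = [
    StorageEndpoint("IN2P3-CC_DATADISK", rtt_ms=5.0, in_downtime=False),
    StorageEndpoint("IN2P3-LAPP_DATADISK", rtt_ms=0.5, in_downtime=False),
]
chosen = pick_rse(endpoints)
print(chosen.name if chosen else "no usable endpoint")
```

RTT is used here as a stand-in for "closest/fastest"; a real implementation could rank on measured throughput instead.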
- Next steps
- Deploy Rucio 1.17 to use the protocol priority defined in AGIS (not honoured in 1.16 because of a bug). With the bug fixed, the most trusted protocol is used, which would help control the decrease of SRM usage
- Monitor job efficiency vs RTT between SE and WN -> Identify when a cache has no impact (assuming no network bandwidth limitation)
- Understand the ATLAS job brokering algorithm
- Get the typical transfer rate per job type
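The efficiency-versus-RTT monitoring could start from a simple aggregation of job records. The sketch below assumes job efficiency is defined as CPU time divided by wall-clock time and that each record carries the SE-to-WN RTT; the function name, the record layout, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def efficiency_by_rtt(jobs, bin_ms=5.0):
    """Group jobs into RTT bins and return {bin lower edge in ms: mean efficiency}.

    jobs: iterable of (rtt_ms, cpu_seconds, wall_seconds) tuples.
    Efficiency is taken as cpu_seconds / wall_seconds (assumed definition).
    """
    bins = defaultdict(list)
    for rtt_ms, cpu_s, wall_s in jobs:
        if wall_s <= 0:
            continue  # skip records without a usable wall-clock time
        lower_edge = (rtt_ms // bin_ms) * bin_ms
        bins[lower_edge].append(cpu_s / wall_s)
    return {edge: mean(values) for edge, values in sorted(bins.items())}


# ASSUMPTION: made-up records, for illustration only.
sample = [(1.0, 75, 100), (2.0, 50, 100), (12.0, 50, 100)]
for edge, eff in efficiency_by_rtt(sample).items():
    print(f"RTT >= {edge:g} ms: mean efficiency {eff:.3f}")
```

A flat curve across RTT bins would suggest that a cache brings little benefit for that job type, under the stated assumption of no bandwidth limitation.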