Atlas:Analysis ST 2009 Errors

--Chollet 14:24, 15 May 2009 (CEST)

Errors follow-up - Known issues

  • Corrupted input AOD files found

- Badread error example at byte:32849441, branch:m_genParticles.m_endVtx, entry:217, badread=0

- To check whether the file is corrupted on the SE, please refer to the ATLAS procedure (requires a certificate approved by ATLAS) defined here:

  • IN2P3-CPPM / IN2P3-LPC : overload of lcg-gt during SURL to TURL conversion

Jobs run indefinitely after hitting the error below and are eventually killed by the batch system. The error should at least be caught by Ganga; a Savannah ticket has been opened:

 send2nsd: NS009 - fatal configuration error: Host unknown:
 send2dpm: DP000 - disk pool manager not running on 

This happened for 13 jobs, which all started running at nearly the same time (Thu Jan 29 22:37:53) and hit the error around Jan 30 00:21. I have put the stdout and stderr of two of them there:
A heavy load on the local DPM server was observed at that time.
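
As an illustration of how such errors could be caught instead of letting the job run until the batch system kills it, here is a minimal sketch. It is not Ganga code: the function names `find_fatal_errors` and `run_with_timeout` are hypothetical, and only the two message prefixes quoted above are taken from this page. The idea is simply to bound the run time of a command and to scan its stderr for the known fatal messages.

```python
import subprocess

# Message prefixes copied from the errors quoted above, keyed by their code.
FATAL_MESSAGES = {
    "NS009": "send2nsd: NS009 - fatal configuration error",
    "DP000": "send2dpm: DP000 - disk pool manager not running",
}


def find_fatal_errors(stderr_text):
    """Return the codes of all known fatal messages present in stderr_text."""
    return [code for code, msg in FATAL_MESSAGES.items() if msg in stderr_text]


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) for at most timeout_s seconds.

    Returns (returncode, fatal_codes). On timeout the return code is None
    and fatal_codes is ["timeout"].
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None, ["timeout"]
    return proc.returncode, find_fatal_errors(proc.stderr)
```

A wrapper around the SURL to TURL conversion step could call `run_with_timeout` with the real command line and abort the job as soon as the returned list is not empty. The timeout value to use is site dependent and is not specified here.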

  • IN2P3-LAPP : during ST125 (30/01/09), jobs were still running after 2500 minutes, failing to connect to the LFC with the message :
send2nsd: NS002 - send error : _Csec_recv_token: Received magic:30e1301 expecting ca03
cannot connect to LFC 

The service was up and running fine at that time. Is it due to an expired proxy?
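
To make the expired-proxy hypothesis concrete, the arithmetic can be sketched as below. This is only a plausibility check: the proxy lifetimes used here (12 and 96 hours) are hypothetical values chosen for illustration, since the lifetime actually used by these jobs is not recorded on this page, and the sketch assumes the proxy was created no later than the job start.

```python
def proxy_expired_during_job(runtime_minutes, proxy_lifetime_hours):
    """True if a proxy created at job start would expire before the job
    has been running for runtime_minutes."""
    return runtime_minutes > proxy_lifetime_hours * 60


# 2500 minutes is the run time reported above (about 41.7 hours).
# The lifetimes below are hypothetical.
print(proxy_expired_during_job(2500, 12))  # True: 2500 > 720
print(proxy_expired_during_job(2500, 96))  # False: 2500 < 5760
```

With a lifetime shorter than about 42 hours, the proxy would have expired before the 2500 minute mark, which would be consistent with the hypothesis without proving it.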

  • RO_07 : ST jobs cannot store their output locally and always use the fail-over, storing output files in Lyon

Is there still a configuration problem with the USERDISK space token on the SE?

lcg-cr --vo atlas -s ATLASUSERDISK  -t 2400 -d srm://
dpm_getspacetoken: Unknown user space token description