Difference between revisions of "Atlas:Analysis ST 2009 Errors"

An article from lcgwiki.
(Errors follow-up - Known issues)
(Errors follow-up - Known issues)
Line 2: Line 2:
  
 
== Errors follow-up - Known issues ==
 
== Errors follow-up - Known issues ==
* Corrupted input AOD files found
+
* '''Corrupted input AOD files found'''
 
   AOD.027097._37998.pool.root   
 
   AOD.027097._37998.pool.root   
 
   AOD.027579._24654.pool.root  
 
   AOD.027579._24654.pool.root  
 
   AOD.027076._10514.pool.root
 
   AOD.027076._10514.pool.root
  
* IN2P3-CPPM_MCDISK:  overload of lcg-gt during SURL to TURL conversion  
+
* IN2P3-CPPM / IN2P3-LPC '''overload of lcg-gt during SURL to TURL conversion'''
 
Jobs running forever with error, killed by the batch system.
 
Jobs running forever with error, killed by the batch system.
 +
The error should at least be caught by Ganga - Savannah ticket opened :
 +
https://savannah.cern.ch/bugs/index.php?48537
 
   send2nsd: NS009 - fatal configuration error: Host unknown:  dpnshome.in2p3.fr
 
   send2nsd: NS009 - fatal configuration error: Host unknown:  dpnshome.in2p3.fr
 
   send2dpm: DP000 - disk pool manager not running on marwn04.in2p3.fr  
 
   send2dpm: DP000 - disk pool manager not running on marwn04.in2p3.fr  
Line 17: Line 19:
 
Heavy load of the local DPM server observed at that time.
 
Heavy load of the local DPM server observed at that time.
  
* IN2P3-LAPP : during ST125, 30/01/09 jobs still running after 2500 minutes, failing to connect to LFC with the message :
+
* IN2P3-LAPP : during ST125, 30/01/09 '''jobs still running after 2500 minutes, failing to connect to LFC''' with the message :
 
  send2nsd: NS002 - send error : _Csec_recv_token: Received magic:30e1301 expecting ca03
 
  send2nsd: NS002 - send error : _Csec_recv_token: Received magic:30e1301 expecting ca03
 
  cannot connect to LFC  
 
  cannot connect to LFC  

Revision as of 17:18, 22 April 2009

--Chollet 16:40, 20 March 2009 (CET)

Errors follow-up - Known issues

  • Corrupted input AOD files found
 AOD.027097._37998.pool.root  
 AOD.027579._24654.pool.root 
 AOD.027076._10514.pool.root
  • IN2P3-CPPM / IN2P3-LPC : overload of lcg-gt during SURL to TURL conversion

Jobs run indefinitely with errors until they are killed by the batch system. The error should at least be caught by Ganga; a Savannah ticket has been opened: https://savannah.cern.ch/bugs/index.php?48537

 send2nsd: NS009 - fatal configuration error: Host unknown:  dpnshome.in2p3.fr
 send2dpm: DP000 - disk pool manager not running on marwn04.in2p3.fr 

This happened for 13 jobs, which all started running at nearly the same time (Thu Jan 29 22:37:53) and failed around Jan 30 00:21. The stdout and stderr of two of these jobs are available here:
http://marwww.in2p3.fr/~knoops/752629.marce01.in2p3.fr/
http://marwww.in2p3.fr/~knoops/752631.marce01.in2p3.fr/
Heavy load of the local DPM server observed at that time.
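
Since these jobs hung until the batch system killed them, one possible mitigation is to put a hard time limit on the external command itself. The following is only a minimal sketch of that idea, not what Ganga or the ST jobs actually do: `run_with_timeout` is a hypothetical helper, and the command it receives (for example an `lcg-gt` call) and the timeout value are assumptions to be chosen by the site.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run an external command with a hard time limit.

    Returns (status, output), where status is 'ok', 'failed' or 'timeout'.
    `cmd` is a list of arguments; which command to wrap is up to the caller.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

A job wrapper could then treat a `timeout` status as an immediate failure instead of waiting for the batch system limit.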

  • IN2P3-LAPP : during ST125 (30/01/09), jobs were still running after 2500 minutes, failing to connect to the LFC with the message :
send2nsd: NS002 - send error : _Csec_recv_token: Received magic:30e1301 expecting ca03
cannot connect to LFC 

The service was up and running fine at that time. Is it due to an expired proxy?
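
To test the expired-proxy hypothesis, the remaining proxy lifetime can be compared with the observed job duration. This is a rough sketch under stated assumptions: both function names are hypothetical, the 30-minute safety margin is arbitrary, and the `voms-proxy-info --timeleft` call is assumed to be available on the worker node and to print the remaining lifetime in seconds.

```python
import subprocess


def proxy_seconds_left():
    """Return the remaining proxy lifetime in seconds.

    Assumption: `voms-proxy-info --timeleft` is installed and prints
    a single integer (seconds left) on stdout.
    """
    result = subprocess.run(
        ["voms-proxy-info", "--timeleft"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def proxy_outlives_job(seconds_left, job_minutes, margin_minutes=30):
    """True if the proxy is still valid after job_minutes plus a margin."""
    return seconds_left >= (job_minutes + margin_minutes) * 60
```

For example, a proxy with 12 hours left would not cover a job that runs for 2500 minutes, so `proxy_outlives_job(12 * 3600, 2500)` is false.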

  • RO_07 : ST jobs cannot store their output locally and always use the fail-over, storing output files in Lyon

Is there still a configuration problem with the USERDISK token on the tbit00.nipne.ro SE?

lcg-cr --vo atlas -s ATLASUSERDISK  -t 2400 -d srm://tbit00.nipne.ro/dpm/.....
dpm_getspacetoken: Unknown user space token description
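
The error messages listed on this page are distinctive enough to be matched automatically in job logs. The sketch below shows one way to do that; the patterns are taken from the messages quoted above, while the labels and the `classify` helper are purely illustrative and not part of Ganga or any ATLAS tool.

```python
import re

# (label, pattern) pairs; the labels are illustrative names only
KNOWN_ERRORS = [
    ("name server host unknown (NS009)", re.compile(r"send2nsd: NS009")),
    ("DPM not running (DP000)", re.compile(r"send2dpm: DP000")),
    ("LFC send error (NS002)", re.compile(r"send2nsd: NS002")),
    ("unknown space token",
     re.compile(r"dpm_getspacetoken: Unknown user space token")),
]


def classify(log_text):
    """Return the labels of all known errors found in log_text, in table order."""
    return [label for label, pattern in KNOWN_ERRORS
            if pattern.search(log_text)]
```

Running `classify` over the stdout and stderr of a failed job gives a quick first sorting of failures by known issue.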