Difference between revisions of "Atlas:Analysis ST 2009 Errors"

Revision as of 15:19, 4 February 2009

30.01.09

Comments and Errors follow-up

http://gangarobot.cern.ch/st/test_124/ DQ2_LOCAL (rfio)
http://gangarobot.cern.ch/st/test_125/ FILESTAGER (lcg_cp)

Note that ATLAS Production was ON on the FR-Cloud on January 29

IN2P3-LPC_MCDISK: f(w) - Errors due to the load induced by the MC production running at that time. The ST test jobs (2 x 50 jobs added) were then aborted with an "Unspecified gridmanager error" logged by the WMS.

In fact, the submission to the batch system failed because the maximum total number of jobs (GlueCEPolicyMaxTotalJobs) had been reached. The WMS probably does not check this value before submission, although it could.

Jan 29 23:54:46 clrlcgce03 lcgpbs:internal_ FAILED during submission 
to batch system lcgpbs(Maximum number of jobs already in queue)..
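
The kind of pre-submission check mentioned above could look roughly like the sketch below. It is only an illustration, not a description of how the WMS works: it assumes the CE publishes its current total job count alongside GlueCEPolicyMaxTotalJobs (for example as GlueCEStateTotalJobs in the Glue schema), and the function name has_room and all numbers are hypothetical.

```python
def has_room(total_jobs: int, max_total_jobs: int, new_jobs: int = 1) -> bool:
    """Return True if new_jobs more jobs still fit under the CE's total-job limit.

    total_jobs     -- jobs currently known to the CE (assumed to be published)
    max_total_jobs -- the published limit (GlueCEPolicyMaxTotalJobs)
    new_jobs       -- number of jobs about to be submitted
    """
    return total_jobs + new_jobs <= max_total_jobs


# Hypothetical numbers, not taken from the LPC CE.
print(has_room(total_jobs=980, max_total_jobs=1000, new_jobs=50))  # False
print(has_room(total_jobs=900, max_total_jobs=1000, new_jobs=50))  # True
```

A check like this would only be as good as the published values: if the information system lags behind the batch system, submissions could still be rejected at the CE.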

IN2P3-CPPM_MCDISK: The same problem as in the previous test. Jobs run forever with the error "send2dpm: DP000 - disk pool manager not running on marwn04.in2p3.fr". This happened for 13 jobs, which all started running at nearly the same time (Thu Jan 29 22:37:53) and went into error around Jan 30 00:21. I have put the stdout and stderr of two of these jobs here:

http://marwww.in2p3.fr/~knoops/752629.marce01.in2p3.fr/
http://marwww.in2p3.fr/~knoops/752631.marce01.in2p3.fr/

The load of the local DPM server was around 9 at that time.

@@ Line 7: / Line 7: @@
 * IN2P3-LPC_MCDISK: f(w)   - Errors due to the load induced by MC production running at that time. Then ST tests jobs (2 x 50 jobs added)were aborted with an "Unspecified gridmanager error" logged by WMS.<br>
-In fact, the submission to the batch system has failed because the '''max. total number of jobs (GlueCEPolicyMaxTotalJobs) was reached ''' <br>
+In fact, the submission to the batch system has failed because the '''max. total number of jobs (GlueCEPolicyMaxTotalJobs) was reached '''. Probably this value is not looked at as it could be before submission by WMS. <br>
-  Jan 29 23:54:46 clrlcgce03 lcgpbs:internal_ FAILED during
+  Jan 29 23:54:46 clrlcgce03 lcgpbs:internal_ FAILED during submission
-  submission to batch system lcgpbs(Maximum number of jobs already  in queue)..
+ to batch system lcgpbs(Maximum number of jobs already in queue)..
-Probably this value is not looked at as it could be before submission by WMS.
 * IN2P3-CPPM_MCDISK:  The same problem has in previous test. Jobs running forever with error ."send2dpm: DP000 - disk pool manager not running on marwn04.in2p3.fr ". This arrive for 13 jobs, all starts running nearly at the same time Thu Jan 29 22:37:53 and run in error around Jan 30 00:21. I have put two of this stdout, stderr there