Atlas:Analysis ST 2009 Errors

From lcgwiki.

Revision as of 11:13, 30 January 2009

30.01.09

Comments, site feedback and error follow-up for ST tests 124 and 125

Note that ATLAS production was running on the FR-Cloud on January 29.

  • IN2P3-LPC_MCDISK: f(w) - errors caused by the load from the MC production running at that time plus the ST tests (2 x 50 additional jobs)

Jobs are aborted by the WMS with the following logged reason:
- Got a job held event, reason: Unspecified gridmanager error
- Job got an error while in the CondorG queue.
The submission to the batch system failed because the site's maximum number of jobs accepted in the queue had been reached:
- queue atlas: max_queuable = 200 in the batch system, published as 'GlueCEPolicyMaxTotalJobs' for the queue:
  dn: GlueCEUniqueID=clrlcgce03.in2p3.fr:2119/jobmanager-lcgpbs-atlas,Mds-Vo-name=IN2P3-LPC,o=grid
  ...
  GlueCEPolicyMaxRunningJobs: 196
  GlueCEPolicyMaxWaitingJobs: 0
  GlueCEPolicyMaxTotalJobs: 200
  GlueCEPolicyMaxWallClockTime: 4320
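Whether a queue can still take a batch of jobs can be estimated by comparing the published limit with the number of jobs already in the queue. The sketch below is illustrative only, not a site or WMS tool: it assumes the attributes have already been retrieved as plain `Name: value` text like the excerpt above, and the helper names `parse_glue_attributes` and `can_accept` as well as the job counts in the example are hypothetical.

```python
# Illustrative sketch: compare published GlueCEPolicy* limits with a job count.
# Helper names and the job counts in the example are hypothetical.


def parse_glue_attributes(text: str) -> dict:
    """Parse 'Name: value' lines into a dict, keeping only Glue* attributes."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.strip().startswith("Glue"):
            attrs[name.strip()] = value.strip()
    return attrs


def can_accept(attrs: dict, jobs_in_queue: int, new_jobs: int) -> bool:
    """Return True if new_jobs still fit under GlueCEPolicyMaxTotalJobs.

    Assumption: the limit counts every job resident in the queue, queued or
    running, like the PBS max_queuable setting it mirrors at this site.
    """
    max_total = int(attrs["GlueCEPolicyMaxTotalJobs"])
    return jobs_in_queue + new_jobs <= max_total


# Attribute values copied from the site excerpt (the elided lines are omitted).
published = """dn: GlueCEUniqueID=clrlcgce03.in2p3.fr:2119/jobmanager-lcgpbs-atlas,Mds-Vo-name=IN2P3-LPC,o=grid
GlueCEPolicyMaxRunningJobs: 196
GlueCEPolicyMaxWaitingJobs: 0
GlueCEPolicyMaxTotalJobs: 200
GlueCEPolicyMaxWallClockTime: 4320"""

attrs = parse_glue_attributes(published)
# Hypothetical load: 150 or 160 jobs already in the queue, 50 new ST jobs.
print(can_accept(attrs, jobs_in_queue=150, new_jobs=50))  # True  (200 <= 200)
print(can_accept(attrs, jobs_in_queue=160, new_jobs=50))  # False (210 > 200)
```

The check uses GlueCEPolicyMaxTotalJobs rather than GlueCEPolicyMaxRunningJobs because the rejection reported here concerned the total number of jobs in the queue, not the number of running slots.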

  Jan 29 23:54:46 clrlcgce03 gridinfo: [25608-30993] Job 1233269583:
  lcgpbs:internal_ FAILED during submission to batch system lcgpbs
  01/29/2009 23:55:07;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Maximum number of jobs already in queue), aux=0..
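To see how often submissions hit the queue limit during a test, the rejection records can be pulled out of the PBS server log. This is a minimal sketch, not an existing tool: it assumes each record occupies a single line in the real log file, with the semicolon-separated layout visible in the excerpt above (date and time, event code, source, object type, object name, message), and `parse_rejects` is a hypothetical helper name.

```python
# Illustrative sketch: extract req_reject records from a PBS server log.
# Assumes one record per line with ';'-separated fields:
# date time;event code;source;object type;object name;message
import re
from datetime import datetime

REJECT_RE = re.compile(r"Reject reply code=(\d+)\((.*?)\)")


def parse_rejects(lines):
    """Yield (timestamp, reply_code, reason) for each req_reject record."""
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6 or fields[4] != "req_reject":
            continue  # not a rejection record
        match = REJECT_RE.search(fields[5])
        if match:
            when = datetime.strptime(fields[0], "%m/%d/%Y %H:%M:%S")
            yield when, int(match.group(1)), match.group(2)


# The record from the excerpt, as it would appear on one line of the log.
log = [
    "01/29/2009 23:55:07;0080;PBS_Server;Req;req_reject;"
    "Reject reply code=15046(Maximum number of jobs already in queue), aux=0..",
]

for when, code, reason in parse_rejects(log):
    print(when, code, reason)
# prints: 2009-01-29 23:55:07 15046 Maximum number of jobs already in queue
```

Counting the records with reply code 15046 over the test window would give a rough number of submissions refused because the queue was full.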