buildtest.executors.pbs
This module implements the PBSExecutor class, which defines how executors submit jobs to the PBS Scheduler.
Module Contents
Classes
PBSExecutor | The PBSExecutor class is responsible for submitting jobs to PBS Scheduler.
PBSJob | The PBSJob models a PBS Job with helper methods to retrieve job state, check if job is running/pending/suspended.
Attributes
- buildtest.executors.pbs.logger
- class buildtest.executors.pbs.PBSExecutor(name, settings, site_configs, max_pend_time=None)
Bases: buildtest.executors.base.BaseExecutor
The PBSExecutor class is responsible for submitting jobs to PBS Scheduler. The class implements the following methods:
load: load PBS executors from configuration file
dispatch: submit PBS job to scheduler
poll: poll PBS job via qstat and retrieve job state
gather: gather job result
cancel: cancel job if it exceeds max pending time
Initialize a base executor: we provide a name (also held by the BuildExecutor base) and the loaded dictionary of configuration options to parse.
- Parameters
name (str) – name of executor
settings (dict) – settings for a given executor defined in configuration file
site_configs (buildtest.config.SiteConfiguration) – Instance of SiteConfiguration class
- type = pbs
- poll_cmd = qstat
- dispatch(self, builder)
This method is responsible for dispatching a PBS job, retrieving the JobID, and starting to record metadata in the builder object. If the job fails to submit, we check the returncode and exit with failure. After submitting the job, we start the timer, record when the job was submitted, and poll the job once to get job details and store them in the builder object.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
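The submission step described above can be sketched roughly as follows. This is a minimal illustration, not buildtest's actual implementation: the function name `submit_pbs_job`, the injectable `run` parameter, and the choice to raise `RuntimeError` on failure are all assumptions made for the example. It relies only on the fact that `qsub` prints the job ID on standard output.

```python
import subprocess


def submit_pbs_job(script_path, run=subprocess.run):
    """Submit a script with qsub and return the job ID printed on stdout.

    `run` is injectable (hypothetical design choice) so the function can be
    exercised without a real PBS installation.
    """
    result = run(["qsub", script_path], capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: surface the failure as an exception; the real executor
        # is described as exiting with failure instead.
        raise RuntimeError(
            f"qsub failed with returncode {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout.strip()
```

On a system with PBS installed, `submit_pbs_job("pbs_hold_job.sh")` would return a string such as `157.pbs`.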
- poll(self, builder)
This method is responsible for polling the PBS job, which updates the job state. If the job is complete, we gather the job result. If the job is pending, we stop the timer and check whether the pend time exceeds the max pend time for the executor. If so, we cancel the job.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
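The decision made on each poll can be sketched as a small pure function. This is an illustrative assumption rather than the executor's real code: the name `next_action` and its return values are hypothetical, and it assumes that PBS reports a finished job as state `F` and a queued (pending) job as state `Q`.

```python
def next_action(job_state, pend_seconds, max_pend_time):
    """Decide what to do with a job after one poll.

    Assumptions: "F" means the job finished and "Q" means it is still queued.
    """
    if job_state == "F":
        return "gather"   # job is complete, collect its results
    if job_state == "Q" and pend_seconds > max_pend_time:
        return "cancel"   # job has been pending longer than allowed
    return "wait"         # keep polling
```

Keeping the decision separate from the `qstat` call makes the pend-time rule easy to test in isolation.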
- gather(self, builder)
This method is responsible for gathering job results, including the output and error files, and completing the job metadata stored in the builder object. We retrieve the job exitcode, which corresponds to the test returncode.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
- class buildtest.executors.pbs.PBSJob(jobID)
Bases: buildtest.executors.job.Job
The PBSJob models a PBS Job with helper methods to retrieve the job state and check if the job is running, pending, or suspended. It has methods to poll the job state, gather job results upon completion, and cancel the job.
See https://www.altair.com/pdfs/pbsworks/PBSReferenceGuide2021.1.pdf section 8.1 for the list of Job State Codes.
- is_suspended(self)
Return True if the job is suspended, meaning it is in one of these states: H, U, S.
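The suspended check amounts to a set membership test on the job state code. The sketch below is a standalone illustration; the constant `SUSPENDED_STATES` and the free function form are assumptions, since the real method reads the state from the PBSJob instance.

```python
# State codes treated as suspended, as listed in the description above.
SUSPENDED_STATES = frozenset({"H", "U", "S"})


def is_suspended(job_state):
    """Return True if the given PBS job state code counts as suspended."""
    return job_state in SUSPENDED_STATES
```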
- success(self)
This method determines whether the job completed successfully and returns True if the exit code is 0. According to https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf section 14.9 Job Exit Status Codes, we have the following:
Exit Code: X < 0 - Job could not be executed
Exit Code: 0 <= X < 128 - Exit value of Shell or top-level process
Exit Code: X >= 128 - Job was killed by signal
Exit Code: X == 0 - Job executed successfully
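The exit code ranges listed above can be expressed as a short classifier. This is a sketch for illustration only; the helper names `describe_exit_code` and `is_success` are hypothetical and not part of the PBSJob class.

```python
def describe_exit_code(code):
    """Map a PBS job exit code to the category listed in the table above."""
    if code < 0:
        return "job could not be executed"
    if code >= 128:
        return "job was killed by signal"
    return "exit value of shell or top-level process"


def is_success(code):
    """A job counts as successful only when its exit code is exactly 0."""
    return code == 0
```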
- poll(self)
This method polls the PBS job by running qstat -x -f -F json <jobid>, which reports job data in JSON format that can be parsed to extract the job state. In PBS, the active job state can be retrieved by reading the job_state property. Shown below is an example output:

    [pbsuser@pbs tests]$ qstat -x -f -F json 157.pbs
    {
        "timestamp":1630683518,
        "pbs_version":"19.0.0",
        "pbs_server":"pbs",
        "Jobs":{
            "157.pbs":{
                "Job_Name":"pbs_hold_job",
                "Job_Owner":"pbsuser@pbs",
                "job_state":"H",
                "queue":"workq",
                "server":"pbs",
                "Checkpoint":"u",
                "ctime":"Fri Aug 20 23:14:08 2021",
                "Error_Path":"pbs:/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.e157",
                "Hold_Types":"u",
                "Join_Path":"n",
                "Keep_Files":"n",
                "Mail_Points":"a",
                "mtime":"Fri Aug 20 23:14:08 2021",
                "Output_Path":"pbs:/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.o157",
                "Priority":0,
                "qtime":"Fri Aug 20 23:14:08 2021",
                "Rerunable":"True",
                "Resource_List":{
                    "ncpus":1,
                    "nodect":1,
                    "nodes":1,
                    "place":"scatter",
                    "select":"1:ncpus=1",
                    "walltime":"00:02:00"
                },
                "substate":20,
                "Variable_List":{
                    "PBS_O_HOME":"/home/pbsuser",
                    "PBS_O_LOGNAME":"pbsuser",
                    "PBS_O_PATH":"/tmp/GitHubDesktop/buildtest/bin:/tmp/github/buildtest/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/bin:/home/pbsuser/.local/bin:/home/pbsuser/bin",
                    "PBS_O_MAIL":"/var/spool/mail/pbsuser",
                    "PBS_O_SHELL":"/bin/bash",
                    "PBS_O_WORKDIR":"/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage",
                    "PBS_O_SYSTEM":"Linux",
                    "PBS_O_QUEUE":"workq",
                    "PBS_O_HOST":"pbs"
                },
                "Submit_arguments":"-q workq /tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.sh",
                "project":"_pbs_project_default"
            }
        }
    }
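Extracting the state from output of this shape takes only the standard `json` module. The sketch below is a hedged illustration, not the class's actual code: the function name `extract_job_state` is hypothetical, and the sample string is a trimmed copy of the example output that keeps only the keys needed here.

```python
import json


def extract_job_state(qstat_output, job_id):
    """Parse `qstat -x -f -F json <jobid>` output and return job_state."""
    data = json.loads(qstat_output)
    # Job records are keyed by job ID under the top-level "Jobs" object.
    return data["Jobs"][job_id]["job_state"]


# Trimmed sample based on the example output for job 157.pbs.
sample_output = """
{
    "pbs_server": "pbs",
    "Jobs": {
        "157.pbs": {
            "Job_Name": "pbs_hold_job",
            "job_state": "H",
            "queue": "workq"
        }
    }
}
"""

state = extract_job_state(sample_output, "157.pbs")
print(state)  # H
```

Here the job is in state `H`, which is one of the states that `is_suspended` treats as suspended.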