buildtest.executors.pbs

This module implements the PBSExecutor class, which defines how executors submit jobs to the PBS scheduler.

Module Contents

Classes

PBSExecutor

The PBSExecutor class is responsible for submitting jobs to PBS Scheduler.

PBSJob

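The PBSJob class models a PBS job with helper methods to retrieve the job state and check whether the job is running, pending, or suspended. It also has methods to poll job state, gather job results upon completion, and cancel the job.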

Attributes

logger

buildtest.executors.pbs.logger
class buildtest.executors.pbs.PBSExecutor(name, settings, site_configs, account=None, max_pend_time=None)[source]

Bases: buildtest.executors.base.BaseExecutor

The PBSExecutor class is responsible for submitting jobs to PBS Scheduler. The class implements the following methods:

  • load: load PBS executors from configuration file

  • dispatch: submit PBS job to scheduler

  • poll: poll PBS job via qstat and retrieve job state

  • gather: gather job result

  • cancel: cancel job if it exceeds max pending time

Initialize the executor with a name (also held by the BuildExecutor base) and the loaded dictionary of configuration options to parse.

Parameters
type = pbs
launcher = qsub
load(self)[source]

Load a PBS executor configuration from buildtest settings.

launcher_command(self, numprocs=None, numnodes=None)[source]
dispatch(self, builder)[source]

This method dispatches the PBS job, retrieves the JobID, and starts recording metadata in the builder object. If the job fails to submit, we check the returncode and exit with failure. After submitting the job, we start the timer, record when the job was submitted, and poll the job once to get job details and store them in the builder object.

Parameters

builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type

poll(self, builder)[source]

This method polls the PBS job, which updates the job state. If the job is complete, we gather the job result. If the job is pending, we stop the timer and check whether the pend time exceeds the executor's max pend time. If so, we cancel the job.

Parameters

builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
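
The pending-time check described for poll can be sketched roughly as follows. This is a minimal illustration, not buildtest's implementation: PendTimer, should_cancel, and the injectable clock are hypothetical names introduced only for this example.

```python
import time


class PendTimer:
    """Hypothetical helper that tracks how long a job has been pending."""

    def __init__(self, max_pend_time, clock=time.monotonic):
        # max_pend_time is in seconds; clock is injectable so it can be faked
        self.max_pend_time = max_pend_time
        self._clock = clock
        self._start = clock()

    def elapsed(self):
        """Seconds elapsed since the timer was created."""
        return self._clock() - self._start

    def exceeded(self):
        """True once the elapsed time is greater than max_pend_time."""
        return self.elapsed() > self.max_pend_time


def should_cancel(job_state, timer):
    """Return True when a pending job (state Q) has waited too long."""
    return job_state == "Q" and timer.exceeded()
```

A caller would create the timer at submission time and call should_cancel after each poll; only jobs still in state Q are candidates for cancellation.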

gather(self, builder)[source]

This method gathers job results, including the output and error files, and completes the job metadata stored in the builder object. We retrieve the job exit code, which corresponds to the test returncode.

Parameters

builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type

class buildtest.executors.pbs.PBSJob(jobID)[source]

Bases: buildtest.executors.job.Job

The PBSJob class models a PBS job with helper methods to retrieve the job state and check whether the job is running, pending, or suspended. It also has methods to poll job state, gather job results upon completion, and cancel the job.

See https://www.altair.com/pdfs/pbsworks/PBSReferenceGuide2021.1.pdf section 8.1 for list of Job State Codes

is_pending(self)[source]

Return True if job is pending. A pending job is in state Q.

is_running(self)[source]

Return True if job is running. A running job is in state R.

is_complete(self)[source]

Return True if job is complete. A completed job is in state F.

is_suspended(self)[source]

Return True if job is suspended, which means it is in one of these states: H, U, S.
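
The four state checks above map single-letter PBS state codes to categories. A minimal sketch of that mapping, assuming only the codes named in this section (Q, R, F, H, U, S); classify_state and the set names are hypothetical and not part of PBSJob:

```python
# State codes taken from the method descriptions in this section.
PENDING_STATES = {"Q"}
RUNNING_STATES = {"R"}
COMPLETE_STATES = {"F"}
SUSPENDED_STATES = {"H", "U", "S"}


def classify_state(job_state):
    """Map a PBS job_state code to a coarse category name."""
    if job_state in PENDING_STATES:
        return "pending"
    if job_state in RUNNING_STATES:
        return "running"
    if job_state in COMPLETE_STATES:
        return "complete"
    if job_state in SUSPENDED_STATES:
        return "suspended"
    # Any other code is not covered by the helpers described here.
    return "unknown"
```

For the example job shown under poll, whose job_state is H, this would report "suspended".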

output_file(self)[source]

Return output file of job

error_file(self)[source]

Return error file of job

exitcode(self)[source]

Return exit code of job

success(self)[source]

This method determines if job was completed successfully and returns True if exit code is 0.

According to https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf section 14.9 Job Exit Status Codes, exit codes are interpreted as follows:

  • Exit Code: X < 0 - Job could not be executed

  • Exit Code: 0 <= X < 128 - Exit value of Shell or top-level process

  • Exit Code: X >= 128 - Job was killed by signal

  • Exit Code: X == 0 - Job executed successfully
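
The exit code ranges listed above can be sketched as a small classifier. The function name and the returned labels are hypothetical and chosen for illustration; note that 0 falls inside the 0 <= X < 128 range, so it is checked first.

```python
def describe_exit_code(code):
    """Return a short label for a PBS job exit code, per the ranges above."""
    if code < 0:
        return "job could not be executed"
    if code == 0:
        return "success"
    if code < 128:
        return "exit value of shell or top-level process"
    return "job was killed by signal"


def is_success(code):
    """Mirror the rule stated for success(): only exit code 0 passes."""
    return code == 0
```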

fail(self)[source]

Return True if there is a job failure, that is, if the exit code is not 0.

poll(self)[source]

This method polls the PBS job by running qstat -x -f -F json <jobid>, which reports job data in JSON format that can be parsed to extract the job state. In PBS, the active job state can be retrieved by reading the job_state property. Shown below is an example output:

[pbsuser@pbs tests]$ qstat -x -f -F json 157.pbs
{
    "timestamp":1630683518,
    "pbs_version":"19.0.0",
    "pbs_server":"pbs",
    "Jobs":{
        "157.pbs":{
            "Job_Name":"pbs_hold_job",
            "Job_Owner":"pbsuser@pbs",
            "job_state":"H",
            "queue":"workq",
            "server":"pbs",
            "Checkpoint":"u",
            "ctime":"Fri Aug 20 23:14:08 2021",
            "Error_Path":"pbs:/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.e157",
            "Hold_Types":"u",
            "Join_Path":"n",
            "Keep_Files":"n",
            "Mail_Points":"a",
            "mtime":"Fri Aug 20 23:14:08 2021",
            "Output_Path":"pbs:/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.o157",
            "Priority":0,
            "qtime":"Fri Aug 20 23:14:08 2021",
            "Rerunable":"True",
            "Resource_List":{
                "ncpus":1,
                "nodect":1,
                "nodes":1,
                "place":"scatter",
                "select":"1:ncpus=1",
                "walltime":"00:02:00"
            },
            "substate":20,
            "Variable_List":{
                "PBS_O_HOME":"/home/pbsuser",
                "PBS_O_LOGNAME":"pbsuser",
                "PBS_O_PATH":"/tmp/GitHubDesktop/buildtest/bin:/tmp/github/buildtest/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/bin:/home/pbsuser/.local/bin:/home/pbsuser/bin",
                "PBS_O_MAIL":"/var/spool/mail/pbsuser",
                "PBS_O_SHELL":"/bin/bash",
                "PBS_O_WORKDIR":"/tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage",
                "PBS_O_SYSTEM":"Linux",
                "PBS_O_QUEUE":"workq",
                "PBS_O_HOST":"pbs"
            },
            "Submit_arguments":"-q workq /tmp/GitHubDesktop/buildtest/var/tests/generic.pbs.workq/hold/pbs_hold_job/da6d5b57/stage/pbs_hold_job.sh",
            "project":"_pbs_project_default"
        }
    }
}
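
Given output shaped like the example above, the job state can be read with the standard json module. This is a sketch only: extract_job_state is a hypothetical helper, and the sample string is a trimmed copy of the output shown above rather than a live qstat call.

```python
import json

# Trimmed copy of the qstat output shown above.
SAMPLE_QSTAT_OUTPUT = """
{
    "timestamp": 1630683518,
    "pbs_version": "19.0.0",
    "pbs_server": "pbs",
    "Jobs": {
        "157.pbs": {
            "Job_Name": "pbs_hold_job",
            "job_state": "H",
            "queue": "workq"
        }
    }
}
"""


def extract_job_state(qstat_output, job_id):
    """Parse qstat JSON text and return the job_state of the given job."""
    record = json.loads(qstat_output)
    return record["Jobs"][job_id]["job_state"]


state = extract_job_state(SAMPLE_QSTAT_OUTPUT, "157.pbs")
```

Here the job ID is the key under "Jobs", so the lookup needs the full ID as reported by PBS (157.pbs in this example).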
gather(self)[source]

This method is called once the job is complete. We gather the job record by running qstat -x -f -F json <jobid> and return the JSON object as a dict. This method is responsible for getting the output file, error file, and exit status of the job.
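
As a rough illustration of what gathering involves, the sketch below pulls the three values out of an already-parsed qstat record. Output_Path and Error_Path appear in the example output under poll; the Exit_status key is an assumption here, since the example shows a held job that has not finished, and summarize_job is a hypothetical helper rather than buildtest's API.

```python
def summarize_job(record, job_id):
    """Extract output file, error file, and exit status from a qstat record.

    record is the dict obtained by parsing qstat JSON output.
    Missing keys yield None instead of raising an error.
    """
    job = record["Jobs"][job_id]
    return {
        "output_file": job.get("Output_Path"),
        "error_file": job.get("Error_Path"),
        # Assumed key name for the exit status of a finished job.
        "exitcode": job.get("Exit_status"),
    }
```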

cancel(self)[source]

Cancel PBS job by running qdel <jobid>.