buildtest.executors.lsf¶
This module implements the LSFExecutor class, which is responsible for submitting jobs to the LSF scheduler. The class is instantiated in the BuildExecutor class when initializing the executors.
Module Contents¶
Classes¶
LSFExecutor | The LSFExecutor class is responsible for submitting jobs to the LSF scheduler.
LSFJob | This is a base class for holding job-level data and common methods used for batch job submission.
Attributes¶
- class buildtest.executors.lsf.LSFExecutor(name, settings, site_configs, max_pend_time=None)¶
Bases: buildtest.executors.base.BaseExecutor
The LSFExecutor class is responsible for submitting jobs to the LSF scheduler. The LSFExecutor performs the following steps:
load: load lsf configuration from buildtest configuration file
dispatch: dispatch job to scheduler and acquire job ID
poll: wait for LSF jobs to finish
gather: Once job is complete, gather job data
- type = lsf¶
- dispatch(self, builder)¶
This method dispatches the job to the scheduler and extracts the job ID by applying re.search against the output produced at the onset of job submission. If the job ID cannot be retrieved, either because the job failed or because the regular expression did not match, we mark the job incomplete by invoking the builder.incomplete() method and return from the method. If we have a valid job ID, we create an LSFJob object from the job ID to poll the job and store it in the builder.job attribute.
- Parameters
builder (BuilderBase, required) – builder object
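The job ID extraction described above can be sketched as follows. This is a minimal illustration, not buildtest's implementation: the function name extract_job_id and the pattern are hypothetical, and the sketch assumes the submission output has the form "Job <ID> is submitted to queue <NAME>.", which is what bsub normally prints on success.

```python
import re

# Assumption: a successful submission prints "Job <ID> is submitted to queue <NAME>."
# The pattern actually used by buildtest may differ.
JOB_ID_PATTERN = re.compile(r"Job <(\d+)>")


def extract_job_id(submission_output):
    """Return the job ID as an int, or None if no ID can be found.

    A None result corresponds to the case where the job would be
    marked incomplete.
    """
    match = JOB_ID_PATTERN.search(submission_output)
    if match is None:
        return None
    return int(match.group(1))
```

Returning None rather than raising keeps the "no match" case explicit, so the caller can decide to mark the job incomplete and return early.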
- gather(self, builder)¶
Gather job details after the job completes by invoking the builder method builder.job.gather(). We retrieve the exit code, output file and error file, and update the builder metadata.
- Parameters
builder (BuilderBase, required) – builder object
- launcher_command(self)¶
This method returns the launcher command and any options specified in the configuration file. This is useful when generating the build script in the BuilderBase class.
- load(self)¶
Load the LSF executor configuration from buildtest settings.
- poll(self, builder)¶
Given a builder object, we poll the job by invoking the builder method builder.job.poll(), which returns the state of the job. If the job is suspended or pending, we stop the timer and check whether the elapsed time exceeds the max_pend_time value, which can be defined in the configuration file or passed on the command line via --max-pend-time.
- Parameters
builder (BuilderBase, required) – builder object
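The pending-time check described above might look like the sketch below. The function name, the WAITING_STATES set and the use of seconds as the time unit are assumptions made for illustration; buildtest keeps its own timer on the builder object.

```python
import time

# Assumption: the LSF states treated as "pending or suspended" for this check.
WAITING_STATES = {"PEND", "PSUSP", "USUSP", "SSUSP"}


def exceeds_max_pend_time(state, submit_time, max_pend_time, now=None):
    """Return True if a waiting job has been waiting longer than max_pend_time.

    submit_time and now are timestamps in seconds; now defaults to the
    current time. Jobs that are not waiting, or a max_pend_time of None,
    never exceed the limit.
    """
    if max_pend_time is None or state not in WAITING_STATES:
        return False
    current = time.time() if now is None else now
    return (current - submit_time) > max_pend_time
```

Passing now explicitly makes the check deterministic, which is convenient when testing the polling logic without waiting on a real scheduler.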
- class buildtest.executors.lsf.LSFJob(jobID)¶
Bases: buildtest.executors.job.Job
This is a base class for holding job-level data and common methods used for batch job submission.
- cancel(self)¶
Cancel the LSF job by running bkill <jobid>. This is called if the job has exceeded the max_pend_time limit during the poll stage.
- error_file(self)¶
Return job error file
- exitcode(self)¶
Return job exit code
- gather(self)¶
Gather the job record at the onset of job completion by running bjobs -o '<format1> <format2>' <jobid> -json. The format fields extracted from the job are the following:
“job_name”
“stat”
“user”
“user_group”
“queue”
“proj_name”
“pids”
“exit_code”
“from_host”
“exec_host”
“submit_time”
“start_time”
“finish_time”
“nthreads”
“exec_home”
“exec_cwd”
“output_file”
“error_file”
Shown below is the output format. We retrieve the job record from the RECORDS property.
$ bjobs -o 'job_name stat user user_group queue proj_name pids exit_code from_host exec_host submit_time start_time finish_time nthreads exec_home exec_cwd output_file error_file' 58652 -json
{
  "COMMAND":"bjobs",
  "JOBS":1,
  "RECORDS":[
    {
      "JOB_NAME":"hold_job",
      "STAT":"PSUSP",
      "USER":"shahzebsiddiqui",
      "USER_GROUP":"GEN014ECPCI",
      "QUEUE":"batch",
      "PROJ_NAME":"GEN014ECPCI",
      "PIDS":"",
      "EXIT_CODE":"",
      "FROM_HOST":"login1",
      "EXEC_HOST":"",
      "SUBMIT_TIME":"May 28 12:45",
      "START_TIME":"",
      "FINISH_TIME":"",
      "NTHREADS":"",
      "EXEC_HOME":"",
      "EXEC_CWD":"",
      "OUTPUT_FILE":"hold_job.out",
      "ERROR_FILE":"hold_job.err"
    }
  ]
}
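To illustrate how the job record can be read from the RECORDS property, here is a minimal sketch that parses a shortened copy of the JSON output above. The helper name first_record is hypothetical, and the sample keeps only a few of the fields to stay brief.

```python
import json

# Shortened copy of the bjobs JSON output shown above.
SAMPLE_OUTPUT = """
{
  "COMMAND":"bjobs",
  "JOBS":1,
  "RECORDS":[
    {
      "JOB_NAME":"hold_job",
      "STAT":"PSUSP",
      "EXIT_CODE":"",
      "OUTPUT_FILE":"hold_job.out",
      "ERROR_FILE":"hold_job.err"
    }
  ]
}
"""


def first_record(bjobs_json):
    """Return the first entry of RECORDS as a dict, or {} if there is none."""
    data = json.loads(bjobs_json)
    records = data.get("RECORDS", [])
    return records[0] if records else {}


record = first_record(SAMPLE_OUTPUT)
```

Note that fields with no value yet, such as EXIT_CODE for a job that has not run, come back as empty strings rather than being omitted.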
- is_complete(self)¶
Check if the job is complete, which LSF reports as the DONE state. Return True if there is a match, otherwise return False.
- is_failed(self)¶
Check if the job failed. We return True if the job is in the EXIT state, otherwise return False.
- is_pending(self)¶
Check if the job is pending, which LSF reports as PEND. Return True if there is a match, otherwise return False.
- is_running(self)¶
Check if the job is running, which LSF reports as RUN. Return True if there is a match, otherwise return False.
- is_suspended(self)¶
Check if the job is suspended, which means it is in one of the following states: [PSUSP, USUSP, SSUSP]. We return True if the job is in one of these states, otherwise return False.
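The state checks above all compare the state string reported by LSF against fixed values. A minimal sketch of that mapping follows; the class name LSFStateSketch is hypothetical and is not part of buildtest, where these methods live on LSFJob.

```python
# States that count as suspended, as listed in is_suspended above.
SUSPENDED_STATES = ("PSUSP", "USUSP", "SSUSP")


class LSFStateSketch:
    """Hypothetical holder for a job state string reported by bjobs."""

    def __init__(self, state):
        # Strip surrounding whitespace in case the state comes from raw command output.
        self._state = state.strip()

    def is_pending(self):
        return self._state == "PEND"

    def is_running(self):
        return self._state == "RUN"

    def is_complete(self):
        return self._state == "DONE"

    def is_failed(self):
        return self._state == "EXIT"

    def is_suspended(self):
        return self._state in SUSPENDED_STATES
```

Keeping the suspended states in one tuple means the three variants are checked in a single membership test instead of three separate comparisons.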
- output_file(self)¶
Return job output file
- poll(self)¶
Given a job ID, we poll the LSF job by retrieving its job state, output file, error file and exit code. We run the following commands to retrieve each value:
Job State: bjobs -noheader -o 'stat' <JOBID>
Output File: bjobs -noheader -o 'output_file' <JOBID>
Error File: bjobs -noheader -o 'error_file' <JOBID>
Exit Code: bjobs -noheader -o 'EXIT_CODE' <JOBID>
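As a rough sketch of how the queries above could be issued from Python, the helpers below build the argument list for one field and run it with subprocess. The names bjobs_query and run_query are hypothetical, and buildtest may run these commands through its own command wrapper instead.

```python
import shlex
import subprocess


def bjobs_query(job_id, field):
    """Build the argument list for 'bjobs -noheader -o <field> <job_id>'."""
    return ["bjobs", "-noheader", "-o", field, str(job_id)]


def run_query(job_id, field):
    """Run the query and return its output with surrounding whitespace removed.

    Requires bjobs to be available on PATH, so this only works on a
    system with LSF installed.
    """
    result = subprocess.run(
        bjobs_query(job_id, field), capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


# Printable form of the command, for logging.
command_text = shlex.join(bjobs_query(58652, "stat"))
```

Passing the arguments as a list avoids shell quoting issues, which is why the field name needs no surrounding quotes here.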
- buildtest.executors.lsf.logger¶