buildtest.executors.lsf
This module implements the LSFExecutor class responsible for submitting jobs to LSF Scheduler. This class is called in class BuildExecutor when initializing the executors.
Module Contents
Classes
LSFExecutor | The LSFExecutor class is responsible for submitting jobs to LSF Scheduler.
LSFJob | This is a base class for holding job-level data and common methods used for batch job submission.
Attributes
- buildtest.executors.lsf.logger
- class buildtest.executors.lsf.LSFExecutor(name, settings, site_configs, max_pend_time=None)
Bases: buildtest.executors.base.BaseExecutor
The LSFExecutor class is responsible for submitting jobs to LSF Scheduler. The LSFExecutor performs the following steps:
load: load lsf configuration from buildtest configuration file
dispatch: dispatch job to scheduler and acquire job ID
poll: wait for LSF jobs to finish
gather: Once job is complete, gather job data
Initialize a base executor: we provide a name (also held by the BuildExecutor that holds this executor) and the loaded dictionary of configuration options to parse.
- Parameters
name (str) – name of executor
settings (dict) – settings for a given executor defined in the configuration file
site_configs (buildtest.config.SiteConfiguration) – Instance of SiteConfiguration class
- type = lsf
- launcher_command(self)
This method returns the launcher command and any options specified in the configuration file. This is useful when generating the build script in the BuilderBase class.
- dispatch(self, builder)
This method is responsible for dispatching the job to the scheduler and extracting the job ID by applying re.search against the output at the onset of job submission. If the job ID is not retrieved, either because the job failed or because the regular expression did not match, we mark the job incomplete by invoking the buildtest.buildsystem.base.BuilderBase.incomplete() method and return from the method.
If we have a valid job ID, we instantiate the buildtest.executors.lsf.LSFJob class with the job ID to poll the job, and store it in the builder.job attribute.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
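The job ID extraction described for dispatch can be illustrated with a minimal sketch. The sample submission message and the regular expression below are assumptions for illustration (the exact pattern used by LSFExecutor is not shown on this page), and parse_job_id is a hypothetical helper, not part of buildtest.

```python
import re
from typing import Optional

# Assumed shape of the scheduler's confirmation line; real output may differ by site.
SAMPLE_OUTPUT = "Job <58652> is submitted to queue <batch>."


def parse_job_id(submit_output: str) -> Optional[str]:
    """Return the first numeric ID wrapped in angle brackets, or None if absent."""
    match = re.search(r"<(\d+)>", submit_output)
    if match is None:
        # No job ID found: the caller would treat the job as incomplete.
        return None
    return match.group(1)


job_id = parse_job_id(SAMPLE_OUTPUT)
no_job_id = parse_job_id("Request aborted. Job not submitted.")
```

Returning None instead of raising keeps the "no match" case easy to handle at the call site, which mirrors the mark-incomplete-and-return behaviour described above.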
- poll(self, builder)
Given a builder object, we poll the job by invoking builder.job.poll(), which returns the state of the job. If the job is suspended or pending, we stop the timer and check whether the elapsed time exceeds the max_pend_time value, which can be defined in the configuration file or passed on the command line via --max-pend-time.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
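As a rough sketch of the pending-time check, the snippet below compares elapsed time against a limit whenever the job is pending or suspended. PendTimer and pend_limit_exceeded are hypothetical names introduced here; the actual timer used by buildtest, and what it does once the limit is exceeded, are not described on this page.

```python
import time

# LSF states treated as "pending or suspended" in this sketch.
WAITING_STATES = {"PEND", "PSUSP", "USUSP", "SSUSP"}


class PendTimer:
    """Hypothetical helper that measures time elapsed since creation."""

    def __init__(self, max_pend_time: float, clock=time.monotonic):
        self.max_pend_time = max_pend_time
        self._clock = clock          # injectable clock, handy for testing
        self._start = clock()

    def elapsed(self) -> float:
        return self._clock() - self._start


def pend_limit_exceeded(state: str, timer: PendTimer) -> bool:
    """True when the job is still waiting and has waited longer than allowed."""
    return state in WAITING_STATES and timer.elapsed() > timer.max_pend_time
```

Injecting the clock is a design choice made only so the check can be exercised without sleeping.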
- gather(self, builder)
Gather job details after completion of the job by invoking builder.job.gather(). We retrieve the exit code, output file, and error file, and update the builder metadata.
- Parameters
builder (buildtest.buildsystem.base.BuilderBase) – An instance object of BuilderBase type
- class buildtest.executors.lsf.LSFJob(jobID)
Bases: buildtest.executors.job.Job
This is a base class for holding job-level data and common methods used for batch job submission.
- is_pending(self)
Check if the job is pending, which LSF reports as PEND. Returns True if there is a match, otherwise returns False.
- is_running(self)
Check if the job is running, which LSF reports as RUN. Returns True if there is a match, otherwise returns False.
- is_complete(self)
Check if the job is complete, which means it is in the DONE state. Returns True if there is a match, otherwise returns False.
- is_suspended(self)
Check if the job is suspended, which means it is in any of the following states: [PSUSP, USUSP, SSUSP]. Returns True if the job is in one of these states, otherwise returns False.
- is_failed(self)
Check if the job failed. Returns True if the job is in the EXIT state, otherwise returns False.
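The state checks above map LSF state strings to booleans. A minimal sketch of that mapping follows; JobStateSketch is a hypothetical stand-in written for illustration and is not the LSFJob implementation.

```python
class JobStateSketch:
    """Hypothetical stand-in that mirrors the documented state predicates."""

    SUSPENDED_STATES = ("PSUSP", "USUSP", "SSUSP")

    def __init__(self, state: str):
        # State string as reported by LSF, e.g. "PEND" or "RUN".
        self._state = state

    def is_pending(self) -> bool:
        return self._state == "PEND"

    def is_running(self) -> bool:
        return self._state == "RUN"

    def is_complete(self) -> bool:
        return self._state == "DONE"

    def is_suspended(self) -> bool:
        return self._state in self.SUSPENDED_STATES

    def is_failed(self) -> bool:
        return self._state == "EXIT"


held = JobStateSketch("PSUSP")
```

With this sketch, a job held at submission (state PSUSP) counts as suspended and not as pending.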
- poll(self)
Given a job ID, we poll the LSF job by retrieving its job state, output file, error file and exit code. We run the following commands to retrieve the job state and exit code:
Job State: bjobs -noheader -o 'stat' <JOBID>
Exit Code: bjobs -noheader -o 'EXIT_CODE' <JOBID>
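The two queries above differ only in the field name, so they can be built from one helper. The sketch below only constructs the argument list; bjobs_field_command is a hypothetical name, and running the command (for example with subprocess.run) requires an LSF installation.

```python
import shlex
from typing import List


def bjobs_field_command(job_id: str, field: str) -> List[str]:
    """Build the argument list for querying a single bjobs field without a header."""
    return ["bjobs", "-noheader", "-o", field, str(job_id)]


state_cmd = bjobs_field_command("70910", "stat")
exit_code_cmd = bjobs_field_command("70910", "EXIT_CODE")

# Human-readable form, useful for logging the command that would be run.
state_cmd_text = shlex.join(state_cmd)
```

Passing the arguments as a list avoids shell quoting issues when the command is eventually executed.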
- gather(self)
This method retrieves the output and error file for a given job ID using the following commands.
$ bjobs -noheader -o 'output_file' 70910
hold_job.out
$ bjobs -noheader -o 'error_file' 70910
hold_job.err
We gather the job record at the onset of job completion by running bjobs -o '<format1> <format2>' <jobid> -json. The format fields extracted from the job are the following:
“job_name”
“stat”
“user”
“user_group”
“queue”
“proj_name”
“pids”
“exit_code”
“from_host”
“exec_host”
“submit_time”
“start_time”
“finish_time”
“nthreads”
“exec_home”
“exec_cwd”
“output_file”
“error_file”
Shown below is the output format. We retrieve the job records defined in the RECORDS property.
$ bjobs -o 'job_name stat user user_group queue proj_name pids exit_code from_host exec_host submit_time start_time finish_time nthreads exec_home exec_cwd output_file error_file' 58652 -json
{
  "COMMAND":"bjobs",
  "JOBS":1,
  "RECORDS":[
    {
      "JOB_NAME":"hold_job",
      "STAT":"PSUSP",
      "USER":"shahzebsiddiqui",
      "USER_GROUP":"GEN014ECPCI",
      "QUEUE":"batch",
      "PROJ_NAME":"GEN014ECPCI",
      "PIDS":"",
      "EXIT_CODE":"",
      "FROM_HOST":"login1",
      "EXEC_HOST":"",
      "SUBMIT_TIME":"May 28 12:45",
      "START_TIME":"",
      "FINISH_TIME":"",
      "NTHREADS":"",
      "EXEC_HOME":"",
      "EXEC_CWD":"",
      "OUTPUT_FILE":"hold_job.out",
      "ERROR_FILE":"hold_job.err"
    }
  ]
}
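Reading the job record out of the JSON output can be sketched as follows. The sample string is a trimmed copy of the output shown above (only a few fields are kept), and first_record is a hypothetical helper, not a buildtest function.

```python
import json

# Trimmed copy of the bjobs -json output shown above.
SAMPLE_JSON = """
{
  "COMMAND": "bjobs",
  "JOBS": 1,
  "RECORDS": [
    {
      "JOB_NAME": "hold_job",
      "STAT": "PSUSP",
      "QUEUE": "batch",
      "EXIT_CODE": "",
      "OUTPUT_FILE": "hold_job.out",
      "ERROR_FILE": "hold_job.err"
    }
  ]
}
"""


def first_record(raw: str) -> dict:
    """Parse bjobs JSON output and return the first entry under RECORDS."""
    data = json.loads(raw)
    records = data.get("RECORDS", [])
    if not records:
        raise ValueError("no job records found in bjobs output")
    return records[0]


record = first_record(SAMPLE_JSON)
output_file = record["OUTPUT_FILE"]
```

Fields that LSF has not populated yet, such as EXIT_CODE for a suspended job, arrive as empty strings, so callers should not assume they are numeric.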