buildtest.executors.lsf

This module implements the LSFExecutor class, which is responsible for submitting jobs to the LSF scheduler. BuildExecutor instantiates this class when initializing the executors.

Module Contents

Classes

LSFExecutor

The LSFExecutor class is responsible for submitting jobs to the LSF scheduler.

LSFJob

This is a base class for holding job-level data and common methods used for batch job submission.

Attributes

logger

class buildtest.executors.lsf.LSFExecutor(name, settings, site_configs, max_pend_time=None)

Bases: buildtest.executors.base.BaseExecutor

The LSFExecutor class is responsible for submitting jobs to the LSF scheduler. The LSFExecutor performs the following steps:

  • load: load the LSF configuration from the buildtest configuration file

  • dispatch: dispatch the job to the scheduler and acquire the job ID

  • poll: wait for LSF jobs to finish

  • gather: once the job is complete, gather job data
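
The order of the four steps can be illustrated with a small, self-contained sketch. It is not buildtest's actual driver code: run_steps and FINISHED_STATES are hypothetical names, and the list of states stands in for what repeated polling of the scheduler would report.

```python
# Hypothetical illustration of the load -> dispatch -> poll -> gather order.
# States DONE and EXIT are the terminal LSF states described for LSFJob.
FINISHED_STATES = {"DONE", "EXIT"}


def run_steps(states):
    """Return a trace of the steps taken for a job that reports `states` in order."""
    # Configuration is loaded once, then the job is submitted.
    trace = ["load", "dispatch"]
    for state in states:
        # Each iteration represents one poll of the scheduler.
        trace.append(f"poll:{state}")
        if state in FINISHED_STATES:
            # Job data is gathered only after the job has finished.
            trace.append("gather")
            break
    return trace


trace = run_steps(["PEND", "RUN", "DONE"])
```

In this sketch, gather never runs if the job does not reach a finished state, which mirrors the idea that job data is collected only on completion.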

type = lsf

dispatch(self, builder)

This method dispatches the job to the scheduler and extracts the job ID by applying re.search to the output returned at job submission. If the job ID cannot be retrieved, either because the submission failed or because the regular expression did not match, we mark the job as incomplete by invoking builder.incomplete() and return from the method.

If we have a valid job ID, we instantiate the LSFJob class with that job ID, which is later used to poll the job, and store the instance in the builder.job attribute.

Parameters

builder (BuilderBase, required) – builder object
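
As a minimal sketch of the job ID extraction, assume the submission output has the usual bsub form "Job <ID> is submitted to queue <NAME>.". The pattern and the helper name extract_job_id below are illustrative assumptions, not necessarily the expression buildtest uses.

```python
import re
from typing import Optional

# Assumed pattern: the job ID is the run of digits inside "Job <...>".
JOB_ID_PATTERN = re.compile(r"Job <(\d+)>")


def extract_job_id(submit_output: str) -> Optional[str]:
    """Return the job ID found in the submission output, or None if absent."""
    match = JOB_ID_PATTERN.search(submit_output)
    return match.group(1) if match else None


job_id = extract_job_id("Job <58652> is submitted to queue <batch>.")
# A caller would treat None as "mark the job incomplete and return".
missing = extract_job_id("submission failed")  # hypothetical error text
```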

gather(self, builder)

Gather job details after the job completes by invoking builder.job.gather(). We retrieve the exit code, output file, and error file, and update the builder metadata.

Parameters

builder (BuilderBase, required) – builder object

launcher_command(self)

This method returns the launcher command and any options specified in the configuration file. This is useful when generating the build script in the BuilderBase class.

load(self)

Load the LSF executor configuration from buildtest settings.

poll(self, builder)

Given a builder object, we poll the job by invoking builder.job.poll() to retrieve the state of the job. If the job is suspended or pending, we stop the timer and check whether the elapsed time exceeds max_pend_time, which can be defined in the configuration file or passed on the command line via --max-pend-time.

Parameters

builder (BuilderBase, required) – builder object
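
The pending-time check described above can be sketched as a pure function. The name should_cancel and its parameters are hypothetical; the waiting states are the pending and suspended states listed for LSFJob, and the comparison is strict because the text speaks of the timer exceeding max_pend_time.

```python
# States in which a job is waiting rather than running (pending or suspended).
WAITING_STATES = {"PEND", "PSUSP", "USUSP", "SSUSP"}


def should_cancel(state: str, elapsed_seconds: float, max_pend_time: float) -> bool:
    """Return True when a waiting job has exceeded the allowed pend time."""
    return state in WAITING_STATES and elapsed_seconds > max_pend_time


# A job pending for 120 seconds with a 90-second limit would be cancelled.
cancel_pending = should_cancel("PEND", 120, 90)
# A running job is never cancelled by this check, however long it has run.
cancel_running = should_cancel("RUN", 120, 90)
```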

class buildtest.executors.lsf.LSFJob(jobID)

Bases: buildtest.executors.job.Job

This is a base class for holding job-level data and common methods used for batch job submission.

cancel(self)

Cancel the LSF job by running bkill <jobid>. This method is called if the job has exceeded the max_pend_time limit during the poll stage.

error_file(self)

Return job error file

exitcode(self)

Return job exit code

gather(self)

Gather the job record once the job completes by running bjobs -o '<format1> <format2>' <jobid> -json. The following format fields are extracted from the job:

  • “job_name”

  • “stat”

  • “user”

  • “user_group”

  • “queue”

  • “proj_name”

  • “pids”

  • “exit_code”

  • “from_host”

  • “exec_host”

  • “submit_time”

  • “start_time”

  • “finish_time”

  • “nthreads”

  • “exec_home”

  • “exec_cwd”

  • “output_file”

  • “error_file”

Shown below is the output format; we retrieve the job records defined in the RECORDS property.

$ bjobs -o 'job_name stat user user_group queue proj_name pids exit_code from_host exec_host submit_time start_time finish_time nthreads exec_home exec_cwd output_file error_file' 58652 -json
{
  "COMMAND":"bjobs",
  "JOBS":1,
  "RECORDS":[
    {
      "JOB_NAME":"hold_job",
      "STAT":"PSUSP",
      "USER":"shahzebsiddiqui",
      "USER_GROUP":"GEN014ECPCI",
      "QUEUE":"batch",
      "PROJ_NAME":"GEN014ECPCI",
      "PIDS":"",
      "EXIT_CODE":"",
      "FROM_HOST":"login1",
      "EXEC_HOST":"",
      "SUBMIT_TIME":"May 28 12:45",
      "START_TIME":"",
      "FINISH_TIME":"",
      "NTHREADS":"",
      "EXEC_HOME":"",
      "EXEC_CWD":"",
      "OUTPUT_FILE":"hold_job.out",
      "ERROR_FILE":"hold_job.err"
    }
  ]
}
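
A minimal sketch of reading the job record out of this JSON with the standard library follows. The helper name first_record is hypothetical, and the sample is a trimmed copy of the output above with fewer fields but the same shape.

```python
import json

# Trimmed copy of the sample bjobs output (same structure, fewer fields).
SAMPLE_OUTPUT = """
{
  "COMMAND": "bjobs",
  "JOBS": 1,
  "RECORDS": [
    {
      "JOB_NAME": "hold_job",
      "STAT": "PSUSP",
      "EXIT_CODE": "",
      "OUTPUT_FILE": "hold_job.out",
      "ERROR_FILE": "hold_job.err"
    }
  ]
}
"""


def first_record(bjobs_json: str) -> dict:
    """Return the first entry of the RECORDS list from bjobs JSON output."""
    payload = json.loads(bjobs_json)
    records = payload.get("RECORDS", [])
    if not records:
        raise ValueError("bjobs returned no job records")
    return records[0]


record = first_record(SAMPLE_OUTPUT)
```

Note that fields with no value yet, such as EXIT_CODE for a job that has not run, arrive as empty strings rather than null.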

is_complete(self)

Check whether the job is complete, that is, in the DONE state. Return True if so, otherwise False.

is_failed(self)

Check whether the job failed. Return True if the job is in the EXIT state, otherwise False.

is_pending(self)

Check whether the job is pending, which LSF reports as PEND. Return True if so, otherwise False.

is_running(self)

Check whether the job is running, which LSF reports as RUN. Return True if so, otherwise False.

is_suspended(self)

Check whether the job is suspended, which means it is in one of the following states: [PSUSP, USUSP, SSUSP]. Return True if the job is in one of these states, otherwise False.
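
The state checks above amount to simple comparisons against the state string reported by LSF. The class below is a hypothetical stand-in written for illustration, not the LSFJob implementation; only the state names are taken from the descriptions above.

```python
class JobStateSketch:
    """Hypothetical holder for an LSF job state with the five predicates."""

    SUSPENDED_STATES = ("PSUSP", "USUSP", "SSUSP")

    def __init__(self, state: str):
        self.state = state

    def is_pending(self) -> bool:
        return self.state == "PEND"

    def is_running(self) -> bool:
        return self.state == "RUN"

    def is_complete(self) -> bool:
        return self.state == "DONE"

    def is_failed(self) -> bool:
        return self.state == "EXIT"

    def is_suspended(self) -> bool:
        return self.state in self.SUSPENDED_STATES


held = JobStateSketch("PSUSP")
finished = JobStateSketch("DONE")
```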

output_file(self)

Return job output file

poll(self)

Given a job ID, we poll the LSF job by retrieving its job state, output file, error file, and exit code. We run the following commands to retrieve each value:

  • Job State: bjobs -noheader -o 'stat' <JOBID>

  • Output File: bjobs -noheader -o 'output_file' <JOBID>

  • Error File: bjobs -noheader -o 'error_file' <JOBID>

  • Exit Code: bjobs -noheader -o 'EXIT_CODE' <JOBID>
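
The four queries can be sketched as argument lists built from the job ID. The function name poll_commands and the dictionary keys are hypothetical; the bjobs options and field names are the ones listed above. The sketch only builds the commands and does not run them, since that requires an LSF installation.

```python
# Maps an illustrative key to the bjobs output field queried for it.
POLL_FIELDS = {
    "state": "stat",
    "output_file": "output_file",
    "error_file": "error_file",
    "exit_code": "EXIT_CODE",
}


def poll_commands(job_id: str) -> dict:
    """Return one bjobs argument list per polled value for the given job ID."""
    return {
        key: ["bjobs", "-noheader", "-o", field, job_id]
        for key, field in POLL_FIELDS.items()
    }


commands = poll_commands("58652")
```

Each list could be passed to subprocess.run with capture_output=True and text=True. Passing the arguments as a list avoids the shell quoting around the field name that the command-line form needs.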

buildtest.executors.lsf.logger