buildtest.scheduler.lsf

Module Contents

Classes

LSFJob

This is a base class for holding job-level data and common methods used for batch job submission.

Attributes

logger

buildtest.scheduler.lsf.logger
class buildtest.scheduler.lsf.LSFJob(jobID)[source]

Bases: buildtest.scheduler.job.Job

This is a base class for holding job-level data and common methods used for batch job submission.

is_pending()[source]

Check if the job is pending, which LSF reports as PEND. Returns True if the state matches, otherwise False.

is_running()[source]

Check if the job is running, which LSF reports as RUN. Returns True if the state matches, otherwise False.

is_complete()[source]

Check if the job is complete, which LSF reports as DONE. Returns True if the state matches, otherwise False.

is_suspended()[source]

Check if the job is suspended, meaning it is in one of the following states: PSUSP, USUSP, or SSUSP. Returns True if the job is in one of these states, otherwise False.

is_failed()[source]

Check if the job failed. Returns True if the job is in the EXIT state, otherwise False.
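The state checks described above can be pictured as simple comparisons against the LSF state string. The following is a minimal sketch under that assumption. The class name JobStateSketch and its state attribute are hypothetical and are not taken from buildtest's source.

```python
# Hypothetical sketch; names here are illustrative, not buildtest's API.
SUSPENDED_STATES = {"PSUSP", "USUSP", "SSUSP"}


class JobStateSketch:
    """Holds an LSF state string and answers yes/no questions about it."""

    def __init__(self, state):
        self.state = state

    def is_pending(self):
        return self.state == "PEND"

    def is_running(self):
        return self.state == "RUN"

    def is_complete(self):
        return self.state == "DONE"

    def is_suspended(self):
        # Any of the three suspended states counts as suspended.
        return self.state in SUSPENDED_STATES

    def is_failed(self):
        return self.state == "EXIT"
```

Keeping the suspended states in one set means a single membership test covers all three values.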

poll()[source]

Given a job ID, we poll the LSF job by retrieving its job state, output file, error file, and exit code. We run the following commands to retrieve the job state and exit code:

  • Job State: bjobs -noheader -o 'stat' <JOBID>

  • Exit Code: bjobs -noheader -o 'EXIT_CODE' <JOBID>
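The two queries listed above could be issued from Python roughly as follows. This is a sketch, not the actual poll() implementation: the function names build_query, query_field, and poll_sketch are hypothetical, and the runner parameter is an assumption added so the logic can be exercised without an LSF installation.

```python
import subprocess


def build_query(field, job_id):
    """Return the bjobs argument list for a single output field."""
    return ["bjobs", "-noheader", "-o", field, str(job_id)]


def query_field(field, job_id, runner=subprocess.run):
    """Run bjobs for one field and return its output with whitespace stripped."""
    result = runner(
        build_query(field, job_id), capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def poll_sketch(job_id, runner=subprocess.run):
    """Collect the job state and exit code using the two queries above."""
    return {
        "state": query_field("stat", job_id, runner),
        "exit_code": query_field("EXIT_CODE", job_id, runner),
    }
```

Passing the command as an argument list avoids shell quoting, so the single quotes around the field name in the shell form are not needed here.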

get_output_and_error_files()[source]

This method extracts the output and error files for a given job ID by running the following commands: bjobs -noheader -o 'output_file' <JOBID> and bjobs -noheader -o 'error_file' <JOBID>

$ bjobs -noheader -o 'output_file' 70910
hold_job.out
$ bjobs -noheader -o 'error_file' 70910
hold_job.err
retrieve_jobdata()[source]

We gather the job record at the onset of job completion by running bjobs -o '<format1> <format2>' <jobid> -json.

The output format is shown below. We retrieve the job records defined in the RECORDS property.

$ bjobs -o 'job_name stat user user_group queue proj_name pids exit_code from_host exec_host submit_time start_time finish_time nthreads exec_home exec_cwd output_file error_file' 58652 -json
{
  "COMMAND":"bjobs",
  "JOBS":1,
  "RECORDS":[
    {
      "JOB_NAME":"hold_job",
      "STAT":"PSUSP",
      "USER":"shahzebsiddiqui",
      "USER_GROUP":"GEN014ECPCI",
      "QUEUE":"batch",
      "PROJ_NAME":"GEN014ECPCI",
      "PIDS":"",
      "EXIT_CODE":"",
      "FROM_HOST":"login1",
      "EXEC_HOST":"",
      "SUBMIT_TIME":"May 28 12:45",
      "START_TIME":"",
      "FINISH_TIME":"",
      "NTHREADS":"",
      "EXEC_HOME":"",
      "EXEC_CWD":"",
      "OUTPUT_FILE":"hold_job.out",
      "ERROR_FILE":"hold_job.err"
    }
  ]
}
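Reading the job record out of the JSON shown above might look like the following. This is a minimal sketch that assumes the output has the shape shown in the example. The helper name extract_record is hypothetical, and the sample string is an abbreviated copy of that example.

```python
import json


def extract_record(bjobs_json):
    """Return the first entry of RECORDS, or an empty dict if there is none."""
    payload = json.loads(bjobs_json)
    records = payload.get("RECORDS", [])
    return records[0] if records else {}


# Abbreviated copy of the example output above.
sample = """
{
  "COMMAND": "bjobs",
  "JOBS": 1,
  "RECORDS": [
    {
      "JOB_NAME": "hold_job",
      "STAT": "PSUSP",
      "OUTPUT_FILE": "hold_job.out",
      "ERROR_FILE": "hold_job.err"
    }
  ]
}
"""

record = extract_record(sample)
print(record["STAT"])         # PSUSP
print(record["OUTPUT_FILE"])  # hold_job.out
```

Fields that LSF has not yet populated, such as EXIT_CODE for a job that has not finished, appear as empty strings in the example, so callers may want to treat an empty string as "not available".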
cancel()[source]

Cancel the LSF job by running bkill <jobid>. This method is called if the job's pending time exceeds the maxpendtime limit during the poll stage.
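The cancellation rule can be illustrated with a small sketch. It assumes that pending time and maxpendtime are both measured in seconds, which the text above does not state. The names should_cancel and cancel_sketch and the runner parameter are hypothetical.

```python
import subprocess
import time


def should_cancel(pending_since, maxpendtime, now=None):
    """Return True when the job has been pending longer than maxpendtime.

    Assumption: all values are in seconds since the epoch or plain seconds.
    """
    now = time.time() if now is None else now
    return (now - pending_since) > maxpendtime


def cancel_sketch(job_id, runner=subprocess.run):
    """Run bkill for the given job ID and return the completed process."""
    return runner(
        ["bkill", str(job_id)], capture_output=True, text=True, check=False
    )
```

A polling loop would call should_cancel while the job is still pending and invoke cancel_sketch only when it returns True.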