buildtest.executors.slurm

This module implements the SlurmExecutor class, which is responsible for submitting jobs to the Slurm scheduler. The class is instantiated by BuildExecutor when initializing the executors.

Module Contents

Classes

SlurmExecutor(name, settings, config_opts)

The SlurmExecutor class is responsible for submitting jobs to the Slurm scheduler.

class buildtest.executors.slurm.SlurmExecutor(name, settings, config_opts)

Bases: buildtest.executors.base.BaseExecutor

The SlurmExecutor class is responsible for submitting jobs to the Slurm scheduler. The SlurmExecutor performs the following steps:

check: check if the Slurm partition is available for accepting jobs
load: load the Slurm configuration from the buildtest configuration file
dispatch: dispatch the job to the scheduler and acquire the job ID
poll: wait for Slurm jobs to finish
gather: once the job is complete, gather job data

job_state
poll_cmd = sacct
sacct_fields = ['Account', 'AllocNodes', 'AllocTRES', 'ConsumedEnergyRaw', 'CPUTimeRaw', 'End', 'ExitCode', 'JobID', 'JobName', 'NCPUS', 'NNodes', 'QOS', 'ReqGRES', 'ReqMem', 'ReqNodes', 'ReqTRES', 'Start', 'State', 'Submit', 'UID', 'User', 'WorkDir']
steps = ['dispatch', 'poll', 'gather', 'close']
type = slurm
cancel(self)

Cancel the Slurm job. This operation is performed if the job exceeds its allowed pending time or runtime.
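
As a rough illustration of when cancellation applies, the sketch below decides whether a job has exceeded a time limit and builds the command to cancel it. The helper names, the limit parameter, and the use of the Slurm `scancel` command are assumptions made for this example, not details taken from buildtest's implementation.

```python
def should_cancel(elapsed_seconds, limit_seconds):
    """Return True when the job has waited or run longer than the allowed limit."""
    return elapsed_seconds > limit_seconds


def build_cancel_command(job_id):
    """Build an scancel invocation as an argument list (hypothetical helper)."""
    return ["scancel", str(job_id)]
```

A caller could pass the list from `build_cancel_command` to `subprocess.run` once `should_cancel` returns True.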

check(self)

Check that the Slurm binaries are available before running tests. This checks that the launcher (sbatch) and sacct are available. If the qos, partition, or cluster keys are defined, we check that each is a valid entity in the Slurm configuration. For a partition, we also check that it is in the up state before dispatching jobs. This method raises an exception of type SystemExit if any check fails.
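
The binary check can be sketched as follows. This is a minimal illustration that assumes the binaries are looked up on PATH with `shutil.which`; the function name `check_binaries` and its `which` parameter are hypothetical, and the qos, partition, and cluster validation is not shown.

```python
import shutil


def check_binaries(binaries=("sbatch", "sacct"), which=shutil.which):
    """Exit if any required Slurm binary cannot be found on PATH."""
    missing = [name for name in binaries if which(name) is None]
    if missing:
        # SystemExit matches the exception type named in the method description
        raise SystemExit(f"Missing Slurm binaries: {', '.join(missing)}")
    return True
```

Passing `which` as a parameter keeps the lookup replaceable, so the check can be exercised on a machine without Slurm installed.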

dispatch(self)

This method is responsible for dispatching the job to the Slurm scheduler.
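
Acquiring the job ID after submission could look like the sketch below. It assumes the job is submitted with `sbatch` without `--parsable`, in which case the command prints a line of the form `Submitted batch job <id>`; the helper name `parse_job_id` is hypothetical and not part of buildtest.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"could not find a job ID in: {sbatch_output!r}")
    return int(match.group(1))
```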

gather(self)

Gather Slurm job details after job completion.
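
One way to collect job details is to request a list of fields from `sacct` and map the pipe-delimited output back onto the field names. The sketch below assumes that approach; both helper names are hypothetical, and the actual method may query and store the data differently.

```python
def build_gather_command(job_id, fields):
    """Build a sacct command that prints the requested fields, pipe-delimited."""
    # -X limits output to the job allocation, -n drops the header,
    # -P separates fields with '|', -o selects the fields to print
    return ["sacct", "-j", str(job_id), "-X", "-n", "-P", "-o", ",".join(fields)]


def parse_gather_output(output, fields):
    """Map the first line of pipe-delimited sacct output onto the field names."""
    lines = output.strip().splitlines()
    if not lines:
        return {}
    values = lines[0].split("|")
    return dict(zip(fields, values))
```

For example, with `fields = ["JobID", "State", "ExitCode"]`, the output line `123|COMPLETED|0:0` is mapped to a dictionary keyed by those three names.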

load(self)

Load the Slurm executor configuration from buildtest settings.

poll(self)

This method polls the job at a fixed time interval until the job finishes. We use sacct to query the job ID, then sleep for the given interval before trying again. The command that is run is sacct -j <jobid> -o State -n -X -P
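
The polling loop can be sketched as follows. This is a minimal illustration built around the sacct command quoted in the description; the function names, the default interval, and the set of states treated as "still in progress" are assumptions for this example, not buildtest's own values.

```python
import subprocess
import time

# States treated as "still in progress" in this sketch (an assumption)
ACTIVE_STATES = {"PENDING", "RUNNING", "COMPLETING"}


def query_state(job_id):
    """Run the sacct command from the description and return the job state."""
    cmd = ["sacct", "-j", str(job_id), "-o", "State", "-n", "-X", "-P"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def poll_until_done(job_id, interval=30, get_state=query_state, sleep=time.sleep):
    """Query the job state every `interval` seconds until the job is no longer active."""
    while True:
        state = get_state(job_id)
        # An empty state means sacct has no record yet, so keep waiting
        if state and state not in ACTIVE_STATES:
            return state
        sleep(interval)
```

The `get_state` and `sleep` parameters are injectable so that the loop can be tried without a Slurm cluster, for instance by supplying a function that returns a canned sequence of states.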