buildtest.scheduler.detection

Module Contents

Classes

Scheduler

This is a base Scheduler class used for implementing common methods for detecting scheduler details.

Slurm

The Slurm class implements common functions to query a Slurm cluster

LSF

The LSF class checks for LSF binaries and returns a list of LSF queues

Cobalt

The Cobalt class checks for Cobalt binaries and gets a list of Cobalt queues

PBS

The PBS class checks for PBS binaries and gets a list of available queues

Torque

The Torque class is a subclass of the PBS class and inherits all of its methods

class buildtest.scheduler.detection.Scheduler[source]

This is a base Scheduler class used for implementing common methods for detecting scheduler details. Each subclass implements the queries that are specific to its scheduler.

logger
binaries = []
queues()[source]
check_binaries(binaries)[source]

Check if binaries exist in $PATH
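A minimal sketch of such a $PATH check, assuming the standard library's shutil.which is an acceptable way to resolve a binary. The helper names find_missing_binaries and binaries_available are hypothetical and are not part of buildtest.

```python
import shutil


def find_missing_binaries(binaries):
    """Return the names in ``binaries`` that cannot be resolved in $PATH."""
    # shutil.which returns None when the executable is not found.
    return [name for name in binaries if shutil.which(name) is None]


def binaries_available(binaries):
    """Return True only if every binary in the list is found in $PATH."""
    return not find_missing_binaries(binaries)
```

For example, binaries_available(['sbatch', 'sinfo']) would only return True on a host where both Slurm commands are installed.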

active()[source]

Returns True if buildtest is able to retrieve queues from the scheduler, otherwise returns False

abstract get_queues()[source]

This method is implemented by each subclass to return a list of queues for a given scheduler
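The relationship between active() and the abstract get_queues() can be illustrated with a small sketch. It assumes that "active" simply means a non-empty list of queues was retrieved. SketchScheduler and StaticScheduler are hypothetical names for illustration and do not reproduce buildtest's code.

```python
from abc import ABC, abstractmethod


class SketchScheduler(ABC):
    """Hypothetical stand-in for a base scheduler class."""

    @abstractmethod
    def get_queues(self):
        """Return a list of queue names for the scheduler."""

    def active(self):
        # Assumption: the scheduler counts as active when at least
        # one queue could be retrieved.
        return bool(self.get_queues())


class StaticScheduler(SketchScheduler):
    """Toy subclass that returns a fixed list instead of querying a cluster."""

    def __init__(self, queue_names):
        self._queue_names = list(queue_names)

    def get_queues(self):
        return self._queue_names
```

Because get_queues() is abstract, SketchScheduler itself cannot be instantiated; only subclasses that provide the scheduler-specific query can.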

class buildtest.scheduler.detection.Slurm[source]

Bases: Scheduler

The Slurm class implements common functions to query a Slurm cluster, including partitions, qos, and clusters. We check that the Slurm binaries exist in $PATH and report whether the Slurm cluster is in a valid state.

binaries = ['sbatch', 'sacct', 'sacctmgr', 'sinfo', 'scancel', 'scontrol']
partitions()[source]
clusters()[source]
qos()[source]
run_command(query)[source]

Run a command and return output as list of lines
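One way to turn command output into a list of lines is sketched below. It assumes the query is run without a shell through subprocess.run and that blank lines and surrounding whitespace should be dropped. The names output_lines and run_query are hypothetical and are not the method documented here.

```python
import shlex
import subprocess


def output_lines(text):
    """Split command output into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_query(query):
    """Run ``query`` without a shell and return its stdout as a list of lines."""
    completed = subprocess.run(
        shlex.split(query), capture_output=True, text=True, check=False
    )
    return output_lines(completed.stdout)
```

Stripping each line matters for commands such as sinfo -O, which pad their columns with trailing spaces.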

_get_partitions()[source]

Get a list of all Slurm partitions using sinfo -a -h -O partitionname. The output is a list of queue names:

$ sinfo -a -h -O partitionname
system
system_shared
debug_hsw
debug_knl
jupyter
_get_clusters()[source]

Get a list of Slurm clusters by running sacctmgr list cluster -P -n format=Cluster. The output is a list of Slurm clusters, for example:

$ sacctmgr list cluster -P -n format=Cluster
cori
escori
_get_qos()[source]

Retrieve a list of all Slurm qos by running sacctmgr list qos -P -n format=Name. The output is a list of qos, for example:

$ sacctmgr list qos -P -n format=Name
normal
premium
low
serialize
scavenger
validate_partition(slurm_executor)[source]

Validate the partition for a given executor.

Parameters:

slurm_executor (dict) – The configuration of the executor.

Returns:

True if the partition is valid and in ‘up’ state, False otherwise.

Return type:

bool

validate_cluster(executor, slurm_executor)[source]

Validate the cluster for a given executor. If the ‘cluster’ key is defined in the Slurm executor configuration, we check whether the cluster is valid and return True if it is, False otherwise.

Parameters:
  • executor (str) – The name of the executor.

  • slurm_executor (dict) – The configuration of the executor.

validate_qos(executor, slurm_executor)[source]

Validate the qos for a given executor. If the ‘qos’ key is defined in the Slurm executor configuration, we check whether the qos is valid and return True if it is, False otherwise.

Parameters:
  • executor (str) – The name of the executor.

  • slurm_executor (dict) – The configuration of the executor.
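The cluster and qos checks described above both reduce to a membership test against the values reported by Slurm. The sketch below assumes that a missing key counts as valid because there is nothing to check; the source does not state this. The helper name is_valid_choice is hypothetical.

```python
def is_valid_choice(slurm_executor, key, known_values):
    """Check an optional executor setting against values reported by Slurm.

    Assumption: when ``key`` is absent from the executor configuration,
    the check passes because there is nothing to validate.
    """
    if key not in slurm_executor:
        return True
    return slurm_executor[key] in known_values
```

With the example outputs shown earlier, is_valid_choice({'cluster': 'cori'}, 'cluster', ['cori', 'escori']) passes, while a cluster name not in that list fails.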

class buildtest.scheduler.detection.LSF[source]

Bases: Scheduler

The LSF class checks for LSF binaries and returns a list of LSF queues

binaries = ['bsub', 'bqueues', 'bkill', 'bjobs']
get_queues()[source]

Return a JSON dictionary of available LSF queues and their queue states. The command we run is bqueues -o 'queue_name status' -json, which returns a JSON record of all queue details.

$ bqueues -o 'queue_name status' -json
    {
      "COMMAND":"bqueues",
      "QUEUES":2,
      "RECORDS":[
        {
          "QUEUE_NAME":"batch",
          "STATUS":"Open:Active"
        },
        {
          "QUEUE_NAME":"test",
          "STATUS":"Open:Active"
        }
      ]
    }
validate_queue(executor)[source]

Validate an LSF queue. We check that the queue is available and in the ‘Open:Active’ state. The input is a dictionary containing the LSF executor configuration. If the queue is not found we return False.

Parameters:

executor (dict) – The dictionary containing the LSF executor configuration.

Returns:

True if queue is found and in ‘Open:Active’ state, False otherwise.

Return type:

bool
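The queue lookup described above can be sketched against the bqueues JSON layout shown earlier. The sketch works on a JSON string rather than calling bqueues, and the helper names parse_lsf_queues and queue_is_open_active are hypothetical.

```python
import json

# Sample in the same shape as the bqueues output shown in this section.
SAMPLE_BQUEUES = """
{
  "COMMAND": "bqueues",
  "QUEUES": 2,
  "RECORDS": [
    {"QUEUE_NAME": "batch", "STATUS": "Open:Active"},
    {"QUEUE_NAME": "test", "STATUS": "Open:Active"}
  ]
}
"""


def parse_lsf_queues(bqueues_json):
    """Map each queue name to its status string."""
    records = json.loads(bqueues_json)["RECORDS"]
    return {record["QUEUE_NAME"]: record["STATUS"] for record in records}


def queue_is_open_active(queue_states, queue_name):
    """Return True only if the queue exists and is in the Open:Active state."""
    return queue_states.get(queue_name) == "Open:Active"
```

Using dict.get means an unknown queue name yields False rather than raising KeyError, which matches the documented behaviour for a queue that is not found.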

class buildtest.scheduler.detection.Cobalt[source]

Bases: Scheduler

The Cobalt class checks for Cobalt binaries and gets a list of Cobalt queues

binaries = ['qsub', 'qstat', 'qdel', 'nodelist', 'showres', 'partlist']
get_queues()[source]

Get all Cobalt queues by running qstat -Ql and parsing output

class buildtest.scheduler.detection.PBS[source]

Bases: Scheduler

The PBS class checks for PBS binaries and gets a list of available queues

binaries = ['qsub', 'qstat', 'qdel', 'qstart', 'qhold', 'qmgr']
check()[source]

Check if binaries exist in $PATH and run qsub --version to determine whether the scheduler is OpenPBS. The return value is a boolean, where True indicates the check has passed.

The output of qsub --version from an OpenPBS scheduler looks like the following. We search for the string pbs_version:

[pbsuser@pbs tmp]$ qsub --version
pbs_version = 19.0.0

Parameters:

binaries (list) – list of binaries to check for existence in $PATH

get_queues()[source]

Get queue configuration using qstat -Q -f -F json and retrieve a list of queues.

Shown below is an example output of qstat -Q -f -F json

$ qstat -Q -f -F json
 {
     "timestamp":1615924938,
     "pbs_version":"19.0.0",
     "pbs_server":"pbs",
     "Queue":{
         "workq":{
             "queue_type":"Execution",
             "total_jobs":0,
             "state_count":"Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0 ",
             "resources_assigned":{
                 "mem":"0kb",
                 "ncpus":0,
                 "nodect":0
             },
             "hasnodes":"True",
             "enabled":"True",
             "started":"True"
         }
     }
 }
validate_queue(queue_name)[source]

Validate a PBS queue. Return True if queue exists and is enabled and started, False otherwise.

Parameters:

queue_name (str) – The name of the queue to validate.
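A sketch of the enabled-and-started check against the qstat -Q -f -F json layout shown above. It assumes the flags arrive as the strings "True" and "False", as in the example output. The helper name pbs_queue_is_usable and the second queue, maint, are hypothetical and exist only for illustration.

```python
import json

# Trimmed sample in the same shape as the qstat output in this section.
# The "maint" queue is invented to show a failing case.
SAMPLE_QSTAT = """
{
  "pbs_version": "19.0.0",
  "Queue": {
    "workq": {"queue_type": "Execution", "enabled": "True", "started": "True"},
    "maint": {"queue_type": "Execution", "enabled": "True", "started": "False"}
  }
}
"""


def pbs_queue_is_usable(qstat_json, queue_name):
    """Return True if the queue exists and is both enabled and started."""
    queues = json.loads(qstat_json).get("Queue", {})
    queue = queues.get(queue_name)
    if queue is None:
        return False
    # The JSON output reports these flags as strings, not booleans.
    return queue.get("enabled") == "True" and queue.get("started") == "True"
```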

class buildtest.scheduler.detection.Torque[source]

Bases: PBS

The Torque class is a subclass of the PBS class and inherits all of its methods

check()[source]

Check if binaries exist in $PATH and run qsub --version to determine whether the scheduler is Torque. The return value is a boolean, where True indicates the check has passed.

The output of qsub --version from a Torque scheduler looks like the following. We search for Commit: in the output to distinguish Torque from OpenPBS:

$ qsub --version
Version: 7.0.1
Commit: b405f8c22d41d29cbf9b9016bc1146bf4559e895
Parameters:

binaries (list) – list of binaries to check for existence in $PATH
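The two qsub --version markers described in this section (pbs_version for OpenPBS, Commit: for Torque) can be told apart with plain substring checks. This is a sketch of that idea only; the function name detect_pbs_flavor and its return labels are hypothetical.

```python
def detect_pbs_flavor(version_output):
    """Classify ``qsub --version`` output as 'openpbs', 'torque', or None."""
    if "pbs_version" in version_output:
        return "openpbs"
    if "Commit:" in version_output:
        return "torque"
    # Neither marker was found, so the scheduler flavor is unknown.
    return None
```

Both schedulers ship a qsub binary, so checking $PATH alone cannot distinguish them; the version output is what separates the two.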

get_queues()[source]

Get queue configuration using ‘qstat -Qf’ and parse the output into a JSON dictionary. The output of this command looks like the following:

$ qstat -Qf
 Queue: lbl-cluster
     queue_type = Execution
     total_jobs = 0
     state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Complete:0
     resources_default.nodes = 1
     resources_default.walltime = 24:00:00
     mtime = 1711400391
     enabled = True
     started = True
validate_queue(torque_executor)[source]

Validate the queue for a given executor. We check that the queue is available and that its configuration shows it is enabled and started.
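The Torque handling above, parsing qstat -Qf into a dictionary and then checking enabled and started, might look like the following sketch. It assumes each attribute sits on a single "key = value" line and ignores any wrapped continuation lines. The helper names parse_torque_queues and torque_queue_is_usable are hypothetical.

```python
SAMPLE_QSTAT_QF = """\
Queue: lbl-cluster
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Complete:0
    resources_default.nodes = 1
    resources_default.walltime = 24:00:00
    mtime = 1711400391
    enabled = True
    started = True
"""


def parse_torque_queues(qstat_output):
    """Parse ``qstat -Qf`` text into {queue_name: {attribute: value}}."""
    queues = {}
    current = None
    for raw_line in qstat_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("Queue:"):
            # Start a new queue record, e.g. "Queue: lbl-cluster".
            current = line.split(":", 1)[1].strip()
            queues[current] = {}
        elif current is not None and " = " in line:
            # Split on the first " = " so values containing colons stay intact.
            key, value = line.split(" = ", 1)
            queues[current][key.strip()] = value.strip()
    return queues


def torque_queue_is_usable(queues, queue_name):
    """Return True if the queue exists and is both enabled and started."""
    queue = queues.get(queue_name)
    if queue is None:
        return False
    return queue.get("enabled") == "True" and queue.get("started") == "True"
```

Values are kept as strings, so total_jobs is "0" rather than the integer 0; converting types would be a separate step.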