Configuring Executors
An executor is responsible for running a test and capturing its output file, error file, and return code. An executor can be a local executor, which runs tests on the local machine, or a batch executor, which is modelled as a partition/queue. A batch executor is responsible for dispatching a job to the scheduler, polling the job until it finishes, and gathering the job results.
Executor Types
Local Executor
The executors section is a JSON object that defines one or more executors. The executors are grouped by their type followed by the executor name. In this example we define two local executors, bash and sh, which will run tests on the local machine.
system:
  generic:
    executors:
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
The local executors are defined in the local section, where each executor must have a unique name. Executors are referenced in a buildspec using the executor field in the following format:
executor: <system>.<type>.<name>
For instance, if a buildspec wants to reference the local executor bash from the generic cluster, you would specify the following in the buildspec:
executor: generic.local.bash
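Shown below is a minimal buildspec sketch that references this executor. The test name hello_world and its run command are made up for illustration, and depending on your buildtest version the buildspec may require additional top-level fields:

buildspecs:
  hello_world:
    type: script
    executor: generic.local.bash
    description: illustrative test that runs under the bash local executor
    run: echo "hello world"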
In our example configuration, we defined a bash executor as follows:
executors:
  # define local executors for running jobs locally
  local:
    bash:
      description: submit jobs on local machine using bash shell
      shell: bash
The local executors require the shell key, which must be one of the supported shells on your system. On Linux/Mac systems you can find all supported shells in the file /etc/shells. Any buildspec that references this executor will submit its job using the bash shell.
You can pass options to the shell which will get passed into each job submission. For instance, if you want all bash scripts to run in a login shell you can specify bash --login:
executors:
  local:
    login_bash:
      shell: bash --login
Then you can reference this executor as executor: generic.local.login_bash, and your tests will be submitted via bash --login /path/to/test.sh.
Once you define your executors, you can query them via the buildtest config executors list command.
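For example, with the generic system defined above the command would list the fully qualified executor names; the exact output depends on your buildtest version and configuration, but it might look something like this:

$ buildtest config executors list
generic.local.bash
generic.local.sh
generic.local.login_bash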
Slurm Executors
If you have a Slurm cluster, you can define slurm executors in your configuration via the slurm property.
Depending on your Slurm configuration, you can submit jobs via qos or partition. Buildtest supports both methods and you can specify either the qos or partition property.
In the example below, we define a slurm executor named haswell_debug which will submit jobs to the debug qos on the haswell partition. The qos property is used to select the Slurm qos, and the options property is used to pass additional options to the sbatch command. In this example we are passing -C haswell to select haswell nodes. Any additional #SBATCH options are defined in the buildspec; for more details see batch scheduler support.
executors:
  slurm:
    haswell_debug:
      qos: debug
      cluster: cori
      options: ["-C haswell"]
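As a sketch of how a buildspec might use this executor with extra #SBATCH options, consider the following; the system name generic, the test name, and the sbatch values are assumptions for illustration (see batch scheduler support for the authoritative details):

buildspecs:
  hostname_haswell:
    type: script
    executor: generic.slurm.haswell_debug
    sbatch: ["-N 1", "-t 00:05:00"]
    run: hostname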
buildtest will detect the Slurm configuration and check that qos, partition, and cluster match the buildtest configuration. In addition, buildtest supports multi-cluster job submission and monitoring from a remote cluster. This means if you specify the cluster field, buildtest will poll jobs using sacct with the cluster name as follows: sacct -M <cluster>.
You can configure your slurm executors to use Slurm partitions instead of qos. This can be done via the partition property. In this next example we define an executor named regular_hsw which will submit jobs to the partition regular_hsw. The description field may be used for informational purposes.
executors:
  slurm:
    regular_hsw:
      partition: regular_hsw
      description: regular haswell queue
Buildtest will check if the Slurm partition is in the up state before adding the executor. If a partition is in the down state, buildtest will mark the executor as invalid and it will be unusable. To check the availability of a partition, let's say regular_hsw, buildtest will run the following command.
$ sinfo -p regular_hsw -h -O available
up
PBS Executors
Note
buildtest PBS support relies on job history being enabled because buildtest needs to query the job after completion using qstat -x. This can be configured using qmgr by setting set server job_history_enable=True. For more details see section 14.15.5.1 Enabling Job History in the PBS 2021.1.3 Admin Guide.
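For reference, a PBS administrator would typically apply this setting with a qmgr command along these lines:

$ qmgr -c "set server job_history_enable=True"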
buildtest supports the PBS scheduler, which can be defined in the executors section. Shown below is an example configuration using one pbs executor named workq. The property queue: workq defines the name of the PBS queue that is available on your system.
system:
  generic:
    hostnames: ['.*']
    moduletool: N/A
    executors:
      defaults:
        pollinterval: 10
        maxpendtime: 30
      pbs:
        workq:
          queue: workq
    compilers:
      compiler:
        gcc:
          default:
            cc: /usr/bin/gcc
            cxx: /usr/bin/g++
            fc: /usr/bin/gfortran
buildtest will detect the PBS queues on your system and determine if the queues are active and enabled before submitting a job to the scheduler. buildtest will run the qstat -Q -f -F json command to check the queue state, which is reported in JSON format, and check that the queue has the fields enabled: "True" and started: "True" set in the queue definition. If these values are not set, buildtest will raise an exception.
Shown below is an example with one queue workq that is enabled and started.
$ qstat -Q -f -F json
{
    "timestamp":1615924938,
    "pbs_version":"19.0.0",
    "pbs_server":"pbs",
    "Queue":{
        "workq":{
            "queue_type":"Execution",
            "total_jobs":0,
            "state_count":"Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0 ",
            "resources_assigned":{
                "mem":"0kb",
                "ncpus":0,
                "nodect":0
            },
            "hasnodes":"True",
            "enabled":"True",
            "started":"True"
        }
    }
}
PBS/Torque Executors
buildtest has support for Torque scheduler which can
be defined in the executors
section by using the torque
property. Shown below is an example configuration that defines
an executor name lbl using the queue name lbl-cluster
executors:
  torque:
    lbl:
      queue: lbl-cluster
We will run qstat -Qf to get queue details and check if the queue is enabled and started before adding the executor. If the queue is not enabled or started, buildtest will mark the executor as invalid and it will be unusable.
Shown below is a sample output of the qstat -Qf command on a PBS/Torque system which shows the queue configuration. Buildtest will parse this output to extract the queue details and compare them with the executor configuration.
(buildtest) adaptive50@lbl-cluster:$ qstat -Qf
Queue: lbl-cluster
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Comp
        lete:0
    resources_default.nodes = 1
    resources_default.walltime = 24:00:00
    mtime = 1711641211
    enabled = True
    started = True
LSF Executors
Buildtest supports LSF scheduler
which can be defined in the executors
section. Shown below is an example configuration that declares one executor named
batch
that uses the LSF queue named batch. The lsf
property is used to define LSF executors, and the queue
property
is used to specify the LSF queue name.
executors:
  lsf:
    batch:
      queue: batch
buildtest will run the bqueues -o 'queue_name status' -json command to retrieve the list of queues. If the queue property specifies an invalid queue name, buildtest will raise an exception.
$ bqueues -o 'queue_name status' -json
{
  "COMMAND":"bqueues",
  "QUEUES":2,
  "RECORDS":[
    {
      "QUEUE_NAME":"batch",
      "STATUS":"Open:Active"
    },
    {
      "QUEUE_NAME":"test",
      "STATUS":"Open:Active"
    }
  ]
}
Container Executor
Buildtest supports executor declarations for container-based jobs. The container executor will run all tests associated with the executor in the specified container image. Currently, we support docker, podman and singularity as container platforms. We assume the container runtime is installed on your system and is accessible in your $PATH.
Let's take a look at the following container executor declaration. The top-level keyword container is used to define container executors, which can have any arbitrary name. We have defined two container executors named ubuntu and python that specify the container image and platform via the image and platform properties. The description is used for informational purposes and does not impact buildtest in any way.
You can specify the full URI to the container image, which is useful if you are using a custom registry.
executors:
  container:
    ubuntu:
      image: ubuntu:20.04
      platform: docker
      description: submit jobs on ubuntu container
    python:
      image: python:3.11.0
      platform: docker
      description: submit jobs on python container
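For illustration of the full-URI form, an executor could pull its image from a custom registry as sketched below; the registry hostname registry.example.com and the executor name custom_ubuntu are made-up examples:

executors:
  container:
    custom_ubuntu:
      image: registry.example.com/library/ubuntu:20.04
      platform: podman
      description: submit jobs using an image from a custom registry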
You can specify container runtime options via the options property and bind mounts via the mount property. Both properties are string types. For instance, let's say you want to bind mount the /tmp directory to /tmp in the container:
executors:
  container:
    ubuntu:
      image: ubuntu:20.04
      platform: docker
      mount: "/tmp:/tmp"
      options: "--user root"
      description: submit jobs on ubuntu container
Specifying Project Account
Batch jobs require a project account to charge jobs against, and depending on your site this could be required in order to submit a job. Some schedulers like Slurm can detect your default project account, in which case you don't need to specify it on the command line.
In your configuration file you can specify the account property in the defaults section, which will be inherited by all executors. You can also specify the account property within an executor, which will override the defaults section.
In this example, we have two pbs executors, testing and development. All pbs jobs will use the project account development because this is defined in the defaults section; however, we force all jobs using the testing executor to charge jobs to qa_test.
executors:
  defaults:
    pollinterval: 10
    maxpendtime: 90
    account: development
  pbs:
    testing:
      queue: test
      account: qa_test
    development:
      queue: dev
Alternatively, you can override the configuration setting via the buildtest build --account option, which will be applied to all batch jobs.
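For example, the following run would charge all batch jobs to the qa_test account; the buildspec path here is purely illustrative:

$ buildtest build -b buildspecs/hostname.yml --account qa_test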
Poll Interval
The pollinterval field is used to poll jobs at a set interval, in seconds, while a job is active in the queue. The poll interval can be configured on the command line using buildtest build --pollinterval, which overrides the configuration value.
Note
pollinterval and maxpendtime have no effect on local executors.
Max Pend Time
The maxpendtime is the maximum time a job can be pending within an executor; if it exceeds the limit, buildtest will cancel the job.
The maxpendtime option can be overridden per executor; for example, the section below overrides the default to 300 seconds:
bigmem:
  description: bigmem jobs
  cluster: escori
  qos: bigmem
  maxpendtime: 300
The maxpendtime is used to cancel a job only if the job is pending in the queue; it has no impact once the job is running. buildtest starts a timer at job submission and, at every poll interval (the pollinterval field), checks whether the job has exceeded maxpendtime while it is still pending. If the pend time exceeds the maxpendtime limit, buildtest will cancel the job using the appropriate scheduler command (scancel, bkill, qdel). Buildtest will remove cancelled jobs from the poll queue; in addition, cancelled jobs won't be reported in the test report.
For more details on maxpendtime click here.
Run commands before executing test
You can configure an executor to run a set of commands when using that executor. You can use the before_script property to specify commands to run prior to running the test. The content of before_script will be inserted in a shell script that is sourced by all tests.
local:
  bash:
    description: submit jobs on local machine using bash shell
    shell: bash
    before_script: |
      today=$(date "+%D")
      echo "Today is $today, running test with user: $(whoami)"
buildtest will write a before_script.sh in the $BUILDTEST_ROOT/var/executor directory that will contain the contents of before_script. Shown below is the list of before_script.sh files for all local executors.
$ find $BUILDTEST_ROOT/var/executor -type f
/Users/siddiq90/Documents/GitHubDesktop/buildtest/var/executor/generic.local.bash/before_script.sh
/Users/siddiq90/Documents/GitHubDesktop/buildtest/var/executor/generic.local.csh/before_script.sh
/Users/siddiq90/Documents/GitHubDesktop/buildtest/var/executor/generic.local.zsh/before_script.sh
/Users/siddiq90/Documents/GitHubDesktop/buildtest/var/executor/generic.local.sh/before_script.sh
If you run a test using this executor you will see that the code from before_script.sh is inserted, since the script is sourced for every test.
$ cat $BUILDTEST_ROOT/var/executor/generic.local.bash/before_script.sh
#!/bin/bash
today=$(date "+%D")
echo "Today is $today, running test with user: $(whoami)"
Disabling an executor
buildtest will run checks for every executor instance depending on the executor type; for instance, local executors such as the bash, sh, and csh executors will be checked to see if the shell is valid by checking its path. If the shell doesn't exist, buildtest will raise an error. You can circumvent this issue by disabling the executor via the disable property. A disabled executor won't serve any jobs, which means any buildspec that references the executor won't create a test.
In this next example the executor zsh is disabled, which is useful if you don't have zsh on your system.
executors:
  local:
    zsh:
      shell: zsh
      disable: true
Loading Modules in Executors
You can configure executors to load modules, purge modules, or restore a module collection, which will be run for all tests that use the executor. This can be achieved via the module property that can be defined in the executor definition. In this next example, we create a bash executor that will purge modules and load gcc. The purge property is a boolean; if set to True, we will run module purge before the module load commands. The load property is a list of modules to module load.
executors:
  local:
    bash:
      shell: bash
      module:
        purge: True
        load: ["gcc"]