Batch Scheduler Support

Batch scheduler support in buildtest is an experimental feature; currently buildtest supports the Slurm and LSF executors. In order for buildtest to submit jobs to a scheduler, you must define a slurm or lsf executor in your configuration.

Slurm Executor

The SlurmExecutor class is responsible for managing Slurm jobs and performs the following actions:

  1. Check the Slurm binaries sbatch and sacct.

  2. Dispatch the job and acquire the job ID using sacct.

  3. Poll all Slurm jobs until they have finished.

  4. Gather job results via sacct once the job is complete.

buildtest will dispatch all jobs and then poll them in a while True loop until every job is complete. If a job is in PENDING or RUNNING state, buildtest will keep polling at a set interval. Once a job leaves the PENDING or RUNNING state, buildtest will gather its results and wait until all jobs have finished.
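
The dispatch/poll/gather cycle can be pictured with a minimal sketch. This is not buildtest's actual implementation; poll, gather, and the 10 second interval are stand-ins for the executor's internals:

import time

# Minimal sketch of the dispatch/poll/gather cycle described above (not
# buildtest's actual implementation). poll and gather are caller-supplied
# functions: poll(job) returns the Slurm job state as a string, and
# gather(job) collects the finished job's results.
def poll_until_complete(jobs, poll, gather, interval=10):
    results = {}
    active = list(jobs)                     # jobs already dispatched to Slurm
    while active:                           # loop until every job has finished
        time.sleep(interval)                # wait before polling again
        for job in list(active):
            state = poll(job)               # e.g. "PENDING", "RUNNING", "COMPLETED"
            if state in ("PENDING", "RUNNING"):
                continue                    # job still queued or running; keep polling
            results[job] = gather(job)      # job left PENDING/RUNNING: gather results
            active.remove(job)
    return results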

In order to use the Slurm scheduler, you must define one or more slurm executors and reference them via the executor property. In this example we use the slurm executor slurm.debug; in addition, we specify #SBATCH directives using the sbatch field. The sbatch field is a list of strings, and buildtest will insert the #SBATCH directive in front of each value.

Shown below is an example buildspec:

version: "1.0"
buildspecs:
  slurm_metadata:
    description: Get metadata from compute node when submitting job
    type: script
    executor: slurm.debug
    sbatch:
      - "-t 00:05"
      - "-C haswell"
      - "-N 1"
    run: |
      export SLURM_JOB_NAME="firstjob"
      echo "jobname:" $SLURM_JOB_NAME
      echo "slurmdb host:" $SLURMD_NODENAME
      echo "pid:" $SLURM_TASK_PID
      echo "submit host:" $SLURM_SUBMIT_HOST
      echo "nodeid:" $SLURM_NODEID
      echo "partition:" $SLURM_JOB_PARTITION

buildtest will add the #SBATCH directives at the top of the script, followed by the content of the run section. Shown below is the generated test content:

#!/bin/bash
#SBATCH -t 00:05
#SBATCH -C haswell
#SBATCH -N 1
export SLURM_JOB_NAME="firstjob"
echo "jobname:" $SLURM_JOB_NAME
echo "slurmdb host:" $SLURMD_NODENAME
echo "pid:" $SLURM_TASK_PID
echo "submit host:" $SLURM_SUBMIT_HOST
echo "nodeid:" $SLURM_NODEID
echo "partition:" $SLURM_JOB_PARTITION

The slurm.debug executor in our settings.yml is defined as follows:

slurm:
  debug:
    description: jobs for debug qos
    qos: debug
    cluster: cori

With this setting, any buildspec test that uses the slurm.debug executor will result in the following launch command: sbatch --qos debug --clusters=cori </path/to/script.sh>.
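
As a rough illustration (not buildtest's actual code), the executor settings above could be turned into that launch command as follows; build_launch_command is a hypothetical helper:

# Hypothetical sketch: map slurm executor settings to sbatch launch options.
def build_launch_command(executor_settings, testscript):
    cmd = ["sbatch"]
    if "qos" in executor_settings:
        cmd += ["--qos", executor_settings["qos"]]
    if "cluster" in executor_settings:
        cmd += ["--clusters=" + executor_settings["cluster"]]
    cmd.append(testscript)
    return cmd

# Example: the slurm.debug executor defined above
print(build_launch_command({"qos": "debug", "cluster": "cori"}, "/path/to/script.sh"))
# ['sbatch', '--qos', 'debug', '--clusters=cori', '/path/to/script.sh']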

Unlike the LocalExecutor, the Run Stage will dispatch the slurm job and poll until the job is completed. Once the job is complete, it will gather the results and terminate. In the Run Stage, buildtest will mark the test status as N/A because the job has been submitted to the scheduler and may still be pending in the queue. In order to get the job result, we need to wait until the job is complete, then gather the results and determine the test state. buildtest keeps track of all buildspecs, test scripts to be run, and their results. A test using the LocalExecutor runs during the Run Stage, where its return code is retrieved and its status can be calculated immediately. For Slurm jobs, buildtest dispatches the job and moves on to the next job. buildtest will show the output of all tests after the Polling Stage along with their results. A Slurm job with exit code 0 will be marked with status PASS.

Shown below is an example build for this test:

$ buildtest build -b metadata.yml
Paths:
__________
Prefix: /global/u1/s/siddiq90/cache
Buildspec Search Path: ['/global/homes/s/siddiq90/.buildtest/site']
Test Directory: /global/u1/s/siddiq90/cache/tests

+-------------------------------+
| Stage: Discovered Buildspecs  |
+-------------------------------+

/global/u1/s/siddiq90/buildtest-cori/slurm/valid_jobs/metadata.yml

+----------------------+
| Stage: Building Test |
+----------------------+

 Name           | Schema File             | Test Path                                                    | Buildspec
----------------+-------------------------+--------------------------------------------------------------+--------------------------------------------------------------------
 slurm_metadata | script-v1.0.schema.json | /global/u1/s/siddiq90/cache/tests/metadata/slurm_metadata.sh | /global/u1/s/siddiq90/buildtest-cori/slurm/valid_jobs/metadata.yml

+----------------------+
| Stage: Running Test  |
+----------------------+

[slurm_metadata] job dispatched to scheduler
[slurm_metadata] acquiring job id in 2 seconds
 name           | executor    | status   |   returncode | testpath
----------------+-------------+----------+--------------+--------------------------------------------------------------
 slurm_metadata | slurm.debug | N/A      |            0 | /global/u1/s/siddiq90/cache/tests/metadata/slurm_metadata.sh


Polling Jobs in 10 seconds
________________________________________
[slurm_metadata]: JobID 32740760 in PENDING state


Polling Jobs in 10 seconds
________________________________________
[slurm_metadata]: JobID 32740760 in COMPLETED state


Polling Jobs in 10 seconds
________________________________________

+---------------------------------------------+
| Stage: Final Results after Polling all Jobs |
+---------------------------------------------+

 name           | executor    | status   |   returncode | testpath
----------------+-------------+----------+--------------+--------------------------------------------------------------
 slurm_metadata | slurm.debug | PASS     |            0 | /global/u1/s/siddiq90/cache/tests/metadata/slurm_metadata.sh

+----------------------+
| Stage: Test Summary  |
+----------------------+

Executed 1 tests
Passed Tests: 1/1 Percentage: 100.000%
Failed Tests: 0/1 Percentage: 0.000%

The SlurmExecutor class is responsible for processing Slurm jobs, which may include dispatching, polling, gathering, or cancelling a job. The SlurmExecutor will gather job metrics via sacct using the following format fields:

  • Account

  • AllocNodes

  • AllocTRES

  • ConsumedEnergyRaw

  • CPUTimeRaw

  • End

  • ExitCode

  • JobID

  • JobName

  • NCPUS

  • NNodes

  • QOS

  • ReqGRES

  • ReqMem

  • ReqNodes

  • ReqTRES

  • Start

  • State

  • Submit

  • UID

  • User

  • WorkDir

For a complete list of format fields, see sacct -e. For now, we support only these fields of interest for reporting purposes.
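
These fields are passed to a single sacct invocation. Shown below is a minimal sketch of how such a query could be assembled and parsed; the options mirror the sacct command that appears in the log excerpt later in this section, while gather_job_record itself is a hypothetical helper, not buildtest's API:

import subprocess

SACCT_FIELDS = [
    "Account", "AllocNodes", "AllocTRES", "ConsumedEnergyRaw", "CPUTimeRaw",
    "End", "ExitCode", "JobID", "JobName", "NCPUS", "NNodes", "QOS",
    "ReqGRES", "ReqMem", "ReqNodes", "ReqTRES", "Start", "State", "Submit",
    "UID", "User", "WorkDir",
]

# Hypothetical helper: query sacct for a single job and return a dict of
# field -> value. Options: -X (allocations only), -n (no header),
# -P (parsable, '|' delimited), -o (output fields), -M (cluster).
def gather_job_record(jobid, cluster=None):
    cmd = ["sacct", "-j", str(jobid), "-X", "-n", "-P", "-o", ",".join(SACCT_FIELDS)]
    if cluster:
        cmd += ["-M", cluster]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return dict(zip(SACCT_FIELDS, out.strip().split("|")))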

buildtest can check status based on the Slurm job state, which is defined by the State field in sacct. In the next example, we introduce the field slurm_job_state, which is part of the status field. This field expects one of the following values: [COMPLETED, FAILED, OUT_OF_MEMORY, TIMEOUT]. This example simulates a failed job by returning an exit code of 1 and expecting a job state of FAILED.

version: "1.0"
buildspecs:
  wall_timeout:
    type: script
    executor: slurm.debug
    sbatch: [ "-t 2", "-C haswell", "-n 1"]
    run: exit 1
    status:
      slurm_job_state: "FAILED"

If we run this test, buildtest will mark it as PASS because the Slurm job state matches the expected result, even though the return code is 1.
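
A minimal sketch of this status check, assuming the State value gathered from sacct is compared against slurm_job_state from the buildspec (check_job_state is a hypothetical helper, and the fallback to the exit code reflects the PASS-on-zero behavior described earlier):

# Hypothetical sketch: determine test status from the gathered sacct record.
# If the buildspec defines status.slurm_job_state, match it against the
# State field reported by sacct; the return code is ignored in that case.
def check_job_state(status_block, sacct_record):
    expected_state = status_block.get("slurm_job_state")
    if expected_state is not None:
        return "PASS" if sacct_record["State"] == expected_state else "FAIL"
    # Otherwise fall back to the exit code: 0 means PASS.
    return "PASS" if sacct_record["ExitCode"].startswith("0") else "FAIL"

# The wall_timeout example: exit code 1:0 but State == FAILED, so PASS.
print(check_job_state({"slurm_job_state": "FAILED"},
                      {"State": "FAILED", "ExitCode": "1:0"}))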

+---------------------------------------------+
| Stage: Final Results after Polling all Jobs |
+---------------------------------------------+

 name         | executor    | status   |   returncode | testpath
--------------+-------------+----------+--------------+---------------------------------------------------------
 wall_timeout | slurm.debug | PASS     |            1 | /global/u1/s/siddiq90/cache/tests/exit1/wall_timeout.sh

If you examine the logfile buildtest.log, you will see an entry for the sacct command run to gather results, followed by a list of field and value output:

2020-07-22 18:20:48,170 [base.py:587 - gather() ] - [DEBUG] Gather slurm job data by running: sacct -j 32741040 -X -n -P -o Account,AllocNodes,AllocTRES,ConsumedEnergyRaw,CPUTimeRaw,End,ExitCode,JobID,JobName,NCPUS,NNodes,QOS,ReqGRES,ReqMem,ReqNodes,ReqTRES,Start,State,Submit,UID,User,WorkDir -M cori
...
2020-07-22 18:20:48,405 [base.py:598 - gather() ] - [DEBUG] field: State   value: FAILED

LSF Executor (Experimental)

The LSFExecutor is responsible for submitting jobs to the LSF scheduler. The LSFExecutor behaves similarly to the SlurmExecutor, with five stages implemented as class methods:

  • Check: check the LSF binaries (bsub, bjobs)

  • Load: load the lsf executor from the buildtest configuration config.yml

  • Dispatch: dispatch the job using bsub and retrieve the job ID

  • Poll: poll the job using bjobs to retrieve the job state

  • Gather: retrieve job results once the job is finished

The bsub key works similarly to the sbatch key and allows one to insert #BSUB directives into the job script. This example uses the lsf.batch executor, with the executor name batch defined in the buildtest configuration.

version: "1.0"
buildspecs:
  hostname:
    type: script
    executor: lsf.batch
    bsub: [ "-W 10",  "-nnodes 1"]

    run: jsrun hostname

The LSFExecutor poll method will retrieve the job state using bjobs -noheader -o 'stat' <JOBID>. The LSFExecutor will poll a job as long as it is in the PEND or RUN state. Once the job is in neither of those two states, the LSFExecutor will proceed to the gather stage and acquire the job results.
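
A minimal sketch of the poll step, built around the bjobs command quoted above (poll_lsf_job and job_is_active are hypothetical helpers, not buildtest's API):

import subprocess

# Hypothetical sketch of the poll step: query bjobs for the job state using
# the command quoted above.
def poll_lsf_job(jobid):
    cmd = ["bjobs", "-noheader", "-o", "stat", str(jobid)]
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

def job_is_active(jobid):
    # LSFExecutor keeps polling while the job is in PEND or RUN state;
    # any other state means the gather stage can run.
    return poll_lsf_job(jobid) in ("PEND", "RUN")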

The LSFExecutor gather method will retrieve the following format fields using bjobs; a rough sketch of such an invocation is shown after the list:

  • job_name

  • stat

  • user

  • user_group

  • queue

  • proj_name

  • pids

  • exit_code

  • from_host

  • exec_host

  • submit_time

  • start_time

  • finish_time

  • nthreads

  • exec_home

  • exec_cwd

  • output_file

  • error_file
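
Assuming these fields are requested through the bjobs -o output format (the exact invocation buildtest uses is not reproduced in this section), the gather step might look roughly like the sketch below; gather_lsf_record is a hypothetical helper and the delimiter handling is an assumption:

import subprocess

BJOBS_FIELDS = [
    "job_name", "stat", "user", "user_group", "queue", "proj_name", "pids",
    "exit_code", "from_host", "exec_host", "submit_time", "start_time",
    "finish_time", "nthreads", "exec_home", "exec_cwd", "output_file",
    "error_file",
]

# Hypothetical sketch: request the fields above for one job via the
# bjobs -o output format, with a comma-delimited, header-free result.
def gather_lsf_record(jobid):
    fmt = " ".join(BJOBS_FIELDS) + " delimiter=','"
    cmd = ["bjobs", "-noheader", "-o", fmt, str(jobid)]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
    return dict(zip(BJOBS_FIELDS, out.split(",")))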

Scheduler Agnostic Configuration

The batch field can be used for specifying scheduler-agnostic configuration. buildtest will translate the input into the appropriate script directives supported by the scheduler. Shown below is a translation table for the batch field:

Batch Translation Table

 Field            | Slurm               | LSF
------------------+---------------------+-----------
 account          | --account           | -P
 begin            | --begin             | -b
 cpucount         | --ntasks            | -n
 email-address    | --mail-user         | -u
 exclusive        | --exclusive=user    | -x
 memory           | --mem               | -M
 network          | --network           | -network
 nodecount        | --nodes             | -nnodes
 qos              | --qos               | N/A
 queue            | --partition         | -q
 tasks-per-core   | --ntasks-per-core   | N/A
 tasks-per-node   | --ntasks-per-node   | N/A
 tasks-per-socket | --ntasks-per-socket | N/A
 timelimit        | --time              | -W
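
To make the translation concrete, here is a minimal sketch of how a batch block could be mapped to scheduler directives. The dictionaries simply restate the table above; translate_batch is a hypothetical helper, not buildtest's implementation:

# Mapping of batch fields to Slurm and LSF options, restating the table above.
SLURM_MAP = {
    "account": "--account", "begin": "--begin", "cpucount": "--ntasks",
    "email-address": "--mail-user", "exclusive": "--exclusive=user",
    "memory": "--mem", "network": "--network", "nodecount": "--nodes",
    "qos": "--qos", "queue": "--partition", "tasks-per-core": "--ntasks-per-core",
    "tasks-per-node": "--ntasks-per-node", "tasks-per-socket": "--ntasks-per-socket",
    "timelimit": "--time",
}
LSF_MAP = {
    "account": "-P", "begin": "-b", "cpucount": "-n", "email-address": "-u",
    "exclusive": "-x", "memory": "-M", "network": "-network",
    "nodecount": "-nnodes", "queue": "-q", "timelimit": "-W",
}

# Hypothetical helper: turn a batch block into #SBATCH or #BSUB directives.
def translate_batch(batch, scheduler):
    mapping, prefix = (SLURM_MAP, "#SBATCH") if scheduler == "slurm" else (LSF_MAP, "#BSUB")
    lines = []
    for field, value in batch.items():
        option = mapping.get(field)
        if option is None:
            continue                       # field not supported by this scheduler (N/A)
        if value is True:                  # boolean fields such as exclusive take no value
            lines.append(f"{prefix} {option}")
        else:
            lines.append(f"{prefix} {option}={value}" if scheduler == "slurm"
                         else f"{prefix} {option} {value}")
    return lines

print(translate_batch({"timelimit": "10", "nodecount": "1"}, "lsf"))
# ['#BSUB -W 10', '#BSUB -nnodes 1']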

In this example, we rewrite the LSF buildspec to use the batch field instead of bsub:

version: "1.0"
buildspecs:
  hostname:
    type: script
    executor: lsf.batch
    batch:
      timelimit: "10"
      nodecount: "1"
    run: jsrun hostname

buildtest will translate the batch field into #BSUB directives, as you can see in the generated test:

#!/usr/bin/bash
#BSUB -W 10
#BSUB -nnodes 1
source /autofs/nccsopen-svm1_home/shahzebsiddiqui/buildtest/var/executors/lsf.batch/before_script.sh
jsrun hostname

In the next example, we use the batch field on a Slurm cluster to submit a sleep job as follows:

version: "1.0"
buildspecs:
  sleep:
    type: script
    executor: slurm.normal
    description: sleep 2 seconds
    tags: [tutorials]
    batch:
      nodecount: "1"
      cpucount: "1"
      timelimit: "5"
      memory: "5MB"
      exclusive: true

    vars:
      SLEEP_TIME: 2
    run: sleep $SLEEP_TIME

The exclusive field is used for requesting exclusive node access; it is a boolean instead of a string. You can instruct buildtest to stop after the build phase by using --stage=build, which will build the script but not run it. If we inspect the generated script, we see the following:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=5
#SBATCH --mem=5MB
#SBATCH --exclusive=user
source /home1/06908/sms1990/buildtest/var/executors/slurm.normal/before_script.sh
SLEEP_TIME=2
sleep $SLEEP_TIME

You may combine the batch field with the sbatch or bsub field to specify your job directives. If a particular option is not available in the batch property, use the sbatch or bsub field to fill in the rest of the arguments.

Jobs exceeding max_pend_time

Recall from Configuring buildtest that max_pend_time will cancel a job if it stays pending longer than the specified limit. buildtest starts a timer for each job right after submission and keeps track of the elapsed time; if the job is still pending once the limit is reached, the job will be cancelled. To demonstrate, here is an example with two buildspecs submitted to the scheduler; notice that the job shared_qos_haswell_hostname was cancelled after exceeding the max_pend_time of 10 sec. Note that a cancelled job is neither reported in the final output nor updated in the report, hence it won't be present in buildtest report.
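
A minimal sketch of this check, assuming the elapsed time since submission is compared against max_pend_time while the job is still pending (cancel_if_pending_too_long is a hypothetical helper; scancel is the same command shown in the output below):

import subprocess, time

# Hypothetical sketch: cancel a job that has been pending longer than
# max_pend_time seconds. submit_time is recorded right after submission.
def cancel_if_pending_too_long(jobid, state, submit_time, max_pend_time=10):
    elapsed = time.time() - submit_time
    if state == "PENDING" and elapsed > max_pend_time:
        print(f"Cancelling Job because duration time: {elapsed:.6f} sec "
              f"exceeds max pend time: {max_pend_time} sec")
        subprocess.run(["scancel", str(jobid)])
        return True
    return False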

$ buildtest build -b queues/shared.yml -b queues/xfer.yml

+-------------------------------+
| Stage: Discovering Buildspecs |
+-------------------------------+


Discovered Buildspecs:

/global/u1/s/siddiq90/buildtest-cori/queues/xfer.yml
/global/u1/s/siddiq90/buildtest-cori/queues/shared.yml

+---------------------------+
| Stage: Parsing Buildspecs |
+---------------------------+

 schemafile              | validstate   | buildspec
-------------------------+--------------+--------------------------------------------------------
 script-v1.0.schema.json | True         | /global/u1/s/siddiq90/buildtest-cori/queues/xfer.yml
 script-v1.0.schema.json | True         | /global/u1/s/siddiq90/buildtest-cori/queues/shared.yml

+----------------------+
| Stage: Building Test |
+----------------------+

 name                        | id       | type   | executor     | tags                  | testpath
-----------------------------+----------+--------+--------------+-----------------------+---------------------------------------------------------------------------------------------------------------
 xfer_qos_hostname           | d0043be3 | script | slurm.xfer   | ['queues']            | /global/u1/s/siddiq90/buildtest/var/tests/slurm.xfer/xfer/xfer_qos_hostname/1/stage/generate.sh
 shared_qos_haswell_hostname | 9d3723ac | script | slurm.shared | ['queues', 'reframe'] | /global/u1/s/siddiq90/buildtest/var/tests/slurm.shared/shared/shared_qos_haswell_hostname/1/stage/generate.sh

+----------------------+
| Stage: Running Test  |
+----------------------+

[xfer_qos_hostname] JobID: 1089664 dispatched to scheduler
[shared_qos_haswell_hostname] JobID: 35189528 dispatched to scheduler
 name                        | id       | executor     | status   |   returncode | testpath
-----------------------------+----------+--------------+----------+--------------+---------------------------------------------------------------------------------------------------------------
 xfer_qos_hostname           | d0043be3 | slurm.xfer   | N/A      |            0 | /global/u1/s/siddiq90/buildtest/var/tests/slurm.xfer/xfer/xfer_qos_hostname/1/stage/generate.sh
 shared_qos_haswell_hostname | 9d3723ac | slurm.shared | N/A      |            0 | /global/u1/s/siddiq90/buildtest/var/tests/slurm.shared/shared/shared_qos_haswell_hostname/1/stage/generate.sh


Polling Jobs in 10 seconds
________________________________________
[xfer_qos_hostname]: JobID 1089664 in COMPLETED state
[shared_qos_haswell_hostname]: JobID 35189528 in PENDING state

Polling Jobs in 10 seconds
________________________________________
[shared_qos_haswell_hostname]: JobID 35189528 in PENDING state
Cancelling Job: shared_qos_haswell_hostname running command: scancel 35189528
Cancelling Job because duration time: 20.573901 sec exceeds max pend time: 10 sec


Polling Jobs in 10 seconds
________________________________________
Cancelled Tests:
shared_qos_haswell_hostname

+---------------------------------------------+
| Stage: Final Results after Polling all Jobs |
+---------------------------------------------+

 name              | id       | executor   | status   |   returncode | testpath
-------------------+----------+------------+----------+--------------+-------------------------------------------------------------------------------------------------
 xfer_qos_hostname | d0043be3 | slurm.xfer | PASS     |            0 | /global/u1/s/siddiq90/buildtest/var/tests/slurm.xfer/xfer/xfer_qos_hostname/1/stage/generate.sh

+----------------------+
| Stage: Test Summary  |
+----------------------+

Executed 1 tests
Passed Tests: 1/1 Percentage: 100.000%
Failed Tests: 0/1 Percentage: 0.000%