Site Examples
NERSC
NERSC provides High Performance Computing systems to support research across the Office of Science program offices. NERSC currently operates two HPC systems: Perlmutter and Cori. The example below shows the configuration for both Cori and Perlmutter; note that a single configuration file can describe both systems. Perlmutter uses Lmod while Cori runs environment-modules. We define local and Slurm executors for each system, and the Slurm executors are mapped to the QOS values provided by each Slurm cluster.
In order to use the bigmem, xfer, or gpu QOS at Cori, we need to specify the escori cluster (i.e. sbatch --clusters=escori).
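For illustration, the relevant piece is the bigmem executor from the full configuration below; buildtest forwards the cluster and qos values to Slurm, so a test routed to this executor is submitted roughly as sbatch --clusters=escori --qos=bigmem. This is a minimal sketch, not the complete executor section:

executors:
  slurm:
    bigmem:
      description: bigmem jobs
      cluster: escori    # adds --clusters=escori at submission time
      qos: bigmem        # adds --qos=bigmem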
NERSC buildtest configuration
system:
gerty:
description: Test System for Cori
hostnames:
- gert01.nersc.gov
moduletool: environment-modules
buildspecs:
rebuild: False
count: 15
format: "name,description"
terse: False
report:
count: 25
terse: False
format: "name,id,state,runtime,returncode"
latest: True
oldest: False
executors:
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
compilers:
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: gerty
cori:
hostnames:
- cori*
description: Cray XC system based on Intel Haswell and KNL nodes
moduletool: environment-modules
buildspecs:
rebuild: False
count: 15
format: "name,description"
terse: False
report:
count: 25
terse: False
format: "name,id,state,runtime,returncode"
latest: True
oldest: False
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: cori
executors:
defaults:
pollinterval: 30
maxpendtime: 86400
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
slurm:
haswell_debug:
qos: debug
cluster: cori
options:
- -C haswell
description: debug queue on Haswell partition
haswell_shared:
qos: shared
cluster: cori
options:
- -C haswell
description: shared queue on Haswell partition
haswell_regular:
qos: regular
cluster: cori
options:
- -C haswell
description: normal queue on Haswell partition
haswell_premium:
qos: premium
cluster: cori
options:
- -C haswell
description: premium queue on Haswell partition
haswell_flex:
qos: flex
cluster: cori
options:
- -C haswell
description: flex queue on Haswell partition
knl_flex:
description: overrun queue on KNL partition
qos: overrun
cluster: cori
options:
- -C knl
bigmem:
description: bigmem jobs
cluster: escori
qos: bigmem
xfer:
description: xfer qos jobs
qos: xfer
cluster: escori
options:
- -C haswell
compile:
description: compile qos jobs
qos: compile
cluster: escori
options:
- -N 1
knl_debug:
qos: debug
cluster: cori
options:
- -C knl,quad,cache
description: debug queue on KNL partition
knl_regular:
qos: normal
cluster: cori
options:
- -C knl,quad,cache
description: normal queue on KNL partition
knl_premium:
qos: premium
cluster: cori
options:
- -C knl,quad,cache
description: premium queue on KNL partition
knl_low:
qos: low
cluster: cori
options:
- -C knl,quad,cache
description: low queue on KNL partition
knl_overrun:
description: overrun queue on KNL partition
qos: overrun
cluster: cori
options:
- -C knl
- --time-min=01:00:00
gpu:
description: submit jobs to GPU partition
options:
- -C gpu
cluster: escori
compilers:
purge: false
enable_prgenv: true
prgenv_modules:
gcc: PrgEnv-gnu
cray: PrgEnv-cray
intel: PrgEnv-intel
find:
gcc: ^(gcc)
cray: ^(craype/)
intel: ^intel
cuda: ^(cuda/)
upcxx: ^(upcxx/)
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
fc: /usr/bin/gfortran
cxx: /usr/bin/g++
gcc/7.3.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/7.3.0
purge: false
gcc/8.1.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/8.1.0
purge: false
gcc/8.3.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/8.3.0
purge: false
gcc/10.3.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/10.3.0
purge: false
gcc/11.2.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/11.2.0
purge: false
cray:
craype/2.6.2:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- craype/2.6.2
purge: false
craype/2.7.10:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- craype/2.7.10
purge: false
intel:
intel/19.0.3.199:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.0.3.199
purge: false
intel/19.1.2.254:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.1.2.254
purge: false
intel/19.1.0.166:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.1.0.166
purge: false
intel/19.1.1.217:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.1.1.217
purge: false
intel/19.1.2.275:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.1.2.275
purge: false
intel/19.1.3.304:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel
- intel/19.1.3.304
purge: false
upcxx:
upcxx/2021.9.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2021.9.0
purge: false
upcxx/2022.3.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2022.3.0
purge: false
upcxx/2022.9.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2022.9.0
purge: false
upcxx/bleeding-edge:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/bleeding-edge
purge: false
upcxx/nightly:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/nightly
purge: false
muller:
description: Muller is TDS system for Perlmutter
hostnames:
- login01
- login02
moduletool: lmod
buildspecs:
rebuild: False
count: 15
format: "name,description"
terse: False
report:
count: 25
terse: False
format: "name,id,state,runtime,returncode"
latest: True
oldest: False
executors:
defaults:
pollinterval: 30
maxpendtime: 86400
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
zsh:
description: submit jobs on local machine using zsh shell
shell: zsh
python:
description: submit jobs on local machine using python shell
shell: python
slurm:
regular:
qos: regular
debug:
qos: debug
xfer:
qos: xfer
preempt:
qos: preempt
compilers:
purge: false
enable_prgenv: true
prgenv_modules:
gcc: PrgEnv-gnu
cray: PrgEnv-cray
nvhpc: PrgEnv-nvidia
find:
gcc: ^(gcc)
cray: ^(cce)
nvhpc: ^(nvhpc)
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
gcc/11.2.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/11.2.0
purge: false
gcc/10.3.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/10.3.0
purge: false
cray:
cce/14.0.1:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce/14.0.1
purge: false
cce/13.0.2:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce/13.0.2
purge: false
nvhpc:
nvhpc/21.11:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/21.11
purge: false
nvhpc/22.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/22.7
purge: false
nvhpc/21.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/21.9
purge: false
nvhpc/22.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/22.5
purge: false
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: muller
perlmutter:
description: Cray Shasta system with AMD CPU and NVIDIA A100 GPUs
hostnames:
- login[01-40]
moduletool: lmod
buildspecs:
rebuild: False
count: 15
format: "name,description"
terse: False
report:
count: 25
terse: False
format: "name,id,state,runtime,returncode"
latest: True
oldest: False
executors:
defaults:
pollinterval: 30
maxpendtime: 86400
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
zsh:
description: submit jobs on local machine using zsh shell
shell: zsh
python:
description: submit jobs on local machine using python shell
shell: python
slurm:
regular:
qos: regular
debug:
qos: debug
xfer:
qos: xfer
preempt:
qos: preempt
compilers:
purge: false
enable_prgenv: true
prgenv_modules:
gcc: PrgEnv-gnu
cray: PrgEnv-cray
nvhpc: PrgEnv-nvidia
find:
gcc: ^(gcc)
cray: ^(cce)
nvhpc: ^(nvhpc)
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
gcc/11.2.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/11.2.0
purge: false
gcc/10.3.0:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu
- gcc/10.3.0
purge: false
cray:
cce/14.0.1:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce/14.0.1
purge: false
cce/13.0.2:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce/13.0.2
purge: false
cce-mixed/13.0.2:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce-mixed/13.0.2
purge: false
cce-mixed/14.0.1:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray
- cce-mixed/14.0.1
purge: false
nvhpc:
nvhpc/21.11:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/21.11
purge: false
nvhpc/21.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/21.9
purge: false
nvhpc/22.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/22.7
purge: false
nvhpc/22.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc/22.5
purge: false
nvhpc-mixed/21.11:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc-mixed/21.11
purge: false
nvhpc-mixed/22.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc-mixed/22.5
purge: false
nvhpc-mixed/21.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc-mixed/21.9
purge: false
nvhpc-mixed/22.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-nvidia
- nvhpc-mixed/22.7
purge: false
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: perlmutter
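Tests reference these executors by their fully qualified name in the form <system>.<type>.<name>, for example cori.slurm.knl_debug. As a minimal sketch (assuming the script schema; the test name is illustrative and required fields may vary slightly between buildtest versions), a buildspec that targets the Cori KNL debug queue could look like the following:

buildspecs:
  hostname_knl_debug:
    type: script
    executor: cori.slurm.knl_debug
    description: run hostname on the Cori KNL debug queue
    run: hostname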
Ascent @ OLCF
Ascent is a training system for Summit at OLCF, which uses IBM Load Sharing Facility (LSF) as its batch scheduler. Ascent has two queues: batch and test. To declare LSF executors we define them under the lsf section within the executors section. The default launcher is bsub, which can be defined under defaults. The pollinterval controls how often buildtest polls LSF jobs using bjobs; in this configuration jobs are polled every 30 seconds. The pollinterval accepts a range between 10 and 300 seconds as defined in the schema. To avoid polling the scheduler excessively, pick a value best suited for your site.
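As a sketch of how the polling behaviour is tuned, only the defaults block needs to change; the values below are illustrative rather than a recommendation:

executors:
  defaults:
    pollinterval: 60    # poll LSF jobs via bjobs every 60 seconds (schema allows 10-300)
    maxpendtime: 3600   # cancel a job if it stays pending longer than an hour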
Ascent buildtest configuration
system:
ascent:
hostnames:
- login1.ascent.olcf.ornl.gov
moduletool: lmod
pager: False
executors:
defaults:
pollinterval: 30
maxpendtime: 300
account: gen014ecpci
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
lsf:
batch:
queue: batch
compilers:
find:
gcc: '^(gcc)'
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
gcc/9.3.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/9.3.0
purge: false
gcc/11.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/11.1.0
purge: false
gcc/7.5.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/7.5.0
purge: false
gcc/12.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/12.1.0
purge: false
gcc/11.2.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/11.2.0
purge: false
gcc/10.2.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/10.2.0
purge: false
gcc/9.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/9.1.0
purge: false
JLSE @ ANL
Joint Laboratory for System Evaluation (JLSE) provides a testbed of emerging HPC systems. The batch scheduler is Cobalt, and Cobalt executors are declared in the cobalt section within the executors section. The default launcher, qsub, can be set with the launcher property and is inherited by all batch executors. In each Cobalt executor the queue property specifies the queue to submit jobs to; for instance, an executor yarrow with queue: yarrow will submit jobs using qsub -q yarrow.
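As a sketch, such a hypothetical yarrow executor would be declared as follows; the configuration below defines a testing executor in exactly the same way:

executors:
  cobalt:
    yarrow:
      queue: yarrow    # jobs are submitted with qsub -q yarrow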
JLSE buildtest configuration
system:
jlse:
hostnames:
- jlselogin*
moduletool: environment-modules
pager: False
executors:
defaults:
pollinterval: 30
maxpendtime: 300
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
cobalt:
testing:
queue: testing
compilers:
find:
gcc: "^(gcc)"
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
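A site configuration like the ones shown above can be checked against the buildtest schema with buildtest config validate before running any tests.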