Site Examples
NERSC
NERSC provides High Performance Computing systems to support research in the Office of Science program offices. Currently, NERSC operates two HPC systems, Perlmutter and Cori. The example below shows the configuration for both Cori and Perlmutter; note that a single configuration can describe both systems. Perlmutter uses Lmod while Cori runs environment-modules. For each system we define local executors and Slurm executors, where the Slurm executors map to the QOSs provided by the Slurm cluster.
In order to use the bigmem, xfer, or gpu QOS on Cori, we need to specify the escori cluster (i.e., sbatch --clusters=escori).
system:
gerty:
description: "Test System for Cori"
hostnames:
- gert01.nersc.gov
moduletool: environment-modules
executors:
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
compilers:
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: gerty
perlmutter:
description: "Cray Shasta system with AMD CPU and NVIDIA A100 GPUs"
hostnames:
- login*
moduletool: lmod
executors:
defaults:
pollinterval: 30
maxpendtime: 86400
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
zsh:
description: submit jobs on local machine using zsh shell
shell: zsh
python:
description: submit jobs on local machine using python shell
shell: python
slurm:
regular:
qos: regular
debug:
qos: debug
compilers:
find:
gcc: ^(gcc)
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: perlmutter
cori:
hostnames:
- cori*
description: "Cray XC system based on Intel Haswell and KNL nodes"
moduletool: environment-modules
cdash:
url: https://my.cdash.org
project: buildtest-nersc
site: cori
executors:
defaults:
pollinterval: 30
maxpendtime: 86400
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
slurm:
haswell_debug:
qos: debug
cluster: cori
options:
- -C haswell
description: debug queue on Haswell partition
haswell_shared:
qos: shared
cluster: cori
options:
- -C haswell
description: shared queue on Haswell partition
haswell_regular:
qos: regular
cluster: cori
options:
- -C haswell
description: normal queue on Haswell partition
haswell_premium:
qos: premium
cluster: cori
options:
- -C haswell
description: premium queue on Haswell partition
haswell_flex:
qos: flex
cluster: cori
options:
- -C haswell
description: flex queue on Haswell partition
knl_flex:
description: overrun queue on KNL partition
qos: overrun
cluster: cori
options:
- -C knl
bigmem:
description: bigmem jobs
cluster: escori
qos: bigmem
xfer:
description: xfer qos jobs
qos: xfer
cluster: escori
options:
- -C haswell
compile:
description: compile qos jobs
qos: compile
cluster: escori
options:
- -N 1
knl_debug:
qos: debug
cluster: cori
options:
- -C knl,quad,cache
description: debug queue on KNL partition
knl_regular:
qos: normal
cluster: cori
options:
- -C knl,quad,cache
description: normal queue on KNL partition
knl_premium:
qos: premium
cluster: cori
options:
- -C knl,quad,cache
description: premium queue on KNL partition
knl_low:
qos: low
cluster: cori
options:
- -C knl,quad,cache
description: low queue on KNL partition
knl_overrun:
description: overrun queue on KNL partition
qos: overrun
cluster: cori
options:
- -C knl
- --time-min=01:00:00
gpu:
description: submit jobs to GPU partition
options:
- -C gpu
cluster: escori
compilers:
find:
gcc: ^(gcc|PrgEnv-gnu)
cray: ^(PrgEnv-cray)
intel: ^(intel|PrgEnv-intel)
cuda: ^(cuda/)
upcxx: ^(upcxx)
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
PrgEnv-gnu/6.0.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu/6.0.5
purge: false
PrgEnv-gnu/6.0.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu/6.0.7
purge: false
PrgEnv-gnu/6.0.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-gnu/6.0.9
purge: false
gcc/6.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/6.1.0
purge: false
gcc/7.3.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/7.3.0
purge: false
gcc/8.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/8.1.0
purge: false
gcc/8.2.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/8.2.0
purge: false
gcc/8.3.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/8.3.0
purge: false
gcc/9.3.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/9.3.0
purge: false
gcc/10.1.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/10.1.0
purge: false
gcc/6.3.0:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/6.3.0
purge: false
gcc/8.1.1-openacc-gcc-8-branch-20190215:
cc: gcc
cxx: g++
fc: gfortran
module:
load:
- gcc/8.1.1-openacc-gcc-8-branch-20190215
purge: false
cray:
PrgEnv-cray/6.0.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray/6.0.5
purge: false
PrgEnv-cray/6.0.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray/6.0.7
purge: false
PrgEnv-cray/6.0.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-cray/6.0.9
purge: false
intel:
PrgEnv-intel/6.0.5:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel/6.0.5
purge: false
PrgEnv-intel/6.0.7:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel/6.0.7
purge: false
PrgEnv-intel/6.0.9:
cc: cc
cxx: CC
fc: ftn
module:
load:
- PrgEnv-intel/6.0.9
purge: false
intel/19.0.3.199:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.0.3.199
purge: false
intel/19.1.2.254:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.1.2.254
purge: false
intel/16.0.3.210:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/16.0.3.210
purge: false
intel/17.0.1.132:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/17.0.1.132
purge: false
intel/17.0.2.174:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/17.0.2.174
purge: false
intel/18.0.1.163:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/18.0.1.163
purge: false
intel/18.0.3.222:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/18.0.3.222
purge: false
intel/19.0.0.117:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.0.0.117
purge: false
intel/19.0.8.324:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.0.8.324
purge: false
intel/19.1.0.166:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.1.0.166
purge: false
intel/19.1.1.217:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.1.1.217
purge: false
intel/19.1.2.275:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.1.2.275
purge: false
intel/19.1.3.304:
cc: icc
cxx: icpc
fc: ifort
module:
load:
- intel/19.1.3.304
purge: false
cuda:
cuda/9.2.148:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/9.2.148
purge: false
cuda/10.0.130:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/10.0.130
purge: false
cuda/10.1.105:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/10.1.105
purge: false
cuda/10.1.168:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/10.1.168
purge: false
cuda/10.1.243:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/10.1.243
purge: false
cuda/10.2.89:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/10.2.89
purge: false
cuda/11.0.2:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/11.0.2
purge: false
cuda/shifter:
cc: nvcc
cxx: nvcc
fc: None
module:
load:
- cuda/shifter
purge: false
upcxx:
upcxx/2019.9.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2019.9.0
purge: false
upcxx/2020.3.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2020.3.0
purge: false
upcxx/2020.3.2:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2020.3.2
purge: false
upcxx/2020.3.8-snapshot:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2020.3.8-snapshot
purge: false
upcxx/2020.10.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2020.10.0
purge: false
upcxx/2020.11.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/2020.11.0
purge: false
upcxx/bleeding-edge:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/bleeding-edge
purge: false
upcxx/nightly:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx/nightly
purge: false
upcxx-bupc-narrow/2019.9.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/2019.9.0
purge: false
upcxx-bupc-narrow/2020.3.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/2020.3.0
purge: false
upcxx-bupc-narrow/2020.3.2:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/2020.3.2
purge: false
upcxx-bupc-narrow/2020.3.8-snapshot:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/2020.3.8-snapshot
purge: false
upcxx-bupc-narrow/2020.11.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/2020.11.0
purge: false
upcxx-bupc-narrow/bleeding-edge:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-bupc-narrow/bleeding-edge
purge: false
upcxx-cs267/2020.10.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-cs267/2020.10.0
purge: false
upcxx-extras/2020.3.0:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-extras/2020.3.0
purge: false
upcxx-extras/2020.3.8:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-extras/2020.3.8
purge: false
upcxx-extras/master:
cc: upcxx
cxx: upcxx
fc: None
module:
load:
- upcxx-extras/master
purge: false
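Tests select one of these executors by name in their buildspec. The snippet below is a minimal sketch, not taken from the site configuration: it assumes the usual <system>.<type>.<name> executor naming and uses an illustrative test that runs hostname through the knl_debug Slurm executor on Cori.

version: "1.0"
buildspecs:
  hostname_knl_debug:                # illustrative test name
    type: script
    executor: cori.slurm.knl_debug   # assumes <system>.<type>.<name> executor naming
    description: print hostname on a KNL debug node
    sbatch:                          # extra sbatch options for this test
      - "-N 1"
      - "-t 10"
    run: hostname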
Ascent @ OLCF
Ascent is a training system for Summit at OLCF that uses IBM Load Sharing Facility (LSF) as its batch scheduler. Ascent has two queues: batch and test. To declare LSF executors we define them under the lsf section within the executors section. The default launcher is bsub, which can be defined under defaults. The pollinterval determines how often buildtest polls LSF jobs via bjobs; in this example it is set to 30 seconds. The pollinterval accepts values between 10 and 300 seconds, as defined in the schema. To avoid polling the scheduler excessively, pick a value that best suits your site.
system:
ascent:
hostnames: [login1.ascent.olcf.ornl.gov]
moduletool: lmod
executors:
defaults:
pollinterval: 30
maxpendtime: 300
account: gen014ecpci
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
lsf:
batch:
queue: batch
compilers:
find:
gcc: "^(gcc)"
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
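The configuration above only declares an executor for the batch queue. Since Ascent also has a test queue, a second LSF executor could be added next to it; the following is a minimal sketch (the executor name test is an assumption, not part of the configuration shown):

lsf:
  batch:
    queue: batch
  test:                                # hypothetical executor for the test queue
    description: submit jobs to test queue
    queue: test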
JLSE @ ANL
The Joint Laboratory for System Evaluation (JLSE) provides a testbed of emerging HPC systems. The default scheduler is Cobalt, which is configured in the cobalt section within the executors section. We set the default launcher to qsub with launcher: qsub, which is inherited by all batch executors. In each Cobalt executor the queue property specifies the queue to submit jobs to; for instance, an executor yarrow with queue: yarrow will submit jobs using qsub -q yarrow.
system:
jlse:
hostnames:
- jlselogin*
moduletool: environment-modules
executors:
defaults:
pollinterval: 30
maxpendtime: 300
local:
bash:
description: submit jobs on local machine using bash shell
shell: bash
sh:
description: submit jobs on local machine using sh shell
shell: sh
csh:
description: submit jobs on local machine using csh shell
shell: csh
python:
description: submit jobs on local machine using python shell
shell: python
cobalt:
testing:
queue: testing
compilers:
find:
gcc: "^(gcc)"
compiler:
gcc:
builtin_gcc:
cc: /usr/bin/gcc
cxx: /usr/bin/g++
fc: /usr/bin/gfortran
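Additional Cobalt executors follow the same pattern. Mirroring the yarrow example from the prose above, another executor could be declared next to testing; this is a sketch only (yarrow is illustrative and not part of the configuration shown):

cobalt:
  testing:
    queue: testing
  yarrow:                              # hypothetical executor; submits with qsub -q yarrow
    description: submit jobs to yarrow queue
    queue: yarrow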