Site Examples

NERSC

NERSC provides High Performance Computing systems to support research across the Office of Science program offices. Currently, NERSC operates two HPC systems: Perlmutter and Cori. The example below shows the configuration for both systems; note that a single configuration file can define multiple systems. Perlmutter uses Lmod while Cori runs environment-modules. For each system we define local executors and Slurm executors, where each Slurm executor is mapped to a QOS provided by the Slurm cluster. The configuration also defines gerty, a small test system for Cori. A sample buildspec that references one of these executors is shown after the configuration.

In order to use the bigmem, xfer, or gpu QOS at Cori, we need to specify the escori cluster (i.e., sbatch --clusters=escori).

system:
  gerty:
    description: Test System for Cori
    hostnames:
    - gert01.nersc.gov
    moduletool: environment-modules
    executors:
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
        csh:
          description: submit jobs on local machine using csh shell
          shell: csh
    compilers:
      compiler:
        gcc:
          builtin_gcc:
            cc: /usr/bin/gcc
            cxx: /usr/bin/g++
            fc: /usr/bin/gfortran
    cdash:
      url: https://my.cdash.org
      project: buildtest-nersc
      site: gerty
  perlmutter:
    description: Cray Shasta system with AMD CPU and NVIDIA A100 GPUs
    hostnames:
    - login*
    moduletool: lmod
    executors:
      defaults:
        pollinterval: 30
        maxpendtime: 86400
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
        csh:
          description: submit jobs on local machine using csh shell
          shell: csh
        zsh:
          description: submit jobs on local machine using zsh shell
          shell: zsh
        python:
          description: submit jobs on local machine using python shell
          shell: python
      slurm:
        regular:
          qos: regular
        debug:
          qos: debug
        xfer:
          qos: xfer
        preempt:
          qos: preempt
    compilers:
      find:
        gcc: ^(gcc)
        cray: ^(cce)
        nvhpc: ^(nvhpc)
      compiler:
        gcc:
          builtin_gcc:
            cc: /usr/bin/gcc
            cxx: /usr/bin/g++
            fc: /usr/bin/gfortran
          gcc/11.2.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/11.2.0
              purge: false
          gcc/10.3.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/10.3.0
              purge: false
        cray:
          cce/13.0.2:
            cc: cc
            cxx: CC
            fc: ftn
            module:
              load:
              - PrgEnv-cray
              - cce/13.0.2
              purge: false
          cce/13.0.1:
            cc: cc
            cxx: CC
            fc: ftn
            module:
              load:
              - PrgEnv-cray
              - cce/13.0.1
              purge: false
        nvhpc:
          nvhpc/22.5:
            cc: nvc
            cxx: nvc++
            fc: nvfortran
            module:
              load:
              - nvhpc/22.5
              purge: false
          nvhpc/21.11:
            cc: nvc
            cxx: nvc++
            fc: nvfortran
            module:
              load:
              - nvhpc/21.11
              purge: false
          nvhpc/21.3:
            cc: nvc
            cxx: nvc++
            fc: nvfortran
            module:
              load:
              - nvhpc/21.3
              purge: false
    cdash:
      url: https://my.cdash.org
      project: buildtest-nersc
      site: perlmutter
  cori:
    hostnames:
    - cori*
    description: Cray XC system based on Intel Haswell and KNL nodes
    moduletool: environment-modules
    cdash:
      url: https://my.cdash.org
      project: buildtest-nersc
      site: cori
    executors:
      defaults:
        pollinterval: 30
        maxpendtime: 86400
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
        csh:
          description: submit jobs on local machine using csh shell
          shell: csh
        python:
          description: submit jobs on local machine using python shell
          shell: python
      slurm:
        haswell_debug:
          qos: debug
          cluster: cori
          options:
          - -C haswell
          description: debug queue on Haswell partition
        haswell_shared:
          qos: shared
          cluster: cori
          options:
          - -C haswell
          description: shared queue on Haswell partition
        haswell_regular:
          qos: regular
          cluster: cori
          options:
          - -C haswell
          description: normal queue on Haswell partition
        haswell_premium:
          qos: premium
          cluster: cori
          options:
          - -C haswell
          description: premium queue on Haswell partition
        haswell_flex:
          qos: flex
          cluster: cori
          options:
          - -C haswell
          description: flex queue on Haswell partition
        knl_flex:
          description: overrun queue on KNL partition
          qos: overrun
          cluster: cori
          options:
          - -C knl
        bigmem:
          description: bigmem jobs
          cluster: escori
          qos: bigmem
        xfer:
          description: xfer qos jobs
          qos: xfer
          cluster: escori
          options:
          - -C haswell
        compile:
          description: compile qos jobs
          qos: compile
          cluster: escori
          options:
          - -N 1
        knl_debug:
          qos: debug
          cluster: cori
          options:
          - -C knl,quad,cache
          description: debug queue on KNL partition
        knl_regular:
          qos: normal
          cluster: cori
          options:
          - -C knl,quad,cache
          description: normal queue on KNL partition
        knl_premium:
          qos: premium
          cluster: cori
          options:
          - -C knl,quad,cache
          description: premium queue on KNL partition
        knl_low:
          qos: low
          cluster: cori
          options:
          - -C knl,quad,cache
          description: low queue on KNL partition
        knl_overrun:
          description: overrun queue on KNL partition
          qos: overrun
          cluster: cori
          options:
          - -C knl
          - --time-min=01:00:00
        gpu:
          description: submit jobs to GPU partition
          options:
          - -C gpu
          cluster: escori
    compilers:
      find:
        gcc: ^(gcc|PrgEnv-gnu)
        cray: ^(PrgEnv-cray)
        intel: ^(intel|PrgEnv-intel)
        cuda: ^(cuda/)
        upcxx: ^(upcxx)
      compiler:
        gcc:
          builtin_gcc:
            cc: /usr/bin/gcc
            fc: /usr/bin/gfortran
            cxx: /usr/bin/g++
          PrgEnv-gnu/6.0.5:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - PrgEnv-gnu/6.0.5
              purge: false
          PrgEnv-gnu/6.0.10:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - PrgEnv-gnu/6.0.10
              purge: false
          gcc/7.3.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/7.3.0
              purge: false
          gcc/8.1.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/8.1.0
              purge: false
          gcc/8.3.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/8.3.0
              purge: false
          gcc/10.3.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/10.3.0
              purge: false
          gcc/11.2.0:
            cc: gcc
            cxx: g++
            fc: gfortran
            module:
              load:
              - gcc/11.2.0
              purge: false
        cray:
          PrgEnv-cray/6.0.5:
            cc: cc
            cxx: CC
            fc: ftn
            module:
              load:
              - PrgEnv-cray/6.0.5
              purge: false
          PrgEnv-cray/6.0.10:
            cc: cc
            cxx: CC
            fc: ftn
            module:
              load:
              - PrgEnv-cray/6.0.10
              purge: false
        intel:
          PrgEnv-intel/6.0.5:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - PrgEnv-intel/6.0.5
              purge: false
          PrgEnv-intel/6.0.10:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - PrgEnv-intel/6.0.10
              purge: false
          intel/19.0.3.199:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.0.3.199
              purge: false
          intel/19.1.2.254:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.1.2.254
              purge: false
          intel/19.1.0.166:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.1.0.166
              purge: false
          intel/19.1.1.217:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.1.1.217
              purge: false
          intel/19.1.2.275:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.1.2.275
              purge: false
          intel/19.1.3.304:
            cc: icc
            cxx: icpc
            fc: ifort
            module:
              load:
              - intel/19.1.3.304
              purge: false
        upcxx:
          upcxx/2021.9.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx/2021.9.0
              purge: false
          upcxx/2022.3.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx/2022.3.0
              purge: false
          upcxx/bleeding-edge:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx/bleeding-edge
              purge: false
          upcxx/nightly:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx/nightly
              purge: false
          upcxx-bupc-narrow/2021.9.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-bupc-narrow/2021.9.0
              purge: false
          upcxx-bupc-narrow/2022.3.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-bupc-narrow/2022.3.0
              purge: false
          upcxx-bupc-narrow/bleeding-edge:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-bupc-narrow/bleeding-edge
              purge: false
          upcxx-extras/2020.3.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-extras/2020.3.0
              purge: false
          upcxx-extras/2020.3.8:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-extras/2020.3.8
              purge: false
          upcxx-extras/2022.3.0:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-extras/2022.3.0
              purge: false
          upcxx-extras/master:
            cc: upcxx
            cxx: upcxx
            fc: None
            module:
              load:
              - upcxx-extras/master
              purge: false
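
To illustrate how these executors are referenced from a test, below is a minimal buildspec sketch. The test name hostname_knl_debug and the run command are placeholders; the executor name follows buildtest's <system>.<executor-type>.<executor-name> convention and points to the knl_debug Slurm executor defined above.

buildspecs:
  hostname_knl_debug:                # hypothetical test name
    type: script
    executor: cori.slurm.knl_debug   # <system>.<executor-type>.<executor-name>
    description: run hostname on the Cori KNL debug queue
    run: hostname

When this test is built, buildtest submits the job through the knl_debug executor, applying its qos, cluster, and options settings (debug QOS, cori cluster, -C knl,quad,cache) to the Slurm submission.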

Ascent @ OLCF

Ascent is a training system for Summit at OLCF, which uses IBM Load Sharing Facility (LSF) as its batch scheduler. Ascent has two queues: batch and test. To declare LSF executors we define them under the lsf section within the executors section; the example below defines an executor for the batch queue, and a sketch adding the test queue follows the configuration.

The default launcher is bsub, which can be defined under defaults. The pollinterval determines how often buildtest polls LSF jobs using bjobs; in this example jobs are polled every 30 seconds. The pollinterval accepts a range between 10 and 300 seconds, as defined in the schema. To avoid polling the scheduler excessively, pick a value best suited for your site.

system:
  ascent:
    hostnames: [login1.ascent.olcf.ornl.gov]
    moduletool: lmod
    executors:
      defaults:
        pollinterval: 30
        maxpendtime: 300
        account: gen014ecpci
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
        csh:
          description: submit jobs on local machine using csh shell
          shell: csh
        python:
          description: submit jobs on local machine using python shell
          shell: python
      lsf:
        batch:
          queue: batch

    compilers:
      find:
        gcc: "^(gcc)"
      compiler:
        gcc:
          builtin_gcc:
            cc: /usr/bin/gcc
            cxx: /usr/bin/g++
            fc: /usr/bin/gfortran
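
The configuration above defines an executor only for the batch queue. The test queue mentioned earlier can be declared the same way; a sketch of the lsf section with both queues is shown below (the executor name test is illustrative and simply mirrors the queue name).

      lsf:
        batch:
          queue: batch
        test:
          queue: test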

JLSE @ ANL

The Joint Laboratory for System Evaluation (JLSE) provides a testbed of emerging HPC systems. The default scheduler is Cobalt; Cobalt executors are defined in the cobalt section within the executors field.

We set the default launcher to qsub, which is used by all batch executors. In each cobalt executor the queue property specifies the queue to submit jobs to; for instance, the testing executor below with queue: testing will submit jobs using qsub -q testing. A minimal buildspec referencing this executor is sketched after the configuration.

system:
  jlse:
    hostnames:
    - jlselogin*
    moduletool: environment-modules
    executors:
      defaults:
        pollinterval: 30
        maxpendtime: 300
      local:
        bash:
          description: submit jobs on local machine using bash shell
          shell: bash
        sh:
          description: submit jobs on local machine using sh shell
          shell: sh
        csh:
          description: submit jobs on local machine using csh shell
          shell: csh
        python:
          description: submit jobs on local machine using python shell
          shell: python
      cobalt:
        testing:
          queue: testing
    compilers:
      find:
        gcc: "^(gcc)"
      compiler:
        gcc:
          builtin_gcc:
            cc: /usr/bin/gcc
            cxx: /usr/bin/g++
            fc: /usr/bin/gfortran
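
Below is a minimal buildspec sketch that targets the Cobalt executor defined above. The test name and run command are placeholders; buildtest will submit the generated job script to the testing queue using qsub -q testing, as described earlier.

buildspecs:
  hostname_jlse_testing:             # hypothetical test name
    type: script
    executor: jlse.cobalt.testing    # <system>.<executor-type>.<executor-name>
    description: run hostname on the JLSE testing queue
    run: hostname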