Buildtest Tutorial on Perlmutter

This tutorial will be conducted on the Perlmutter system. If you do not have access, please obtain a NERSC user account.

Setup

Once you have a NERSC account, you can connect to any NERSC system. Open a terminal client and ssh into Perlmutter as follows:

ssh <user>@perlmutter-p1.nersc.gov

To get started, load the python module, since buildtest requires Python 3.8 or higher:

module load python
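To confirm that the python on your PATH meets this requirement, you could run a small check such as the following sketch. version_ge is a hypothetical helper, not part of buildtest, and it assumes sort -V from GNU coreutils is available for ordering dotted version strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of buildtest): succeed when version $2 is
# greater than or equal to version $1. Relies on "sort -V" (GNU coreutils)
# to order dotted version strings such as 3.8 and 3.10 correctly.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Check the interpreter that "module load python" put on PATH, if any.
if command -v python3 >/dev/null 2>&1; then
  py_version=$(python3 -c 'import platform; print(platform.python_version())')
  if version_ge 3.8 "$py_version"; then
    echo "python $py_version meets the buildtest requirement (3.8 or higher)"
  else
    echo "python $py_version is too old; buildtest needs 3.8 or higher" >&2
  fi
fi
```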

Next, install buildtest by cloning the repository into your HOME directory:

git clone https://github.com/buildtesters/buildtest.git $HOME/buildtest

Note

Please make sure you create and activate a Python virtual environment before you proceed with this tutorial.

Once you have buildtest set up, clone the following repository into your home directory:

git clone https://github.com/buildtesters/buildtest-nersc $HOME/buildtest-nersc

Next, set the environment variable BUILDTEST_CONFIGFILE to point to the configuration file required to use buildtest on Perlmutter:

export BUILDTEST_CONFIGFILE=$HOME/buildtest-nersc/config.yml
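Since a mistyped path is easy to miss, here is a minimal sketch of a sanity check you could run after the export. check_buildtest_config is a hypothetical helper; it only verifies that the variable is set and that the file is readable, not that the YAML is a valid buildtest configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of buildtest): check that BUILDTEST_CONFIGFILE
# is set and names a readable file. It does not validate the YAML contents.
check_buildtest_config() {
  if [ -z "${BUILDTEST_CONFIGFILE:-}" ]; then
    echo "BUILDTEST_CONFIGFILE is not set" >&2
    return 1
  fi
  if [ ! -r "$BUILDTEST_CONFIGFILE" ]; then
    echo "cannot read $BUILDTEST_CONFIGFILE" >&2
    return 1
  fi
  echo "using configuration file: $BUILDTEST_CONFIGFILE"
}

# Usage, in the same shell after the export above:
#   check_buildtest_config
```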

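Once you are done, navigate back to the root of the buildtest repository by running the command below. The environment variable BUILDTEST_ROOT points to the root of your buildtest clone and is set when you set up buildtest.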

cd $BUILDTEST_ROOT

The exercises can be found in the directory buildtest/perlmutter_tutorial. You can navigate to this directory by running:

cd $BUILDTEST_ROOT/perlmutter_tutorial

If you get stuck on an exercise, you can find its solution in the file .solution.txt.

Note

For exercises 2 and 3, you can check the solution by running the shell script with bash .solution.sh

Exercise 1: Performing Status Check

In this exercise, you will check the version of Lmod using the environment variable LMOD_VERSION and verify the output using a regular expression. You will first run the test with a regular expression that does not match the output and confirm that the test FAILs, then update the expression and rerun the test until it PASSes. Shown below is the example buildspec, which contains errors (for example, type: FIXME) that you need to fix:

buildspecs:
  test_lmod_version:
    type: FIXME
    executors: 'perlmutter.local.bash'
    run: echo $LMOD_VERSION

Todo

Run the test with buildtest build -b $BUILDTEST_ROOT/perlmutter_tutorial/ex1/module_version.yml and notice that it fails validation
Validate the buildspec with buildtest buildspec validate to determine the error
Fix the buildspec and rerun buildtest buildspec validate until the buildspec is valid
Add a regular expression check on the stdout stream that does not match the output, and confirm that the test fails
Check the output of the test via buildtest inspect query
Update the regular expression to match the value of $LMOD_VERSION reported in the test output, and rerun the test until it passes
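Before editing the buildspec, it can help to try candidate expressions in the shell against the same output the test produces. The sketch below uses a hypothetical helper, regex_matches, built on grep -E, and assumes the check passes when the expression matches anywhere in the output. grep -E implements POSIX extended regular expressions, which can differ from the Python regular expressions buildtest evaluates (for example, use [0-9] rather than \d), so treat this as a quick approximation rather than a replica of buildtest's check.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of buildtest): succeed when the extended
# regular expression $1 matches somewhere in the text $2.
# grep -E uses POSIX ERE, so prefer [0-9] over Python-style \d.
regex_matches() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

# Reproduce what the test prints (empty if Lmod is not initialized).
output=$(echo "${LMOD_VERSION:-}")

# Try a candidate expression before putting it in the buildspec.
if regex_matches '^[0-9]+\.[0-9]+\.[0-9]+$' "$output"; then
  echo "match: '$output'"
else
  echo "no match: '$output'"
fi
```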

Exercise 2: Querying Buildspec Cache

In this exercise you will learn how to use the Buildspecs Interface. First, build the buildspec cache by running the following:

buildtest buildspec find --directory $HOME/buildtest-nersc/buildspecs --rebuild -q

Todo

Find all tags
List all filters and format fields
Format the table using the fields name and description
Filter buildspecs by tag e4s
List all invalid buildspecs
Validate all buildspecs with tag e4s
Show the content of test hello_world_openmp

Exercise 3: Query Test Report

In this exercise you will learn how to query the test report using buildtest report.

Before you start, run the following command to build the tests under apps/spack (bd is short for build):

buildtest bd -b $HOME/buildtest-nersc/buildspecs/apps/spack/

Todo

List all filters and format fields
Query all tests with returncode 0
Query all tests with tag e4s
Print the total count of all failed tests

Next, upload the test results to CDASH by running the following:

buildtest cdash upload $USER-buildtest-tutorial

The buildtest cdash upload command pushes test results to a CDASH server. The results are read from the report file, the same data displayed by buildtest report, and CDASH lets you browse and analyze them in a web interface.

If the command succeeds, it prints a link to your test results on the CDASH server https://my.cdash.org. Click the link to view your results and briefly analyze them. Shown below is an example output:

buildtest cdash upload $USER-buildtest-tutorial
Reading report file:  /Users/siddiq90/Documents/github/buildtest/var/report.json
Uploading 110 tests
Build Name:  siddiq90-buildtest-tutorial
site:  generic
MD5SUM: a589c72bcdabdab9038600a2789e429f
You can view the results at: https://my.cdash.org//viewTest.php?buildid=2278337
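If you want to keep the results link for later, you could save the upload output to a file and extract the link from the line that starts with "You can view the results at:", as in the example output above. cdash_link is a hypothetical helper, and it assumes the output format shown in that example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of buildtest): read "buildtest cdash upload"
# output on stdin and print the URL from the "You can view the results at:" line.
cdash_link() {
  sed -n 's/^You can view the results at: *//p'
}

# Usage on Perlmutter: keep a copy of the upload log, then extract the link.
#   buildtest cdash upload "$USER-buildtest-tutorial" | tee cdash_upload.log
#   cdash_link < cdash_upload.log
```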

Exercise 4: Specifying Performance Checks

In this exercise, you will run the STREAM benchmark and use comparison operators to determine whether the test passes based on its performance results. Shown below is the STREAM test used in this exercise:

buildspecs:
  stream_test:
    type: script
    executor: perlmutter.local.bash
    description: Run stream test
    env:
      OMP_NUM_THREADS: 4
    run: |
      wget https://raw.githubusercontent.com/jeffhammond/STREAM/master/stream.c
      gcc -fopenmp -o stream stream.c
      ./stream
    metrics:
      copy:
        type: float
        regex:
          exp: 'Copy:\s+(\S+)\s+.*'
          stream: stdout
          item: 1
      scale:
        type: float
        regex:
          exp: 'Scale:\s+(\S+)\s+.*'
          stream: stdout
          item: 1
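The metrics section extracts the Copy and Scale rates from the test's stdout: each regular expression is applied to the stdout stream, and capture group 1 (item: 1) is stored as a float. The sketch below reproduces that extraction with sed so you can try the expression on saved STREAM output. stream_metric is a hypothetical helper; the POSIX classes [[:space:]] and [^[:space:]] stand in for \s and \S, and the pattern is anchored at the start of the line, which assumes STREAM prints each rate at the beginning of its own line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of buildtest): read STREAM output on stdin and
# print capture group 1 of "<Name>:\s+(\S+)\s+.*", i.e. the Best Rate MB/s column.
# [[:space:]] and [^[:space:]] are the POSIX equivalents of \s and \S.
stream_metric() {
  # $1: function name as printed by STREAM, e.g. Copy or Scale
  sed -nE "s/^$1:[[:space:]]+([^[:space:]]+)[[:space:]]+.*/\1/p" | head -n 1
}

# Usage, after compiling STREAM as in the run section above:
#   ./stream > stream.out
#   stream_metric Copy < stream.out
#   stream_metric Scale < stream.out
```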

Todo

Run the STREAM test with buildtest build -b $BUILDTEST_ROOT/perlmutter_tutorial/ex4/stream.yml
Check the values of metrics copy and scale by running buildtest inspect query -o stream_test
Use the assert_ge (greater than or equal) check on metrics copy and scale, with a reference value of 50000 for each
Rerun the test and examine the output
Next, try a different reference value, such as 5000, rerun the test, and compare the output
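assert_ge passes when the measured metric is greater than or equal to the reference value. The following sketch mirrors that comparison in the shell, which can help you reason about why changing the reference from 50000 to 5000 changes the outcome. check_ge is a hypothetical helper, not buildtest's implementation; it uses awk because the shell's [ ... -ge ... ] test only compares integers.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not buildtest's implementation): pass when the measured
# value is greater than or equal to the reference, mirroring assert_ge.
# awk handles the floating-point comparison; [ ... -ge ... ] is integer-only.
check_ge() {
  # $1: metric name, $2: measured value, $3: reference value
  if awk -v m="$2" -v r="$3" 'BEGIN { exit !(m + 0 >= r + 0) }'; then
    echo "$1: $2 >= $3 (PASS)"
  else
    echo "$1: $2 < $3 (FAIL)"
    return 1
  fi
}

# Usage with values read from "buildtest inspect query -o stream_test":
#   check_ge copy <copy value> 50000
#   check_ge scale <scale value> 5000
```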

Exercise 5: Running a Batch Job

In this exercise, you will submit a batch job that runs hostname on the Slurm cluster. Shown below is the example buildspec:

buildspecs:
  hostname_perlmutter:
    description: run hostname on perlmutter
    type: script
    executor: 'perlmutter.slurm.debug'
    tags: ["queues","jobs"]
    sbatch: ["-t 5", "-n 1", "-N 1", "-C cpu"]
    run: hostname

Note that the test runs on executor perlmutter.slurm.debug, which corresponds to the Slurm debug queue on Perlmutter. The sbatch options specify the batch directives for the job: a 5-minute time limit (-t 5), one task (-n 1) on one node (-N 1), constrained to CPU nodes (-C cpu).
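As an illustration of how these options map to batch directives, the sketch below prints each sbatch entry as an #SBATCH line. It assumes buildtest writes each entry of the sbatch list verbatim after #SBATCH in the generated job script; sbatch_header is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each sbatch option as an #SBATCH directive,
# assuming each list entry is copied verbatim into the generated job script.
sbatch_header() {
  for opt in "$@"; do
    printf '#SBATCH %s\n' "$opt"
  done
}

# The sbatch list from hostname_perlmutter above.
sbatch_header "-t 5" "-n 1" "-N 1" "-C cpu"
```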

In this exercise, you are asked to do the following:

Todo

Run the test in $BUILDTEST_ROOT/perlmutter_tutorial/ex5/hostname.yml with a poll interval of 10 seconds and take note of the output; you should see that the job is submitted to the batch scheduler. Refer to buildtest build --help for the complete list of options
Check the output of the test via buildtest inspect query
Update the test to use Multiple Executors so that it runs on both the regular and debug queues
Rerun the test; you should see two test runs for hostname_perlmutter, one for each executor
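The Multiple Executors feature lets the executor field select several executors at once. A rough sketch of that selection, assuming the field is treated as a regular expression matched against whole executor names, is shown below. matching_executors is a hypothetical helper, and the executor names are the ones that appear in this tutorial, not the full list from the configuration file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the executor names (remaining arguments) that the
# extended regular expression $1 matches in full (grep -x).
matching_executors() {
  local pattern=$1
  shift
  printf '%s\n' "$@" | grep -xE -- "$pattern"
}

# Executor names that appear in this tutorial; your configuration may define more.
matching_executors 'perlmutter\.slurm\.(regular|debug)' \
  perlmutter.local.bash perlmutter.slurm.regular perlmutter.slurm.debug
```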

Once you have completed this exercise, buildtest build should produce output similar to the following:

                                                                Test Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ builder                               ┃ executor                    ┃ status ┃ checks (ReturnCode, Regex, Runtime) ┃ returncode ┃ runtime  ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ hostname_perlmutter/80e317c1          │ perlmutter.slurm.regular    │ PASS   │ N/A N/A N/A                         │ 0          │ 45.324512│
├───────────────────────────────────────┼─────────────────────────────┼────────┼─────────────────────────────────────┼────────────┼──────────┤
│ hostname_perlmutter/b1d7b318          │ perlmutter.slurm.debug      │ PASS   │ N/A N/A N/A                         │ 0          │ 75.54278 │
└───────────────────────────────────────┴─────────────────────────────┴────────┴─────────────────────────────────────┴────────────┴──────────┘