buildtest.scheduler.pbs
Module Contents
Classes
PBSJob: Models a PBS job, with helper methods to retrieve the job state and check whether the job is running, pending, or suspended.
Attributes
- buildtest.scheduler.pbs.logger
- class buildtest.scheduler.pbs.PBSJob(jobID, sched_cmds)
Bases: buildtest.scheduler.job.Job
PBSJob models a PBS job. It provides helper methods to retrieve the job state and check whether the job is running, pending, or suspended, along with methods to poll the job state, gather job results upon completion, and cancel the job.
- is_suspended()
Return True if the job is suspended, that is, if it is in one of these states: H, U, S.
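The state check described above can be sketched as a simple set membership test. This is a minimal illustration, not buildtest's actual implementation; the names SUSPENDED_STATES and is_suspended_state are hypothetical.

```python
# Hypothetical helper for illustration only; not part of buildtest.
# The three state letters are the ones listed in the is_suspended() description.
SUSPENDED_STATES = frozenset({"H", "U", "S"})


def is_suspended_state(job_state: str) -> bool:
    """Return True if the single-letter job state is H, U, or S."""
    return job_state in SUSPENDED_STATES
```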
- success()
This method determines whether the job completed successfully and returns True if the exit code is 0.
According to https://help.altair.com/2021.1.3/PBS%20Professional/PBSAdminGuide2021.1.3.pdf, section 14.9, Job Exit Status Codes, exit codes are interpreted as follows:
Exit Code: X < 0 - Job could not be executed
Exit Code: 0 <= X < 128 - Exit value of shell or top-level process
Exit Code: X >= 128 - Job was killed by signal
Exit Code: X == 0 - Job executed successfully
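The exit code ranges listed above can be expressed as a small classifier. This is a sketch under the assumption that the exit status has already been converted to an integer; classify_exit_status and is_success are hypothetical names, not buildtest functions.

```python
# Hypothetical helpers for illustration only; not part of buildtest.
def classify_exit_status(exit_status: int) -> str:
    """Map a PBS exit status to the category given in the list above."""
    if exit_status < 0:
        return "job could not be executed"
    if exit_status >= 128:
        return "job was killed by signal"
    # Remaining range: 0 <= exit_status < 128
    return "exit value of shell or top-level process"


def is_success(exit_status: int) -> bool:
    """A job counts as successful only when its exit status is exactly 0."""
    return exit_status == 0
```

Note that an exit status of 0 falls in the 0 <= X < 128 range, so success is a special case of a normal process exit rather than a separate category.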
- poll()
This method will poll the PBS job by running qstat -f <jobid>, which retrieves the job details and extracts data such as the job state, exit code, and output and error files. A typical output for a PBS job looks like this:

    (buildtest) adaptive50@e4spro-cluster:~/Documents/buildtest/aws_oddc$ qstat -f 40680075.e4spro-cluster
    Job Id: 40680075.e4spro-cluster
        Job_Name = hostname_test
        Job_Owner = adaptive50@server.nodus.com
        resources_used.cput = 00:00:00
        resources_used.vmem = 0kb
        resources_used.walltime = 00:00:05
        resources_used.mem = 0kb
        resources_used.energy_used = 0
        job_state = C
        queue = e4spro-cluster
        server = e4spro-cluster
        Checkpoint = u
        ctime = Mon Mar 25 17:42:02 2024
        Error_Path = e4spro-cluster:/home/adaptive50/Documents/buildtest/var/tests
            /generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname_tes
            t.e
        exec_host = ac-d160-0-0/0
        Hold_Types = n
        Join_Path = n
        Keep_Files = n
        Mail_Points = a
        mtime = Mon Mar 25 17:42:38 2024
        Output_Path = e4spro-cluster:/home/adaptive50/Documents/buildtest/var/test
            s/generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname_te
            st.o
        Priority = 0
        qtime = Mon Mar 25 17:42:02 2024
        Rerunable = True
        Resource_List.nodes = 1
        Resource_List.nodect = 1
        Resource_List.walltime = 24:00:00
        session_id = 1806
        Variable_List = PBS_O_QUEUE=e4spro-cluster,PBS_O_HOME=/home/adaptive50,
            PBS_O_LOGNAME=adaptive50,
            PBS_O_PATH=/home/adaptive50/Documents/buildtest/bin:/home/adaptive50/
            .local/share/virtualenvs/buildtest-hH765GEg/bin:/home/adaptive50/packa
            ges/bin:/usr/local/paraview-5.11.2/bin:/home/adaptive50/.local/bin:/us
            r/local/cuda/bin:/usr/local/julia/1.10.0/bin:/usr/local/go/bin:/usr/lo
            cal/libexec/osu-micro-benchmarks/mpi/startup:/usr/local/libexec/osu-mi
            cro-benchmarks/mpi/pt2pt:/usr/local/libexec/osu-micro-benchmarks/mpi/o
            ne-sided:/usr/local/libexec/osu-micro-benchmarks/mpi/collective:/opt/b
            ootstrap/view/bin:/home/adaptive50/packages/bin:/usr/local/paraview-5.
            11.2/bin:/home/adaptive50/.local/bin:/usr/local/cuda/bin:/usr/local/ju
            lia/1.10.0/bin:/usr/local/go/bin:/usr/local/libexec/osu-micro-benchmar
            ks/mpi/startup:/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:/usr/
            local/libexec/osu-micro-benchmarks/mpi/one-sided:/usr/local/libexec/os
            u-micro-benchmarks/mpi/collective:/opt/bootstrap/view/bin:/home/adapti
            ve50/spack/bin:/home/adaptive50/packages/bin:/spack/bin:/usr/local/vis
            it/bin:/usr/local/paraview-5.11.2/bin:/home/adaptive50/.local/bin:/usr
            /local/cuda/bin:/usr/local/julia/1.10.0/bin:/usr/local/go/bin:/usr/loc
            al/libexec/osu-micro-benchmarks/mpi/startup:/usr/local/libexec/osu-mic
            ro-benchmarks/mpi/pt2pt:/usr/local/libexec/osu-micro-benchmarks/mpi/on
            e-sided:/usr/local/libexec/osu-micro-benchmarks/mpi/collective:/opt/bo
            otstrap/view/bin:/home/adaptive50/.local/bin:/home/adaptive50/bin:/opt
            /mvapich2-x/gnu11.1.0/mofed/aws/mpirun/bin:/usr/local/bin:/usr/local/s
            bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/
            games:/usr/local/games:/snap/bin:/opt/mvapich2-x/gnu11.1.0/mofed/aws/m
            pirun/libexec/osu-micro-benchmarks/mpi/startup:/opt/mvapich2-x/gnu11.1
            .0/mofed/aws/mpirun/libexec/osu-micro-benchmarks/mpi/one-sided:/opt/mv
            apich2-x/gnu11.1.0/mofed/aws/mpirun/libexec/osu-micro-benchmarks/mpi/c
            ollective:/opt/mvapich2-x/gnu11.1.0/mofed/aws/mpirun/libexec/osu-micro
            -benchmarks/mpi/pt2pt:/usr/local/cuda/bin:/usr/local/tau-2.33/x86_64/b
            in:/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-11.4.0/openjdk-11.0.2
            0.1_1-qg3jd2dpwz6bwi455lcljdkiv5rifjmr/bin:/usr/local/cuda/bin:/usr/lo
            cal/tau-2.33/x86_64/bin:/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-
            11.4.0/openjdk-11.0.20.1_1-qg3jd2dpwz6bwi455lcljdkiv5rifjmr/bin:/usr/l
            ocal/cuda/bin:/usr/local/tau-2.33/x86_64/bin:/spack/opt/spack/linux-ub
            untu20.04-x86_64/gcc-11.4.0/openjdk-11.0.20.1_1-qg3jd2dpwz6bwi455lcljd
            kiv5rifjmr/bin,PBS_O_MAIL=/var/mail/adaptive50,
            PBS_O_SHELL=/usr/bin/bash,PBS_O_LANG=C.UTF-8,
            PBS_O_WORKDIR=/home/adaptive50/Documents/buildtest/var/tests/generic.
            torque.e4spro/sleep/hostname_test/b10fea47/stage,
            PBS_O_HOST=e4spro-cluster,PBS_O_SERVER=e4spro-cluster
        euser = adaptive50
        egroup = adaptive50
        queue_type = E
        etime = Mon Mar 25 17:42:02 2024
        exit_status = 0
        submit_args = -q e4spro-cluster /home/adaptive50/Documents/buildtest/var/t
            ests/generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname
            _test.sh
        start_time = Mon Mar 25 17:42:32 2024
        start_count = 1
        fault_tolerant = False
        comp_time = Mon Mar 25 17:42:38 2024
        job_radix = 0
        total_runtime = 6.235349
        submit_host = e4spro-cluster
        init_work_dir = /home/adaptive50/Documents/buildtest/var/tests/generic.tor
            que.e4spro/sleep/hostname_test/b10fea47/stage
        request_version = 1
        req_information.task_count.0 = 1
        req_information.lprocs.0 = 1
        req_information.thread_usage_policy.0 = allowthreads
        req_information.hostlist.0 = ac-d160-0-0:ppn=1
        req_information.task_usage.0.task.0.cpu_list = 0
        req_information.task_usage.0.task.0.mem_list = 0
        req_information.task_usage.0.task.0.cores = 0
        req_information.task_usage.0.task.0.threads = 1
        req_information.task_usage.0.task.0.host = ac-d160-0-0
        copy_on_rerun = False
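To illustrate how fields such as job_state, exit_status, and Error_Path might be pulled out of output like the above, here is a minimal parsing sketch. It is not buildtest's implementation. The function name parse_qstat_full is hypothetical, and the sketch assumes that each attribute appears as an indented "key = value" line and that wrapped continuation lines begin with a tab character, which should be verified against the qstat version in use.

```python
import re

# Matches an indented "key = value" attribute line.
ATTR_RE = re.compile(r"^\s+([\w.]+) = (.*)$")


def parse_qstat_full(text: str) -> dict:
    """Parse `qstat -f`-style text into a dict of attribute strings.

    Hypothetical helper. Assumes continuation lines of a wrapped value
    start with a tab; they are joined onto the previous attribute.
    """
    attrs = {}
    key = None
    for line in text.splitlines():
        if line.startswith("\t") and key is not None:
            # Wrapped value: append the fragment without its indentation.
            attrs[key] += line.strip()
            continue
        match = ATTR_RE.match(line)
        if match:
            key = match.group(1)
            attrs[key] = match.group(2).strip()
    return attrs


# Shortened sample built from the output shown in the poll() description.
SAMPLE = (
    "Job Id: 40680075.e4spro-cluster\n"
    "    Job_Name = hostname_test\n"
    "    job_state = C\n"
    "    Error_Path = e4spro-cluster:/home/adaptive50/Documents/buildtest/var/tests\n"
    "\t/generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname_tes\n"
    "\tt.e\n"
    "    exit_status = 0\n"
)

attrs = parse_qstat_full(SAMPLE)
job_state = attrs["job_state"]
exit_code = int(attrs["exit_status"])
```

In practice the text could come from something like subprocess.run(["qstat", "-f", jobid], capture_output=True, text=True).stdout. All values are returned as strings, so numeric fields such as exit_status need an explicit int() conversion.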