:py:mod:`buildtest.scheduler.pbs` ================================= .. py:module:: buildtest.scheduler.pbs Module Contents --------------- Classes ~~~~~~~ .. autoapisummary:: buildtest.scheduler.pbs.PBSJob buildtest.scheduler.pbs.TorqueJob Attributes ~~~~~~~~~~ .. autoapisummary:: buildtest.scheduler.pbs.logger .. py:data:: logger .. py:class:: PBSJob(jobID, sched_cmds) Bases: :py:obj:`buildtest.scheduler.job.Job` The PBSJob models a PBS Job with helper methods to retrieve job state, check if job is running/pending/suspended. We have methods to poll job state, gather job results upon completion and cancel job. .. py:method:: is_pending() Return ``True`` if job is pending. A pending job is in state ``Q``. .. py:method:: is_running() Return ``True`` if job is running. A completed job is in state ``R``. .. py:method:: is_complete() Return ``True`` if job is complete. A completed job is in state ``F``. .. py:method:: is_suspended() Return ``True`` if job is suspended which would be in one of these states ``H``, ``U``, ``S``. .. py:method:: success() This method determines if job was completed successfully and returns ``True`` if exit code is 0. According to https://help.altair.com/2021.1.3/PBS%20Professional/PBSAdminGuide2021.1.3.pdf section 14.9 Job Exit Status Codes we have the following - Exit Code: X < 0 - Job could not be executed - Exit Code: 0 <= X < 128 - Exit value of Shell or top-level process - Exit Code: X >= 128 - Job was killed by signal - Exit Code: X == 0 - Job executed was a successful .. py:method:: fail() Return ``True`` if their is a job failure which would be if exit code is not 0 .. py:method:: get_output_error_files() Fetch output and error files right after job submission. .. py:method:: is_output_ready() Check if the output and error file exists. .. py:method:: poll() This method will poll the PBS Job by running ``qstat -f `` which will retrieve the job details and extract data such as job state, exit code, output and error file. A typical output for the PBS job looks something like this .. code-block:: console (buildtest) adaptive50@e4spro-cluster:~/Documents/buildtest/aws_oddc$ qstat -f 40680075.e4spro-cluster Job Id: 40680075.e4spro-cluster Job_Name = hostname_test Job_Owner = adaptive50@server.nodus.com resources_used.cput = 00:00:00 resources_used.vmem = 0kb resources_used.walltime = 00:00:05 resources_used.mem = 0kb resources_used.energy_used = 0 job_state = C queue = e4spro-cluster server = e4spro-cluster Checkpoint = u ctime = Mon Mar 25 17:42:02 2024 Error_Path = e4spro-cluster:/home/adaptive50/Documents/buildtest/var/tests /generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname_tes t.e exec_host = ac-d160-0-0/0 Hold_Types = n Join_Path = n Keep_Files = n Mail_Points = a mtime = Mon Mar 25 17:42:38 2024 Output_Path = e4spro-cluster:/home/adaptive50/Documents/buildtest/var/test s/generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname_te st.o Priority = 0 qtime = Mon Mar 25 17:42:02 2024 Rerunable = True Resource_List.nodes = 1 Resource_List.nodect = 1 Resource_List.walltime = 24:00:00 session_id = 1806 Variable_List = PBS_O_QUEUE=e4spro-cluster,PBS_O_HOME=/home/adaptive50, PBS_O_LOGNAME=adaptive50, PBS_O_PATH=/home/adaptive50/Documents/buildtest/bin:/home/adaptive50/ .local/share/virtualenvs/buildtest-hH765GEg/bin:/home/adaptive50/packa ges/bin:/usr/local/paraview-5.11.2/bin:/home/adaptive50/.local/bin:/us r/local/cuda/bin:/usr/local/julia/1.10.0/bin:/usr/local/go/bin:/usr/lo cal/libexec/osu-micro-benchmarks/mpi/startup:/usr/local/libexec/osu-mi cro-benchmarks/mpi/pt2pt:/usr/local/libexec/osu-micro-benchmarks/mpi/o ne-sided:/usr/local/libexec/osu-micro-benchmarks/mpi/collective:/opt/b ootstrap/view/bin:/home/adaptive50/packages/bin:/usr/local/paraview-5. 11.2/bin:/home/adaptive50/.local/bin:/usr/local/cuda/bin:/usr/local/ju lia/1.10.0/bin:/usr/local/go/bin:/usr/local/libexec/osu-micro-benchmar ks/mpi/startup:/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt:/usr/ local/libexec/osu-micro-benchmarks/mpi/one-sided:/usr/local/libexec/os u-micro-benchmarks/mpi/collective:/opt/bootstrap/view/bin:/home/adapti ve50/spack/bin:/home/adaptive50/packages/bin:/spack/bin:/usr/local/vis it/bin:/usr/local/paraview-5.11.2/bin:/home/adaptive50/.local/bin:/usr /local/cuda/bin:/usr/local/julia/1.10.0/bin:/usr/local/go/bin:/usr/loc al/libexec/osu-micro-benchmarks/mpi/startup:/usr/local/libexec/osu-mic ro-benchmarks/mpi/pt2pt:/usr/local/libexec/osu-micro-benchmarks/mpi/on e-sided:/usr/local/libexec/osu-micro-benchmarks/mpi/collective:/opt/bo otstrap/view/bin:/home/adaptive50/.local/bin:/home/adaptive50/bin:/opt /mvapich2-x/gnu11.1.0/mofed/aws/mpirun/bin:/usr/local/bin:/usr/local/s bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/ games:/usr/local/games:/snap/bin:/opt/mvapich2-x/gnu11.1.0/mofed/aws/m pirun/libexec/osu-micro-benchmarks/mpi/startup:/opt/mvapich2-x/gnu11.1 .0/mofed/aws/mpirun/libexec/osu-micro-benchmarks/mpi/one-sided:/opt/mv apich2-x/gnu11.1.0/mofed/aws/mpirun/libexec/osu-micro-benchmarks/mpi/c ollective:/opt/mvapich2-x/gnu11.1.0/mofed/aws/mpirun/libexec/osu-micro -benchmarks/mpi/pt2pt:/usr/local/cuda/bin:/usr/local/tau-2.33/x86_64/b in:/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-11.4.0/openjdk-11.0.2 0.1_1-qg3jd2dpwz6bwi455lcljdkiv5rifjmr/bin:/usr/local/cuda/bin:/usr/lo cal/tau-2.33/x86_64/bin:/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc- 11.4.0/openjdk-11.0.20.1_1-qg3jd2dpwz6bwi455lcljdkiv5rifjmr/bin:/usr/l ocal/cuda/bin:/usr/local/tau-2.33/x86_64/bin:/spack/opt/spack/linux-ub untu20.04-x86_64/gcc-11.4.0/openjdk-11.0.20.1_1-qg3jd2dpwz6bwi455lcljd kiv5rifjmr/bin,PBS_O_MAIL=/var/mail/adaptive50, PBS_O_SHELL=/usr/bin/bash,PBS_O_LANG=C.UTF-8, PBS_O_WORKDIR=/home/adaptive50/Documents/buildtest/var/tests/generic. torque.e4spro/sleep/hostname_test/b10fea47/stage, PBS_O_HOST=e4spro-cluster,PBS_O_SERVER=e4spro-cluster euser = adaptive50 egroup = adaptive50 queue_type = E etime = Mon Mar 25 17:42:02 2024 exit_status = 0 submit_args = -q e4spro-cluster /home/adaptive50/Documents/buildtest/var/t ests/generic.torque.e4spro/sleep/hostname_test/b10fea47/stage/hostname _test.sh start_time = Mon Mar 25 17:42:32 2024 start_count = 1 fault_tolerant = False comp_time = Mon Mar 25 17:42:38 2024 job_radix = 0 total_runtime = 6.235349 submit_host = e4spro-cluster init_work_dir = /home/adaptive50/Documents/buildtest/var/tests/generic.tor que.e4spro/sleep/hostname_test/b10fea47/stage request_version = 1 req_information.task_count.0 = 1 req_information.lprocs.0 = 1 req_information.thread_usage_policy.0 = allowthreads req_information.hostlist.0 = ac-d160-0-0:ppn=1 req_information.task_usage.0.task.0.cpu_list = 0 req_information.task_usage.0.task.0.mem_list = 0 req_information.task_usage.0.task.0.cores = 0 req_information.task_usage.0.task.0.threads = 1 req_information.task_usage.0.task.0.host = ac-d160-0-0 copy_on_rerun = False .. py:method:: retrieve_jobdata() This method is called once job is complete. We will gather record of job by running ``qstat -x -f -F json `` and return the json object as a dict. This method is responsible for getting output file, error file and exit status of job. .. py:method:: cancel() Cancel PBS job by running ``qdel ``. .. py:class:: TorqueJob(jobID, sched_cmds) Bases: :py:obj:`PBSJob` The PBSJob models a PBS Job with helper methods to retrieve job state, check if job is running/pending/suspended. We have methods to poll job state, gather job results upon completion and cancel job. .. py:method:: retrieve_jobdata() This method is called once job is complete. We will gather record of job by running `qstat -f ` and return the output as a string.