:py:mod:`buildtest.scheduler.detection`
=======================================

.. py:module:: buildtest.scheduler.detection


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   buildtest.scheduler.detection.Scheduler
   buildtest.scheduler.detection.Slurm
   buildtest.scheduler.detection.LSF
   buildtest.scheduler.detection.PBS
   buildtest.scheduler.detection.Torque


.. py:class:: Scheduler(custom_dirs=None)

   This is the base Scheduler class, which implements the common methods for detecting scheduler details.
   Each subclass implements the queries that are specific to its scheduler.

   .. py:attribute:: logger

   .. py:attribute:: binaries
      :value: []

   .. py:method:: queues()

   .. py:method:: active()

      Returns ``True`` if buildtest is able to retrieve queues from the scheduler, otherwise returns ``False``.

   .. py:method:: get_queues()
      :abstractmethod:

      This method is implemented by each subclass to return a list of queues for a given scheduler.


.. py:class:: Slurm(custom_dirs=None)

   Bases: :py:obj:`Scheduler`

   The Slurm class implements common functions to query a Slurm cluster, including its partitions, qos, and clusters.
   We check that the Slurm binaries exist in ``$PATH`` and report whether the Slurm cluster is in a valid state.

   .. py:attribute:: binaries
      :value: ['sbatch', 'sacct', 'sacctmgr', 'sinfo', 'scancel', 'scontrol']

   .. py:method:: partitions()

   .. py:method:: clusters()

   .. py:method:: qos()

   .. py:method:: run_command(query)

      Run a command and return its output as a list of lines.

   .. py:method:: _get_partitions()

      Get a list of all Slurm partitions using ``sinfo -a -h -O partitionname``. The output is a list of queue names.

      .. code-block:: console

          $ sinfo -a -h -O partitionname
          system
          system_shared
          debug_hsw
          debug_knl
          jupyter

   .. py:method:: _get_clusters()

      Get a list of Slurm clusters by running ``sacctmgr list cluster -P -n format=Cluster``.
      The output is a list of Slurm clusters, for example:

      .. code-block:: console

          $ sacctmgr list cluster -P -n format=Cluster
          cori
          escori

   .. py:method:: _get_qos()

      Retrieve a list of all Slurm qos by running ``sacctmgr list qos -P -n format=Name``. The output is a list of qos.
      Shown below is an example output.

      .. code-block:: console

          $ sacctmgr list qos -P -n format=Name
          normal
          premium
          low
          serialize
          scavenger

   .. py:method:: validate_partition(slurm_executor)

      Validate the partition for a given executor.

      :param slurm_executor: The configuration of the executor.
      :type slurm_executor: dict

      :returns: True if the partition is valid and in 'up' state, False otherwise.
      :rtype: bool

   .. py:method:: validate_cluster(executor, slurm_executor)

      Validate the cluster for a given executor. If the 'cluster' key is defined in the Slurm executor
      configuration, we check whether the cluster is valid and return True if it is, False otherwise.

      :param executor: The name of the executor.
      :type executor: str
      :param slurm_executor: The configuration of the executor.
      :type slurm_executor: dict

   .. py:method:: validate_qos(executor, slurm_executor)

      Validate the qos for a given executor. If the 'qos' key is defined in the Slurm executor
      configuration, we check whether the qos is valid and return True if it is, False otherwise.

      :param executor: The name of the executor.
      :type executor: str
      :param slurm_executor: The configuration of the executor.
      :type slurm_executor: dict


.. py:class:: LSF(custom_dirs=None)

   Bases: :py:obj:`Scheduler`

   The LSF class checks for LSF binaries and returns a list of LSF queues.

   .. py:attribute:: binaries
      :value: ['bsub', 'bqueues', 'bkill', 'bjobs']

   .. py:method:: get_queues()

      Return a JSON dictionary of the available LSF queues and their states. The command we run is
      ``bqueues -o 'queue_name status' -json``, which returns a JSON record of all queue details.

      .. code-block:: console

          $ bqueues -o 'queue_name status' -json
          {
            "COMMAND":"bqueues",
            "QUEUES":2,
            "RECORDS":[
              {
                "QUEUE_NAME":"batch",
                "STATUS":"Open:Active"
              },
              {
                "QUEUE_NAME":"test",
                "STATUS":"Open:Active"
              }
            ]
          }

   .. py:method:: validate_queue(executor)

      Validate an LSF queue. We check that the queue is available and in the 'Open:Active' state.
      The input is a dictionary containing the LSF executor configuration. If the queue is not found, we return False.

      :param executor: The dictionary containing the LSF executor configuration.
      :type executor: dict

      :returns: True if queue is found and in 'Open:Active' state, False otherwise.
      :rtype: bool


.. py:class:: PBS(custom_dirs=None)

   Bases: :py:obj:`Scheduler`

   The PBS class checks for PBS binaries and gets a list of available queues.

   .. py:attribute:: binaries
      :value: ['qsub', 'qstat', 'qdel', 'qhold', 'qmgr']

   .. py:method:: active()

      Return True if the PBS scheduler is detected, otherwise return False.

   .. py:method:: check(custom_dirs=None)

      Check that the binaries exist in ``$PATH`` and run ``qsub --version`` to determine whether the
      scheduler is OpenPBS. The return value is a boolean, where ``True`` indicates the check has passed.

      The output of ``qsub --version`` from an OpenPBS scheduler looks as follows; we search for the string ``pbs_version``.

      .. code-block:: console

          [pbsuser@pbs tmp]$ qsub --version
          pbs_version = 19.0.0

      :param binaries: list of binaries to check for existence in $PATH
      :type binaries: list

   .. py:method:: get_queues()

      Get the queue configuration using ``qstat -Q -f -F json`` and retrieve a list of queues.
      Shown below is an example output of ``qstat -Q -f -F json``.

      .. code-block:: console

          $ qstat -Q -f -F json
          {
              "timestamp":1615924938,
              "pbs_version":"19.0.0",
              "pbs_server":"pbs",
              "Queue":{
                  "workq":{
                      "queue_type":"Execution",
                      "total_jobs":0,
                      "state_count":"Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0 ",
                      "resources_assigned":{
                          "mem":"0kb",
                          "ncpus":0,
                          "nodect":0
                      },
                      "hasnodes":"True",
                      "enabled":"True",
                      "started":"True"
                  }
              }
          }

   .. py:method:: validate_queue(queue_name)

      Validate a PBS queue. Return True if the queue exists and is enabled and started, False otherwise.

      :param queue_name: The name of the queue to validate.
      :type queue_name: str


.. py:class:: Torque(custom_dirs=None)

   Bases: :py:obj:`Scheduler`

   The Torque class detects the Torque scheduler and gets a list of queues.

   .. py:attribute:: binaries
      :value: ['qsub', 'qstat', 'qdel', 'qhold', 'qmgr']

   .. py:method:: active()

      Return True if the Torque scheduler is detected, otherwise return False.

   .. py:method:: check(custom_dirs=None)

      Check that the binaries exist in ``$PATH`` and run ``qsub --version`` to determine whether the
      scheduler is Torque. The return value is a boolean, where ``True`` indicates the check has passed.

      The output of ``qsub --version`` from a Torque scheduler looks as follows; we search for ``Commit:``
      in the output to distinguish Torque from OpenPBS.

      .. code-block:: console

          $ qsub --version
          Version: 7.0.1
          Commit: b405f8c22d41d29cbf9b9016bc1146bf4559e895

      :param binaries: list of binaries to check for existence in $PATH
      :type binaries: list

   .. py:method:: get_queues()

      Get the queue configuration using ``qstat -Qf`` and parse the output into a JSON dictionary.
      The output of this command looks as follows:

      .. code-block:: console

          $ qstat -Qf
          Queue: lbl-cluster
              queue_type = Execution
              total_jobs = 0
              state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Complete:0
              resources_default.nodes = 1
              resources_default.walltime = 24:00:00
              mtime = 1711400391
              enabled = True
              started = True

   .. py:method:: validate_queue(torque_executor)

      Validate the queue for a given executor. We check that the queue is available, then inspect
      the queue configuration to confirm that the queue is enabled and started.
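The Torque queue handling described above (parsing ``qstat -Qf`` output into a dictionary, then checking that a queue is enabled and started) can be illustrated with a minimal sketch. This is not buildtest's implementation: the helper names ``parse_qstat_qf`` and ``queue_is_usable`` are hypothetical, the sample text is copied from the example output above, and the parser assumes one ``key = value`` attribute per line without continuation lines.

```python
def parse_qstat_qf(text):
    """Parse ``qstat -Qf`` style text into {queue_name: {attribute: value}}.

    Hypothetical helper for illustration only; it assumes each attribute
    fits on a single ``key = value`` line.
    """
    queues = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Queue:"):
            # A "Queue: <name>" line starts a new queue record.
            current = line.split(":", 1)[1].strip()
            queues[current] = {}
        elif current is not None and " = " in line:
            # Split on the first " = " only, so values may contain "=".
            key, value = line.split(" = ", 1)
            queues[current][key.strip()] = value.strip()
    return queues


def queue_is_usable(queues, name):
    """Return True if the queue exists and is both enabled and started."""
    queue = queues.get(name)
    if queue is None:
        return False
    return queue.get("enabled") == "True" and queue.get("started") == "True"


# Sample text taken from the ``qstat -Qf`` example in this page.
SAMPLE = """\
Queue: lbl-cluster
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Complete:0
    resources_default.nodes = 1
    resources_default.walltime = 24:00:00
    mtime = 1711400391
    enabled = True
    started = True
"""

queues = parse_qstat_qf(SAMPLE)
print(queue_is_usable(queues, "lbl-cluster"))
print(queue_is_usable(queues, "missing"))
```

In this sketch the attribute values stay as strings, which is why the check compares against ``"True"`` rather than a boolean; a queue that is absent from the parsed dictionary is treated as invalid.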