<!--#include virtual="header.txt"-->
<h1><a name="top">Select Plugin Design Guide</a></h1>
<h2 id="overview">Overview<a class="slurm_link" href="#overview"></a></h2>
<p>The select plugin is responsible for selecting compute resources to be
allocated to a job, plus allocating and deallocating those resources.
The select plugin is aware of the system's topology, based upon data structures
established by the topology plugin. It can also over-subscribe resources to
support gang scheduling (time slicing of parallel jobs), if so configured.
Two implementations are covered here: the select/linear and
select/cons_tres plugins. The select/linear plugin allocates
whole nodes to jobs and is the simplest implementation.
The select/cons_tres plugin (<i>cons_tres</i> is an abbreviation for
<i>consumable trackable resources</i>) can allocate individual sockets, cores, threads
or CPUs within a node. It also includes the ability to manage other generic
resources, such as GPUs.
The select/cons_tres plugin contains far more complex logic than
select/linear and is slightly slower.</p>
<h2 id="mode">Mode of Operation<a class="slurm_link" href="#mode"></a></h2>
<p>The select/linear and select/cons_tres plugins have
similar modes of operation. The obvious difference is that data structures
in select/linear are node-centric, while those in
select/cons_tres contain information at a finer resolution (sockets, cores,
threads, or CPUs depending upon the SelectTypeParameters configuration
parameter). The description below is generic and applies to the above two
plugin implementations. Note that each of these plugins is able to manage
memory allocations. If you need to track other resources, such as GPUs,
you should use the select/cons_tres plugin.</p>
<p>Per-node data structures include memory (configured and allocated),
GRES (configured and allocated, in a List data structure), plus a flag
indicating if the node has been allocated using an exclusive option (preventing
other jobs from being allocated resources on that same node). The other key
data structure is used to enforce the per-partition <i>OverSubscribe</i>
configuration parameter and tracks how many jobs have been allocated each
compute resource (e.g. CPU) in each
partition. This data structure is different between the plugins based upon
the resolution of the resource allocation (e.g. nodes or CPUs).</p>
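<p>The two kinds of records described above can be pictured with a minimal
sketch. It is not the plugin's actual C data layout: the class names
<i>NodeUsage</i> and <i>PartitionUsage</i>, all field names, and the
<i>max_share</i> limit are hypothetical, and a plain dictionary stands in for
the GRES List.</p>

```python
from dataclasses import dataclass, field


@dataclass
class NodeUsage:
    """Hypothetical per-node record; field names are illustrative only."""
    configured_mem_mb: int
    allocated_mem_mb: int = 0
    # GRES name -> [configured count, allocated count]
    gres: dict = field(default_factory=dict)
    # True when a job was allocated this node with an exclusive option
    exclusive: bool = False


@dataclass
class PartitionUsage:
    """Hypothetical per-partition record used to enforce OverSubscribe."""
    # Assumed limit: how many jobs may share one resource in this partition
    max_share: int
    # Resource index (a node or a CPU, depending on the plugin) -> job count
    job_count: dict = field(default_factory=dict)

    def can_allocate(self, resource: int) -> bool:
        """True if one more job may be placed on this resource."""
        return self.job_count.get(resource, 0) < self.max_share

    def allocate(self, resource: int) -> None:
        """Record one more job on the resource, refusing to exceed the limit."""
        if not self.can_allocate(resource):
            raise ValueError(f"resource {resource} is fully subscribed")
        self.job_count[resource] = self.job_count.get(resource, 0) + 1

    def release(self, resource: int) -> None:
        """Record that one job has left the resource."""
        if self.job_count.get(resource, 0) > 0:
            self.job_count[resource] -= 1
```

<p>In this sketch the only difference between a node-centric and a
CPU-centric plugin is what the <i>resource</i> index refers to.</p>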
<p>Most of the logic in the select plugin is dedicated to identifying resources
to be allocated to a new job. Input to that function includes: a pointer to the
new job, a bitmap identifying nodes which could be used, node counts (minimum,
maximum, and desired), a count of how many jobs of that partition the job can
share resources with, and a list of jobs which can be preempted to initiate the
new job. The first phase is to determine which of the usable nodes
would best satisfy the resource requirement. This consists of a best-fit
algorithm that groups nodes based upon network topology (if the topology/tree
plugin is configured) or based upon consecutive nodes (by default). Once the
best nodes are identified, resources are accumulated for the new job until its
resource requirements are satisfied.</p>
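<p>The default, non-topology case can be illustrated with a simplified
sketch. It assumes that "best fit" means preferring the run of consecutive
nodes with the least total CPU capacity that can still hold the job, and it
never spans more than one run; the real plugin logic is considerably more
involved. The function names and parameters are hypothetical.</p>

```python
def consecutive_runs(usable):
    """Split sorted usable node indices into runs of consecutive nodes."""
    runs = []
    for idx in usable:
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)
        else:
            runs.append([idx])
    return runs


def pick_nodes(usable, cpus_per_node, cpus_needed, min_nodes=1, max_nodes=None):
    """Return the node indices chosen for a job, or None if no run can hold it.

    usable        -- indices of nodes that could be used
    cpus_per_node -- mapping of node index -> available CPUs
    """
    best = None
    for run in consecutive_runs(sorted(usable)):
        # Accumulate nodes from this run until the request is satisfied.
        chosen, total = [], 0
        for idx in run:
            if max_nodes is not None and len(chosen) == max_nodes:
                break
            chosen.append(idx)
            total += cpus_per_node[idx]
            if total >= cpus_needed and len(chosen) >= min_nodes:
                break
        if total < cpus_needed or len(chosen) < min_nodes:
            continue  # this run cannot satisfy the job
        # Best fit (assumed): keep the run with the least total capacity.
        run_capacity = sum(cpus_per_node[i] for i in run)
        if best is None or run_capacity < best[0]:
            best = (run_capacity, chosen)
    return best[1] if best else None
```

<p>For example, with usable nodes 0-3 and 6-7 at four CPUs each, an
eight-CPU request lands on nodes 6 and 7 in this sketch, leaving the larger
run intact for bigger jobs.</p>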
<p>If the job cannot be started with currently available resources, the plugin
will attempt to identify jobs which can be preempted in order to initiate the
new job. A copy of the current system state will be created including details
about all resources and active jobs. Preemptable jobs will then be removed
from this simulated system state until the new job can be initiated. When
sufficient resources are available for the new job, the jobs actually needing
to be preempted for its initiation will be preempted (this may be a subset of
the jobs whose preemption is simulated).</p>
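<p>The preemption simulation can be sketched as follows. This is a toy model
under stated assumptions: resources are reduced to a single CPU count, the
function name <i>select_preemptions</i> and its parameters are hypothetical,
and the way the final subset is chosen (trying to put each removed job back)
is one plausible approach rather than the plugin's actual logic.</p>

```python
import copy


def select_preemptions(free_cpus, running, preemptable, cpus_needed):
    """Return the job ids to preempt for a new job, or None if it cannot fit.

    free_cpus   -- CPUs idle right now
    running     -- mapping of job id -> CPUs held by that job
    preemptable -- job ids that may be preempted, in preference order
    cpus_needed -- CPUs required by the new job
    """
    # Work on a copy so the real system state is left untouched.
    sim_free = free_cpus
    sim_running = copy.deepcopy(running)

    # Remove preemptable jobs from the simulated state until the job fits.
    removed = []
    for job_id in preemptable:
        if sim_free >= cpus_needed:
            break
        sim_free += sim_running.pop(job_id)
        removed.append(job_id)
    if sim_free < cpus_needed:
        return None

    # Only a subset may really be needed: put a removed job back whenever
    # the new job would still fit without preempting it.
    needed = []
    for job_id in removed:
        if sim_free - running[job_id] >= cpus_needed:
            sim_free -= running[job_id]
        else:
            needed.append(job_id)
    return needed
```

<p>With two idle CPUs, running jobs <i>a</i> (2 CPUs) and <i>b</i> (6 CPUs),
and a new job needing eight CPUs, the simulation removes both jobs but only
<i>b</i> ends up in the returned list.</p>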
<p>Other functions exist to support suspending jobs, resuming jobs, terminating
jobs, shrinking job allocations, un/packing job state information,
un/packing node state information, etc. The operation of those functions is
relatively straightforward and not detailed here.</p>
<p style="text-align:center;">Last modified 29 January 2024</p>
<!--#include virtual="footer.txt"-->