<!--#include virtual="header.txt"-->
<h1><a name="top">Dynamic Nodes</a></h1>
<h2 id="overview">Overview<a class="slurm_link" href="#overview"></a></h2>
<p>Starting in Slurm 22.05, nodes can be dynamically added to and removed from
Slurm.
</p>
<h2 id="communications">Dynamic Node Communications
<a class="slurm_link" href="#communications"></a>
</h2>
<p>
For regular, non-dynamically created nodes, Slurm knows how to communicate with
nodes by reading in the slurm.conf. This is why it is important for a
non-dynamic setup that the slurm.conf is synchronized across the cluster. For
dynamically created nodes, the controller automatically grabs the node's
<b>NodeAddr</b> and <b>NodeHostname</b> for dynamic slurmd registrations. The
controller then passes the node addresses to the clients so that they can
communicate with, and even fan out to, other nodes.
</p>
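<p>
One way to inspect the addresses the controller recorded for a registered node
is <b>scontrol show node</b>. The node name <i>d1</i> below is a hypothetical
example, not a name Slurm assigns by default:
<pre>
scontrol show node d1
</pre>
The <b>NodeAddr</b> and <b>NodeHostName</b> fields in the output show the
recorded values.
</p>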
<h2 id="config">Slurm Configuration
<a class="slurm_link" href="#config"></a>
</h2>
<p>
<dl>
<dt><b>MaxNodeCount=#</b>
<dd>
Set to the number of possible nodes that can be active in a system at a time.
See the slurm.conf <a href="slurm.conf.html#OPT_MaxNodeCount">man</a> page for
more details.
<dt><b>SelectType=select/cons_tres</b>
<dd>Dynamic nodes are only supported with cons_tres.
</dl>
</p>
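<p>
A minimal slurm.conf sketch combining the two settings. The value 1024 is an
arbitrary placeholder rather than a recommendation; size it to the largest
number of nodes expected to be active at once.
<pre>
MaxNodeCount=1024
SelectType=select/cons_tres
</pre>
</p>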
<h3 id="partitions">Partition Assignment
<a class="slurm_link" href="#partitions"></a>
</h3>
<p>
Dynamic nodes can be automatically assigned to partitions at creation by using
the partition's nodes <a href="slurm.conf.html#OPT_Nodes_1">ALL</a> keyword or
<a href="slurm.conf.html#SECTION_NODESET-CONFIGURATION">NodeSets</a> and
specifying a feature on the nodes.
</p>
<p>
e.g.
<pre>
Nodeset=ns1 Feature=f1
Nodeset=ns2 Feature=f2
PartitionName=all Nodes=ALL Default=yes
PartitionName=dyn1 Nodes=ns1
PartitionName=dyn2 Nodes=ns2
PartitionName=dyn3 Nodes=ns1,ns2
</pre>
</p>
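<p>
As an illustration of the configuration above, a node that registers with
feature <i>f1</i> would be expected to join partition <i>all</i> (through
<b>Nodes=ALL</b>) as well as <i>dyn1</i> and <i>dyn3</i> (through nodeset
<i>ns1</i>), but not <i>dyn2</i>:
<pre>
slurmd -Z --conf "Feature=f1"
</pre>
</p>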
<h2 id="create">Creating Nodes
<a class="slurm_link" href="#create"></a>
</h2>
<p>
Nodes can be created in two ways:
<ol>
<li>
<dl>
<dt><b>Dynamic slurmd registration</b>
<dd>
<p>Using the slurmd <a href="slurmd.html#OPT_-Z">-Z</a> and
<a href="slurmd.html#OPT_conf-<node-parameters>">--conf</a> options, a slurmd
will register with the controller and automatically be added to the system.
</p>
<p>
e.g.
<pre>
slurmd -Z --conf "RealMemory=80000 Gres=gpu:2 Feature=f1"
</pre>
</p>
</dl>
</li>
<li>
<dl>
<dt><b>scontrol create NodeName= ...</b>
<dd>
<p>Create nodes using scontrol by specifying the same <b>NodeName</b>
line that you would define in the slurm.conf. See the slurm.conf
<a href="slurm.conf.html#SECTION_NODE-CONFIGURATION">man</a> page for node
options. Only <b>State=CLOUD</b> and <b>State=FUTURE</b> are supported. The
node configuration should match what the slurmd will register with
(e.g. the output of slurmd -C) plus any additional attributes.
</p>
<p>
e.g.
<pre>
scontrol create NodeName=d[1-100] CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=31848 Gres=gpu:2 Feature=f1 State=cloud
</pre>
</p>
</dl>
</li>
</ol>
</p>
<h2 id="delete">Deleting Nodes
<a class="slurm_link" href="#delete"></a>
</h2>
<p>
Nodes can be deleted using <b>scontrol delete nodename=&lt;nodelist&gt;</b>.
Only dynamic nodes that have no running jobs and that are not part of a
reservation can be deleted.
</p>
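<p>
For example, assuming dynamic nodes named d[1-100] exist (hypothetical names
matching the creation example above), have no running jobs, and are not part of
a reservation, they could be removed with:
<pre>
scontrol delete nodename=d[1-100]
</pre>
</p>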
<h2 id="topology">Dynamic Nodes + Topology
<a class="slurm_link" href="#topology"></a>
</h2>
<p>
Nodes can be dynamically added to and/or removed from topologies, defined in either
<i>topology.conf</i> or <i>topology.yaml</i>, by specifying the node's
<b>Topology</b> option and providing the list of topology names and units.
A <b>topology unit</b> is the block name or the name of a leaf switch.
Intermediate switch names -- ':' delimited -- can be provided and will be
created if needed (e.g. Topology=topo-tree:sw_root:s1:s2).
</p>
<p>
<b>NOTE</b>: When using topology.conf, the topology name is "default".
</p>
e.g.
<pre>
scontrol create NodeName=d[1-100] ... Topology=topo-switch:s1,topo-block:b1
</pre>
<pre>
slurmd -Z --conf "... Topology=topo-switch:s1,topo-block:b1"
</pre>
<pre>
slurmd -Z --conf "... Topology=default:b1"
</pre>
<h2 id="limitations">Limitations
<a class="slurm_link" href="#limitations"></a>
</h2>
<p>
<ol>
<li>
Dynamic nodes are not sorted internally, so nodes added to Slurm may end up
alphabetically out of order internally. This can lead to suboptimal job
allocations if node names represent the topology of the nodes.
</li>
</ol>
</p>
<p style="text-align:center;">Last modified 15 August 2025</p>
<!--#include virtual="footer.txt"-->