| <!--#include virtual="header.txt"--> |
| |
| <h1>Topology Guide</h1> |
| |
| <p>Slurm can be configured to support topology-aware resource |
| allocation to optimize job performance. |
| Slurm supports several modes of operation, one to optimize performance on |
| systems with a three-dimensional torus interconnect and another for |
| a hierarchical interconnect. |
The hierarchical mode of operation supports both fat-tree and dragonfly networks,
using slightly different algorithms.</p>
| |
| <p>Slurm's native mode of resource selection is to consider the nodes |
| as a one-dimensional array. |
| Jobs are allocated resources on a best-fit basis. |
| For larger jobs, this minimizes the number of sets of consecutive nodes |
| allocated to the job.</p> |
| |
| <h2 id="contents">Contents |
| <a class="slurm_link" href="#topo_3d"></a> |
| </h2> |
| |
| <ul> |
| <li><a href="#topo_3d">Three-dimensional Topology</a></li> |
| <li><a href="#hierarchical">Tree Topology (Hierarchical Networks)</a> |
| <ul> |
| <li><a href="#config_generators">Configuration Generators</a></li> |
| </ul></li> |
| <li><a href="#block">Block Topology</a> |
| <ul> |
| <li><a href="#block-limitations">Limitations</a></li> |
</ul></li>
<li><a href="#user_opts">User Options</a></li>
| <li><a href="#env_vars">Environment Variables</a></li> |
| <li><a href="#multi_topo">Multiple Topologies</a></li> |
| </ul> |
| |
| <h2 id="topo_3d">Three-dimensional Topology |
| <a class="slurm_link" href="#topo_3d"></a> |
| </h2> |
| |
<p>Some larger computers rely upon a three-dimensional torus
interconnect, such as the Cray XT and XE systems.
These systems do not require that jobs execute on adjacent nodes,
so Slurm only needs to allocate resources to a job which
are nearby on the network.
| Slurm accomplishes this using a |
| <a href="http://en.wikipedia.org/wiki/Hilbert_curve">Hilbert curve</a> |
| to map the nodes from a three-dimensional space into a one-dimensional |
| space. |
| Slurm's native best-fit algorithm is thus able to achieve a high degree |
| of locality for jobs.</p> |
| |
| <h2 id="hierarchical">Tree Topology (Hierarchical Networks) |
| <a class="slurm_link" href="#hierarchical"></a> |
| </h2> |
| |
| <p>Slurm can also be configured to allocate resources to jobs on a |
| hierarchical network to minimize network contention. |
| The basic algorithm is to identify the lowest level switch in the |
| hierarchy that can satisfy a job's request and then allocate resources |
| on its underlying leaf switches using a best-fit algorithm. |
| Use of this logic requires a configuration setting of |
| <i>TopologyPlugin=topology/tree</i>.</p> |
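
<p>For example, the corresponding line in <i>slurm.conf</i> would be:</p>
<pre>
# slurm.conf excerpt
TopologyPlugin=topology/tree
</pre>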
| |
<p>Note that Slurm uses a best-fit algorithm on the currently
available resources. This may result in an allocation with
more than the optimum number of switches. The user can request
a maximum number of leaf switches for the job, as well as a
maximum time to wait for that number, using the <code>--switches</code>
option with the salloc, sbatch and srun commands. The parameters can
also be changed for pending jobs using the scontrol and squeue commands.</p>
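
<p>For example (the job ID and script name below are illustrative):</p>
<pre>
# Request at most two leaf switches, waiting up to 60 minutes for them
sbatch --switches=2@60 -N 16 my_job.sh

# Adjust the switch count and wait time on the pending job
scontrol update jobid=1234 switches=4@30
</pre>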
| |
<p>At some point in the future, Slurm code may be provided to
gather network topology information directly.
For now, the network topology information must be included
in a <i>topology.conf</i> configuration file, as shown in the
examples below.
The first example describes a three-level switch hierarchy in which
each switch has two children.
| Note that the <i>SwitchName</i> values are arbitrary and only |
| used for bookkeeping purposes, but a name must be specified on |
| each line. |
| The leaf switch descriptions contain a <i>SwitchName</i> field |
| plus a <i>Nodes</i> field to identify the nodes connected to the |
| switch. |
| Higher-level switch descriptions contain a <i>SwitchName</i> field |
| plus a <i>Switches</i> field to identify the child switches. |
| Slurm's hostlist expression parser is used, so the node and switch |
| names need not be consecutive (e.g. "Nodes=tux[0-3,12,18-20]" |
| and "Switches=s[0-2,4-8,12]" will parse fine). |
| </p> |
| |
<p>An optional <i>LinkSpeed</i> parameter can be used to indicate the
relative performance of the link.
The units are arbitrary and this information is currently not used;
it may be used in the future to optimize resource allocations.</p>
| |
<p>The first example shows what a topology would look like for an
eight-node cluster in which all switches have only two children, as
shown in the diagram (not a very realistic configuration, but
useful for an example).</p>
| |
| <pre> |
| # topology.conf |
| # Switch Configuration |
| SwitchName=s0 Nodes=tux[0-1] |
| SwitchName=s1 Nodes=tux[2-3] |
| SwitchName=s2 Nodes=tux[4-5] |
| SwitchName=s3 Nodes=tux[6-7] |
| SwitchName=s4 Switches=s[0-1] |
| SwitchName=s5 Switches=s[2-3] |
| SwitchName=s6 Switches=s[4-5] |
| </pre> |
<img src="topo_ex1.gif" width="600">
| |
<p>The next example is for a two-level network in which
each switch has four connections.</p>
| <pre> |
| # topology.conf |
| # Switch Configuration |
| SwitchName=s0 Nodes=tux[0-3] LinkSpeed=900 |
| SwitchName=s1 Nodes=tux[4-7] LinkSpeed=900 |
| SwitchName=s2 Nodes=tux[8-11] LinkSpeed=900 |
| SwitchName=s3 Nodes=tux[12-15] LinkSpeed=1800 |
| SwitchName=s4 Switches=s[0-3] LinkSpeed=1800 |
| SwitchName=s5 Switches=s[0-3] LinkSpeed=1800 |
| SwitchName=s6 Switches=s[0-3] LinkSpeed=1800 |
| SwitchName=s7 Switches=s[0-3] LinkSpeed=1800 |
| </pre> |
<img src="topo_ex2.gif" width="600">
| |
<p>As a practical matter, listing every switch connection
slows Slurm's scheduling algorithm as it optimizes job placement,
while application performance may gain little from the optimization.
Listing the leaf switches with their nodes plus one top-level switch
should result in good performance for both applications and Slurm.
| The previous example might be configured as follows:</p> |
| <pre> |
| # topology.conf |
| # Switch Configuration |
| SwitchName=s0 Nodes=tux[0-3] |
| SwitchName=s1 Nodes=tux[4-7] |
| SwitchName=s2 Nodes=tux[8-11] |
| SwitchName=s3 Nodes=tux[12-15] |
| SwitchName=s4 Switches=s[0-3] |
| </pre> |
| |
| <p>Note that compute nodes on switches that lack a common parent switch can |
| be used, but no job will span leaf switches without a common parent |
| (unless the TopologyParam=TopoOptional option is used). |
| For example, it is legal to remove the line "SwitchName=s4 Switches=s[0-3]" |
| from the above topology.conf file. |
| In that case, no job will span more than four compute nodes on any single leaf |
| switch. |
| This configuration can be useful if one wants to schedule multiple physical |
| clusters as a single logical cluster under the control of a single slurmctld |
| daemon.</p> |
| |
<p>If you have nodes that are in separate networks and are associated with
unique switches in your <b>topology.conf</b> file, it is possible to get
into a situation where a job is unable to run. If a job requests nodes in
different networks, either by requesting the nodes directly or by
requesting a feature, the job will fail because the requested nodes
cannot communicate with each other. We recommend placing nodes in
separate network segments in disjoint partitions.</p>
| |
<p>For systems with a dragonfly network, configure Slurm with
<i>TopologyPlugin=topology/tree</i> plus <i>TopologyParam=dragonfly</i>.
If a job cannot be placed entirely within a single leaf switch,
it will be spread across as many leaf switches as possible
in order to optimize the job's network bandwidth.</p>
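
<p>For example, the relevant <i>slurm.conf</i> lines would be:</p>
<pre>
# slurm.conf excerpt
TopologyPlugin=topology/tree
TopologyParam=dragonfly
</pre>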
| |
| <p><b>NOTE</b>: When using the <i>topology/tree</i> plugin, Slurm identifies |
| the network switches which provide the best fit for pending jobs. If nodes |
| have a <i>Weight</i> defined, this will override the resource selection based |
| on network topology.</p> |
| |
| <h3 id="config_generators">Configuration Generators |
| <a class="slurm_link" href="#config_generators"></a></h3> |
| |
| <p>The following independently maintained tools may be useful in generating the |
| <b>topology.conf</b> file for certain switch types:</p> |
| |
| <ul> |
| <li>Infiniband switch - <b>slurmibtopology</b><br> |
| <a href="https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmibtopology"> |
| https://github.com/OleHolmNielsen/Slurm_tools/tree/master/slurmibtopology</a></li> |
| <li>Omni-Path (OPA) switch - <b>opa2slurm</b><br> |
| <a href="https://gitlab.com/jtfrey/opa2slurm"> |
| https://gitlab.com/jtfrey/opa2slurm</a></li> |
| <li>AWS Elastic Fabric Adapter (EFA) - <b>ec2-topology</b><br> |
| <a href="https://github.com/aws-samples/ec2-topology-aware-for-slurm"> |
| https://github.com/aws-samples/ec2-topology-aware-for-slurm</a></li> |
| </ul> |
| |
| <h2 id="block">Block Topology<a class="slurm_link" href="#block"></a></h2> |
| |
| <p>Slurm can be configured to allocate resources to jobs within a strictly |
| enforced, hierarchical block structure using |
| <b>TopologyPlugin=topology/block</b>. The block topology prioritizes the |
| placement of jobs to minimize fragmentation across the cluster, as opposed to |
| the tree topology, which focuses on fitting jobs on the first available |
| resources. Small jobs will still be able to use the available space in a block |
| that is partially used.</p> |
| |
| <p>The block topology approach begins with "base blocks" (bblocks), which are |
| fundamental, contiguous groups of nodes defined in |
| <a href="topology.conf.html">topology.conf</a>. |
| These base blocks can be combined with other adjacent base blocks to form |
| "aggregated blocks". In turn, these higher-level blocks can be aggregated |
| with other contiguous blocks of the same hierarchical level to construct |
| progressively larger blocks. This hierarchical arrangement is designed to |
| ensure optimized communication performance for jobs running within these blocks. |
| The <b>BlockSizes</b> configuration parameter defines the specific, enforceable |
| block sizes at each level of this hierarchy.</p> |
| |
| <p>The allocation algorithm operates as follows:</p> |
| |
| <ol> |
| <li>Identify the smallest block level, as defined by <b>BlockSizes</b>, that can |
| satisfy the job's resource request</li> |
| <li>Select a suitable subset of "lower-level blocks" (llblocks) that are |
| components of this chosen aggregating block</li> |
| <li>Allocate resources from the underlying base blocks that constitute this |
| selected subset of llblocks, employing a best-fit algorithm for the |
| precise placement of the job.</li> |
| </ol> |
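
<p>As a sketch of how this fits together (block and node names and the
sizes are illustrative, using the <i>BlockName</i> and <i>BlockSizes</i>
syntax described in <a href="topology.conf.html">topology.conf</a>), the
following configuration defines four base blocks of eight nodes with
enforceable block sizes of 8, 16 and 32. A 12-node job would be satisfied
at the 16-node level, so it would be placed on a pair of adjacent base
blocks using the best-fit step above:</p>
<pre>
# topology.conf (block topology; names and sizes illustrative)
BlockName=b1 Nodes=tux[0-7]
BlockName=b2 Nodes=tux[8-15]
BlockName=b3 Nodes=tux[16-23]
BlockName=b4 Nodes=tux[24-31]
BlockSizes=8,16,32
</pre>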
| |
| <h3 id="block-limitations">Limitations |
| <a class="slurm_link" href="#block-limitations"></a> |
| </h3> |
| |
| <p>Since the block topology takes a different approach than the traditional tree |
| topology, there are limitations that should be taken into consideration.</p> |
| |
| <ul> |
| <li><b>Ranges of nodes</b><br> |
| When using <code>-N</code>/<code>--nodes</code> to specify a range of acceptable |
| node counts, the scheduler will have to evaluate each value of that range to |
| find optimal placement on the available block(s). If using a range is necessary, |
| the number of possible values should be kept as small as possible.</li> |
<li><b>Requesting specific nodes</b><br>
Using <code>-w</code>/<code>--nodelist</code> to request a specific node or
nodes can conflict with the block placement and is not currently supported. You
can use <code>-x</code>/<code>--exclude</code> to prevent a job from
being scheduled on certain nodes, as shown in the example after this list.
</li>
| <li><b>Contiguous blocks</b><br> |
| The scheduler will attempt to place jobs on blocks that are adjacent to each |
| other in the block structure. You cannot currently request that a job be |
| placed on non-adjacent blocks.</li> |
| </ul> |
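
<p>For example (node names are illustrative), instead of requesting
specific nodes, exclude the nodes the job must avoid:</p>
<pre>
# Not currently supported with topology/block:
#   sbatch -w tux[0-3] -N 4 program

# Supported: exclude unwanted nodes and let Slurm choose the block
sbatch -x tux[4-7] -N 4 program
</pre>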
| |
| <h2 id="user_opts">User Options<a class="slurm_link" href="#user_opts"></a></h2> |
| |
<p>For use with the <b>topology/tree</b> plugin, users can also specify the
maximum number of leaf switches to be used for their job, along with the
maximum time the job should wait for this optimized configuration. The
syntax for this option is <code>--switches=count[@time]</code>.
| The system administrator can limit the maximum time that any job can |
| wait for this optimized configuration using the <b>SchedulerParameters</b> |
| configuration parameter with the |
| <a href="slurm.conf.html#OPT_max_switch_wait=#">max_switch_wait</a> option.</p> |
| |
| <p>When <b>topology/tree</b> or <b>topology/block</b> is configured, hostlist |
| functions may be used in place of or alongside regular hostlist expressions |
| in commands or configuration files that interact with the slurmctld. Valid |
| topology functions include:</p> |
| |
| <ul> |
| <li><b>block{blockX}</b> and <b>switch{switchY}</b> - expand to all nodes in |
| the specified block/switch.</li> |
| <li><b>blockwith{nodeX}</b> and <b>switchwith{nodeY}</b> - expand to all nodes |
| in the same block/switch as the specified node.</li> |
| </ul> |
| |
| <p>For example:</p> |
| |
| <pre> |
| scontrol update node=block{b1} state=resume |
| sbatch --nodelist=blockwith{node0} -N 10 program |
| PartitionName=Block10 Nodes=block{block10} ... |
| </pre> |
| |
<p>See also the hostlist function <b>feature{myfeature}</b>, described
<a href="slurm.conf.html#OPT_Features">here</a>.</p>
| |
| <h2 id="env_vars">Environment Variables |
| <a class="slurm_link" href="#env_vars"></a> |
| </h2> |
| |
| <p>If the topology/tree plugin is used, two environment variables will be set |
| to describe that job's network topology. Note that these environment variables |
| will contain different data for the tasks launched on each node. Use of these |
| environment variables is at the discretion of the user.</p> |
| |
<p><b>SLURM_TOPOLOGY_ADDR</b>:
The value will be set to the names of the network switches which may be
involved in the job's communications, from the system's top-level switch
down to the leaf switch, ending with the node name. A period is used to
separate each hardware component name.</p>
| |
<p><b>SLURM_TOPOLOGY_ADDR_PATTERN</b>:
This is set only if the system has the topology/tree plugin configured.
The value will be set to the component types listed in SLURM_TOPOLOGY_ADDR.
Each component will be identified as either "switch" or "node".
A period is used to separate each hardware component type.</p>
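
<p>For example, with the eight-node topology shown earlier, a task
launched on tux1 might see:</p>
<pre>
SLURM_TOPOLOGY_ADDR=s6.s4.s0.tux1
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.switch.node
</pre>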
| |
| |
| <h2 id="multi_topo">Multiple Topologies |
| <a class="slurm_link" href="#multi_topo"></a> |
| </h2> |
| |
| <p>Slurm 25.05 introduced the ability to define multiple network topologies using the |
| <a href="topology.yaml.html">topology.yaml</a> configuration file. |
| |
| Each partition can be configured to use a specific topology by specifying the |
| <a href="slurm.conf.html#OPT_Topology_1">Topology</a> |
| in its partition configuration line. |
| The Slurm controller will use the selected topology to optimize resource |
| allocation for jobs submitted to that partition. |
If no topology is explicitly specified for a partition,
Slurm will default to the <i>cluster_default</i> topology.</p>
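
<p>For example, a partition definition in <i>slurm.conf</i> might select a
topology defined in <i>topology.yaml</i> (the partition, node and topology
names are illustrative):</p>
<pre>
# slurm.conf excerpt
PartitionName=gpu Nodes=tux[0-15] Topology=block_topo
</pre>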
| |
| <p style="text-align:center;">Last modified 31 July 2025</p> |
| |
| <!--#include virtual="footer.txt"--> |