
 <!--#include virtual="header.txt"-->

 <h1>Intel Knights Landing (KNL) User and Administrator Guide</h1>

 <h2 id="overview">Overview<a class="slurm_link" href="#overview"></a></h2>

 <p>This document describes the features of Slurm that are unique to computers
 with the Intel Knights Landing (KNL) processor.
 You should be familiar with Slurm's mode of operation on Linux clusters
 before studying the relatively few differences in Intel KNL system operation
 described in this document.</p>

 <h2 id="user_tools">User Tools
 <a class="slurm_link" href="#user_tools"></a>
 </h2>

 <p>The desired NUMA and MCDRAM modes for a KNL processor should be specified
 using the -C or --constraint option of Slurm's job submission commands: salloc,
 sbatch, and srun. Currently available NUMA and MCDRAM modes are shown in the
 table below. Each node's available and current NUMA and MCDRAM modes are
 visible in the "available features" and "active features" fields respectively,
 which may be seen using the scontrol, sinfo, or sview commands.
 Note that a node may need to be rebooted to get the desired NUMA and MCDRAM
 modes and nodes may only be rebooted when they contain no running jobs
 (i.e. sufficient resources may be available to run a pending job, but until
 the node is idle and can be rebooted, the pending job may not be allocated
 resources). Also note that the job will be charged for resources from the time
 of resource allocation, which may include time to reboot a node into the
 desired NUMA and MCDRAM configuration.</p>

 <p>Slurm supports a rich syntax for node constraints
 (exclusive OR, node counts for each constraint, etc.).
 See the man pages for the salloc, sbatch and srun commands for more information
 about the constraint syntax.
 Jobs may specify their desired NUMA and/or MCDRAM configuration. If no
 NUMA and/or MCDRAM configuration is specified, then a node with any possible
 value for that configuration will be used.</p>

 <table width="100%" border=1 cellspacing=0 cellpadding=4>
 <tr>
   <th width="15%">Type</th>
   <th width="15%">Name</th>
   <th width="70%">Description</th>
 </tr>
 <tr><td>MCDRAM</td><td>cache</td><td>All of MCDRAM to be used as cache</td></tr>
 <tr><td>MCDRAM</td><td>equal</td><td>MCDRAM to be used partly as cache and partly combined with primary memory</td></tr>
 <tr><td>MCDRAM</td><td>flat</td><td>MCDRAM to be combined with primary memory into a "flat" memory space</td></tr>
 <tr><td>NUMA</td><td>a2a</td><td>All to all</td></tr>
 <tr><td>NUMA</td><td>hemi</td><td>Hemisphere</td></tr>
 <tr><td>NUMA</td><td>snc2</td><td>Sub-NUMA cluster 2</td></tr>
 <tr><td>NUMA</td><td>snc4</td><td>Sub-NUMA cluster 4 (<a href="#note">NOTE</a>)</td></tr>
 <tr><td>NUMA</td><td>quad</td><td>Quadrant (<a href="#note2">NOTE</a>)</td></tr>
 </table>

 <p>Jobs requiring some or all of the KNL high bandwidth memory (HBM) should
 explicitly request that memory using Slurm's Generic RESource (GRES) options.
 The HBM is always known by the Slurm GRES name "hbm".
 Examples below demonstrate use of HBM.</p>

 <p>Sorting of the free cache pages at step startup using Intel's zonesort
 module can be configured as the default for all steps using the
 "LaunchParameters=mem_sort" option in the slurm.conf file.
 Individual job steps can enable or disable sorting using the "--mem-bind=sort"
 or "--mem-bind=nosort" command line options for srun.
 Sorting will be performed only for the NUMA nodes allocated to the job step.</p>

 <p><a id="note"><b>NOTE</b></a>: Slurm provides limited support
 for restricting use of HBM. At some point in the future, the amount of HBM
 requested by the job will be compared with the total amount of HBM and number of
 memory-containing NUMA nodes available on the KNL processor. The job will then
 be bound to specific NUMA nodes in order to limit the total amount of HBM
 available to the job, and thus reserve the remaining HBM for other jobs running
 on that KNL processor.</p>

 <p><a id="note2"><b>NOTE</b></a>: Slurm can only
 support homogeneous nodes (e.g. the same number of cores per NUMA node).
 KNL processors with <u>68 cores</u> (a subset of KNL models) will not have
 homogeneous NUMA nodes in snc4 mode, but each NUMA node will have
 either 16 or 18 cores. This will result in Slurm using the lower core count,
 finding a total of 256 threads rather than 272 threads and setting the node
 to a DOWN state.</p>
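 <p>The thread counts in the note above can be checked with a few lines of
 arithmetic. This is only an illustrative sketch: the split of 68 cores into
 18 + 18 + 16 + 16 is an assumption consistent with "either 16 or 18 cores",
 and the variable names are hypothetical.</p>

```python
# Worked arithmetic for a 68-core KNL processor booted in snc4 mode.
# Assumption: the 68 cores split as 18 + 18 + 16 + 16 across four NUMA nodes.
THREADS_PER_CORE = 4
cores_per_numa = [18, 18, 16, 16]

# Threads actually present on the processor.
actual_threads = sum(cores_per_numa) * THREADS_PER_CORE

# Slurm expects every NUMA node to have the same core count, so the lowest
# count is applied to all of them.
assumed_threads = min(cores_per_numa) * len(cores_per_numa) * THREADS_PER_CORE

print(actual_threads)   # 272
print(assumed_threads)  # 256
```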

 <h3 id="accounting">Accounting<a class="slurm_link" href="#accounting"></a></h3>

 <p>If a node requires rebooting for a job's required configuration, the job
 will be charged for the resource allocation from the time of allocation through
 the lifetime of the job, including the time consumed for booting the nodes.
 The job's time limit will be calculated from the time that all nodes are ready
 for use.
 For example, a job with a 10 minute time limit may be allocated resources at
 10:00:00.
 If the nodes require rebooting, they might not be available for use until
 10:20:00, 20 minutes after allocation, and the job will begin execution at
 that time.
 The job must complete no later than 10:30:00 in order to satisfy its time limit
 (10 minutes after execution actually begins).
 However, the job will be charged for 30 minutes of resource use, which includes
 the boot time.</p>
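 <p>The timeline in this example can be reproduced with a short calculation.
 The sketch below is illustrative only: <i>job_timeline</i> is a hypothetical
 helper, not part of Slurm, and the charge it returns assumes the job runs
 for its full time limit.</p>

```python
from datetime import datetime, timedelta

def job_timeline(allocated, boot_time, time_limit):
    """Return (start, deadline, charged) for a job whose nodes reboot first."""
    start = allocated + boot_time    # execution begins once all nodes are ready
    deadline = start + time_limit    # the time limit counts from the start of execution
    charged = deadline - allocated   # charging starts at allocation, boot time included
    return start, deadline, charged

# The calendar date is arbitrary; only the time of day matters here.
allocated = datetime(2024, 1, 1, 10, 0, 0)
start, deadline, charged = job_timeline(
    allocated, timedelta(minutes=20), timedelta(minutes=10)
)
print(start.time(), deadline.time(), charged)  # 10:20:00 10:30:00 0:30:00
```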

 <h3 id="use_case">Sample Use Cases
 <a class="slurm_link" href="#use_case"></a>
 </h3>

 <pre>
 $ sbatch -C flat,a2a -N2 --gres=hbm:8g --exclusive my.script
 $ srun --constraint=hemi,cache -n36 a.out
 $ srun --constraint=flat --gres=hbm:2g -n36 a.out

 $ sinfo -o "%30N %20b %f"
 NODELIST       ACTIVE_FEATURES  AVAIL_FEATURES
 nid000[10-11]
 nid000[12-35]  flat,a2a         flat,a2a,snc2,hemi
 nid000[36-43]  cache,a2a        flat,equal,cache,a2a,hemi
 </pre>
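 <p>Output in the layout shown above can be post-processed by a script. The
 following is a minimal sketch that parses the sample text only; the function
 names are hypothetical, and splitting on whitespace assumes that a row never
 has a blank active features column together with a non-blank available
 features column.</p>

```python
SAMPLE = """\
NODELIST       ACTIVE_FEATURES  AVAIL_FEATURES
nid000[10-11]
nid000[12-35]  flat,a2a         flat,a2a,snc2,hemi
nid000[36-43]  cache,a2a        flat,equal,cache,a2a,hemi
"""

def split_features(field):
    """Turn 'flat,a2a' into {'flat', 'a2a'}; an empty field gives an empty set."""
    return {name for name in field.split(",") if name}

def parse_sinfo(text):
    """Map each node list to its (active, available) feature sets."""
    table = {}
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if not fields:
            continue
        fields += [""] * (3 - len(fields))  # pad rows whose feature columns are blank
        table[fields[0]] = (split_features(fields[1]), split_features(fields[2]))
    return table

table = parse_sinfo(SAMPLE)
for nodes, (active, available) in table.items():
    print(nodes, sorted(active), sorted(available))
```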

 <h3 id="topology">Network Topology
 <a class="slurm_link" href="#topology"></a>
 </h3>

 <p>Slurm will optimize performance using those resources available without
 rebooting. If node rebooting is required, then it will optimize layout with
 respect to network bandwidth using both nodes currently in the desired
 configuration and those which can be made available after rebooting.
 This can result in more nodes being rebooted than strictly needed, but will
 improve application performance.</p>

 <p>Users can specify they want all resources allocated on a specific count of
 leaf switches (Dragonfly group) using Slurm's <b>--switches</b> option.
 They can also specify how much additional time they are willing to wait for
 such a configuration. If the desired configuration can not be made available
 within the specified time interval, the job will be allocated nodes optimized
 with respect to network bandwidth to the extent possible. On a Dragonfly
 network, this means allocating resources either within a single group or
 distributed evenly over as many groups as possible. For example:</p>
 <pre>
 srun --switches=1@10:00 -N16 a.out
 </pre>
 <p>Note that system administrators can disable use of the <b>--switches</b>
 option or limit the amount of time the job can be deferred using the
 <b>SchedulerParameters</b> <b>max_switch_wait</b> option.</p>

 <h3 id="boot_problems">Booting Problems
 <a class="slurm_link" href="#boot_problems"></a>
 </h3>

 <p>If a node fails to boot, it is drained and the job is requeued so that
 it can be allocated a different set of nodes. The nodes originally allocated
 to the job remain available to it, so likely only a small number of
 additional nodes will be required.</p>

 <h2 id="administration">System Administration
 <a class="slurm_link" href="#administration"></a>
 </h2>

 <p>Four important components are required to use Slurm on an Intel KNL system.</p>
 <ol>
 <li>Slurm needs a mechanism to determine the node's current topology (e.g.
 how many NUMA nodes exist and which cores are associated with each). Slurm
 relies upon <a href="http://www.open-mpi.org/projects/hwloc/">
 Portable Hardware Locality (HWLOC)</a> for this functionality. Please install
 HWLOC before building Slurm.</li>

 <li>The node features plugin manages the available and active features
 information available for each KNL node.</li>

 <li>A configuration file is used to define various timeouts, default
 configuration, etc. The configuration file name and contents will depend upon
 the node features plugins used. See the <a href="knl.conf.html">knl.conf</a>
 man page for more information.</li>

 <li>A mechanism is required to boot nodes in the desired configuration. This
 mechanism must be integrated with existing Slurm infrastructure for
 <a href="sbatch.html">rebooting nodes on user request (--reboot)</a>.</li>
 </ol>

 <p>In addition, there is a DebugFlags option of "NodeFeatures" which will
 generate detailed information about KNL operations.</p>

 <p>The KNL-specific available and active features are configured differently
 based upon the plugin configured.<br>
 <u>For the knl_generic plugin</u>, KNL-specific features should be defined
 in the "slurm.conf" configuration file. When the slurmd daemon starts on each
 compute node, it will update the available and active features as needed.<br>
 Features which are not KNL-specific (e.g. rack number, "knl", etc.) will be
 copied from the node's "Features" configuration in "slurm.conf" to both the
 available and active feature fields and not modified by the NodeFeatures
 plugin.</p>

 <p><b>NOTE</b>: For Dell KNL systems you must also include the
 <i>SystemType=Dell</i> option for successful operation and will likely need to
 increase the <i>SyscfgTimeout</i> to allow enough time for the command to
 complete successfully. Experience at one site has shown that a 10 second
 timeout may be necessary, configured in milliseconds as
 <i>SyscfgTimeout=10000</i>.</p>

 <p>Slurm does not support the concept of multiple NUMA nodes
 within a single socket. If a KNL node is booted with multiple NUMA nodes, then
 each NUMA node will appear in Slurm as a separate socket.
 In the slurm.conf configuration file, set node socket and
 core counts to values which are appropriate for some NUMA mode to be used on the
 node. When the node boots and the slurmd daemon on the node starts, it will
 report to the slurmctld daemon the node's actual socket (NUMA) and core counts,
 which will update Slurm data structures for the node to the values which are
 currently configured.
 Note that Slurm currently does not support the concept of
 differing numbers of cores in each socket (or NUMA node). We are currently
 working to address these issues.</p>

 <h3 id="operation">Mode of Operation
 <a class="slurm_link" href="#operation"></a>
 </h3>

 <ol>
 <li>The node's configured "Features" are copied to the available and active
 feature fields.</li>
 <li>The node features plugin determines the node's current MCDRAM and NUMA
 values as well as those which are available and adds those values to the node's
 active and available feature fields respectively. Note that these values may
 not be available until the node has booted and the slurmd daemon on the
 compute node sends that information to the slurmctld daemon.</li>
 <li>Jobs will be allocated nodes already in the requested MCDRAM and NUMA mode
 if possible. If insufficient resources are available with the requested
 configuration then other nodes will be selected and booted into the desired
 configuration once no other jobs are active on the node. Until a node is idle,
 its configuration can not be changed. Note that node reboot time is on the
 order of 20 minutes.</li>
 </ol>
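 <p>The node selection described in the last step can be pictured with a
 simplified sketch. This is not Slurm's scheduler: <i>pick_nodes</i> and the
 node dictionaries are hypothetical, and the sketch ignores network topology,
 partially used nodes and every other scheduling factor. It only shows the
 preference for nodes already in the requested mode, followed by idle nodes
 that can be rebooted into it.</p>

```python
def pick_nodes(nodes, wanted, count):
    """Choose `count` nodes for a job requesting the feature set `wanted`.

    Returns (ready, to_reboot), or None when the request cannot be met yet.
    """
    # Prefer nodes whose active features already include the requested modes.
    ready = [n["name"] for n in nodes if wanted.issubset(n["active"])]
    chosen = ready[:count]
    missing = count - len(chosen)

    # Other nodes qualify only if they are idle and offer the requested modes.
    rebootable = [
        n["name"] for n in nodes
        if n["name"] not in ready
        and n["idle"]
        and wanted.issubset(n["avail"])
    ]
    to_reboot = rebootable[:missing]

    if len(chosen) + len(to_reboot) != count:
        return None   # the job stays pending until more nodes become idle
    return chosen, to_reboot

ALL_MODES = {"flat", "cache", "a2a", "hemi"}
nodes = [
    {"name": "n1", "active": {"flat", "a2a"}, "avail": ALL_MODES, "idle": False},
    {"name": "n2", "active": {"cache", "a2a"}, "avail": ALL_MODES, "idle": True},
    {"name": "n3", "active": {"cache", "hemi"}, "avail": ALL_MODES, "idle": False},
]
print(pick_nodes(nodes, {"flat", "a2a"}, 2))  # (['n1'], ['n2'])
```

 <p>Asking for three nodes in this example returns <i>None</i>, because n3 is
 not idle and therefore can not be rebooted.</p>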

 <h3 id="config">Generic Cluster Configuration
 <a class="slurm_link" href="#config"></a>
 </h3>

 <p>Clusters should have NodeFeaturesPlugins configured to "knl_generic".
 This plugin performs all operations directly on the compute nodes using Intel's
 <i>syscfg</i> program to get and modify the node's MCDRAM and NUMA mode and
 uses the Linux <i>reboot</i> program to reboot the compute node in order for
 modifications in MCDRAM and/or NUMA mode to take effect.
 Make sure that RebootProgram is defined in the slurm.conf file.
 This plugin currently does <u>not</u> permit the specification of ResumeProgram,
 SuspendProgram, SuspendTime, etc. in slurm.conf; however, that limitation may
 be removed in the future (the ResumeProgram currently has no means of changing
 the node's MCDRAM and/or NUMA mode prior to reboot).</p>

 <p><b>NOTE</b>: The syscfg program reports the MCDRAM and NUMA mode to be used
 when the node is next booted. If the syscfg program is used to modify the MCDRAM
 or NUMA mode of a node, but the node is not rebooted, then Slurm will be making
 scheduling decisions based upon incorrect state information. If you want to
 change a node's MCDRAM or NUMA mode outside of Slurm, use the following
 procedure:</p>
 <ol>
 <li>Drain the nodes of interest</li>
 <li>Change their MCDRAM and/or NUMA mode</li>
 <li>Reboot the nodes</li>
 <li>Restore them to service in Slurm</li>
 </ol>
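 <p>The first and last steps of this procedure map onto <i>scontrol update</i>
 commands. The sketch below only builds the command lines and does not run
 them; <i>drain_and_resume_commands</i> is a hypothetical helper, the node
 names and reason text are placeholders, and the mode change and reboot in
 between are site-specific and deliberately left out.</p>

```python
def drain_and_resume_commands(nodelist, reason):
    """Build the scontrol command lines for draining and later resuming nodes."""
    drain = [
        "scontrol", "update",
        "NodeName=" + nodelist,
        "State=DRAIN",
        "Reason=" + reason,
    ]
    resume = ["scontrol", "update", "NodeName=" + nodelist, "State=RESUME"]
    return drain, resume

# Placeholder node names and reason; substitute your own.
drain, resume = drain_and_resume_commands("nid[00010-00011]", "change_KNL_mode")

print(" ".join(drain))
# ... change the MCDRAM and/or NUMA mode and reboot the nodes here ...
print(" ".join(resume))
```

 <p>Each list can be passed to <i>subprocess.run()</i> on a host where
 scontrol is available and the caller has administrator privileges.</p>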

 <h4>Sample knl_generic.conf File</h4>

 <pre>
 # Sample knl_generic.conf
 SyscfgPath=/usr/bin/syscfg
 DefaultNUMA=a2a         # NUMA=all2all
 AllowNUMA=a2a,snc2,hemi
 DefaultMCDRAM=cache     # MCDRAM=cache
 </pre>

 <h4>Sample slurm.conf File</h4>

 <pre>
 # Sample slurm.conf
 NodeFeaturesPlugins=knl_generic
 DebugFlags=NodeFeatures
 GresTypes=hbm
 RebootProgram=/sbin/reboot
 ...
 Nodename=default Sockets=1 CoresPerSocket=68 ThreadsPerCore=4 RealMemory=128000 Feature=knl
 NodeName=nid[00000-00127] State=UNKNOWN
 </pre>


 <p style="text-align:center;">Last modified 13 March 2024</p>

 <!--#include virtual="footer.txt"-->