<!--#include virtual="header.txt"-->
<h1>"Configless" Slurm</h1>
<p>"Configless" Slurm is a feature that allows the compute nodes &mdash;
specifically the slurmd process &mdash; and user commands running on login
nodes to pull configuration information directly from the slurmctld instead of
from a pre-distributed local file. Your cluster does require a central set of
configuration files on the Slurm controllers &mdash; "configless" in Slurm's
parlance means that the compute nodes, login nodes, and other cluster hosts
do not need to be deployed with local copies of these files.</p>
<p>On startup, the slurmd will reach out to a slurmctld that you specify, and
the config files will be pulled to the node. This slurmctld can be identified by
either an explicit option, or &mdash; preferably &mdash; through DNS SRV
records defined within the cluster itself.</p>
<p>If you have a <a href="quickstart_admin.html#login">login node</a> that you
will run client commands from, those commands will have to use the DNS record
to get the configuration information from the controller each time they run.
If you expect a lot of traffic from a login node, this can generate a lot of
requests for the configuration files. In cases like this,
<a href="sackd.html">sackd</a> can be used to manage configuration files for
the node, reducing network requests.</p>
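<p>For example, a minimal sackd invocation on a login node might look like the
following (the controller name <b>slurmctl-primary</b> is the same hypothetical
name used in the examples below; in practice sackd is typically started through
its systemd service):</p>
<pre>
sackd --conf-server slurmctl-primary:6817
</pre>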
<h2 id="INSTALLATION">Installation
<a class="slurm_link" href="#INSTALLATION"></a>
</h2>
<p>There are no extra steps required to install this feature. It is built in
by default starting with Slurm 20.02.</p>
<h2 id="SETUP">Setup<a class="slurm_link" href="#SETUP"></a></h2>
<p>The slurmctld must first be configured to run in the configless mode.
This is handled by setting <b>SlurmctldParameters=enable_configless</b> in
slurm.conf and restarting slurmctld.</p>
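<p>As a sketch, the controller-side change might look like the following
(shown with systemd, assuming that is how your site manages the daemon; if
<b>SlurmctldParameters</b> already has other values, add
<b>enable_configless</b> to the comma-separated list):</p>
<pre>
# In slurm.conf on the controller:
SlurmctldParameters=enable_configless

# Then restart the controller (assuming systemd):
systemctl restart slurmctld
</pre>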
<p>Once enabled, you must configure the slurmd to get its configs from the
slurmctld. This can be accomplished either by launching slurmd with the
<b>--conf-server</b> option, or by setting a DNS SRV record and ensuring there
is no local configuration file on the compute node.</p>
<p>The <b>--conf-server</b> option takes precedence over the DNS record.</p>
<p>The command line option takes "$host[:$port]", so an example would look like:
</p>
<pre>
slurmd --conf-server slurmctl-primary:6817
</pre>
<p>
Specifying the port is optional and will default to 6817 if it is not present.
Multiple slurmctlds can be specified as a comma-separated list, in priority
order (highest to lowest).
</p>
<pre>
slurmd --conf-server slurmctl-primary:6817,slurmctl-secondary
</pre>
<p>The same information can be provided in a DNS SRV record. For example:</p>
<pre>
_slurmctld._tcp 3600 IN SRV 10 0 6817 slurmctl-backup
_slurmctld._tcp 3600 IN SRV 0 0 6817 slurmctl-primary
</pre>
<p>
This will provide the required information to the slurmd on startup. As shown
above, multiple SRV records can be specified if you have deployed Slurm in an
HA setup. The DNS SRV entry with the lowest priority value should be your
primary slurmctld, with higher priority values for backup slurmctlds.
</p>
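<p>You can verify that the records are visible from the nodes with an ordinary
DNS lookup. For example, assuming a hypothetical zone of
<b>cluster.example.com</b>:</p>
<pre>
dig +short _slurmctld._tcp.cluster.example.com SRV

# Expected output (priority weight port target), matching the records above:
# 0 0 6817 slurmctl-primary.cluster.example.com.
# 10 0 6817 slurmctl-backup.cluster.example.com.
</pre>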
<h2 id="INITIAL_TESTING">Initial Testing
<a class="slurm_link" href="#INITIAL_TESTING"></a>
</h2>
<p>With the slurmctld configured and slurmd started, you can check a couple of
places to make sure the configs are present on the node. Config files will be
in the <b>conf-cache/</b> directory under <b>SlurmdSpoolDir</b>, and a symlink
to this location will be created automatically at <b>/run/slurm/conf</b>. You
can confirm that reloading is working by adding a comment to your slurm.conf on
the slurmctld node, running
<span class="commandline">scontrol reconfig</span>, and checking that the
config on the node was updated.</p>
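<p>A minimal check from a compute node might look like the following (this
assumes the default <b>/run/slurm/conf</b> symlink location described above;
the comment text is just a placeholder):</p>
<pre>
# List the cached config files on the node:
ls -l /run/slurm/conf/

# After adding a comment to slurm.conf on the controller, push it out:
scontrol reconfig

# Then confirm the change arrived on the node:
grep 'test comment' /run/slurm/conf/slurm.conf
</pre>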
<h2 id="LIMITATIONS">Limitations
<a class="slurm_link" href="#LIMITATIONS"></a>
</h2>
<p>Using "%n" in "SlurmdSpoolDir" or "SlurmdPidFile" will not be properly
substituted for the NodeName unless slurmd is also launched with the "-N"
option.</p>
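<p>For example, launching slurmd with an explicit node name (the name
<b>node01</b> below is hypothetical) allows "%n" to be substituted correctly:</p>
<pre>
slurmd -N node01 --conf-server slurmctl-primary:6817
</pre>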
<p>If you are using systemd to launch slurmd, you must ensure that
"ConditionPathExists=*" is not present in the unit file or the slurmd will not
start. (The example slurmd.service file shipped in Slurm 20.02 and above does
not include this entry.)</p>
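<p>If your site's unit file does include such a condition, one way to clear it
is a systemd drop-in, sketched below (the override path follows the usual
systemd conventions):</p>
<pre>
# /etc/systemd/system/slurmd.service.d/override.conf
[Unit]
# An empty assignment clears any ConditionPathExists entries set earlier.
ConditionPathExists=
</pre>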
<p>If any of the supported config files "Include" additional config files,
the Included configs will <b>ONLY</b> be shipped if their "Include" filename
reference has no path separators and the file is located adjacent to slurm.conf.
Any config files that do not meet these conditions will need to be distributed
another way or added to the parent config file.
</p>
<p>If <b>Prolog</b> and <b>Epilog</b> scripts are specified in slurm.conf,
the scripts will <b>ONLY</b> be shipped if the filenames referenced have no path
separators and the file is located adjacent to slurm.conf.
</p>
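<p>As an illustration of these naming rules, consider the following slurm.conf
excerpt (all file names here are hypothetical):</p>
<pre>
# Shipped to the nodes: no path separators, files sit next to slurm.conf
Include gres_extra.conf
Prolog=prolog.sh

# NOT shipped: the references contain path separators
Include /etc/slurm/partitions.conf
Epilog=/opt/slurm/scripts/epilog.sh
</pre>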
<h2 id="NOTES">Notes<a class="slurm_link" href="#NOTES"></a></h2>
<p>The order of precedence for determining what configuration source to use
is as follows:</p>
<ol>
<li>The slurmd --conf-server $host[:$port] option</li>
<li>The -f $config_file option</li>
<li>The SLURM_CONF environment variable (if set)</li>
<li>A local slurm config file:
<ol>
<li>The default slurm config file (likely /etc/slurm.conf)</li>
<li>For user commands, a cached slurm config file
(/run/slurm/conf/slurm.conf)</li>
</ol>
</li>
<li>The SLURM_CONF_SERVER environment variable (if set)</li>
<li>Any DNS SRV records (from lowest priority value to highest)
<ul>
<li>The TTL (Time To Live) of the SRV record does not affect the validity
of the obtained configuration. The nodes will have to be notified of any
changes to the configuration file through an
<span class="commandline">scontrol reconfig</span> or a slurmd restart.</li>
</ul>
</li>
</ol>
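<p>As a quick illustration of the environment variable sources above (the
controller name is hypothetical, and these only take effect if no
higher-precedence source from the list is present):</p>
<pre>
# Fetch the configuration from a specific controller for a single command:
SLURM_CONF_SERVER=slurmctl-primary:6817 sinfo

# Or point a command at a locally cached config file:
SLURM_CONF=/run/slurm/conf/slurm.conf squeue
</pre>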
<p>Supported configuration files are:</p>
<ul>
<li>slurm.conf</li>
<li>acct_gather.conf</li>
<li>cgroup.conf</li>
<li>cli_filter.lua</li>
<li>gres.conf</li>
<li>helpers.conf</li>
<li>job_container.conf</li>
<li>mpi.conf</li>
<li>oci.conf</li>
<li>plugstack.conf</li>
<li>scrun.lua</li>
<li>topology.conf</li>
<li>topology.yaml</li>
</ul>
<p style="text-align:center;">Last modified 01 August 2025</p>
<!--#include virtual="footer.txt"-->