doc/html/programmer_guide.shtml - SchedMD/slurm - Git at Google

 <!--#include virtual="header.txt"-->

 <h1><a name="top">Slurm Programmer's Guide</a></h1>

 <h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2>
 <ul>
 <li><a href="#Overview">Overview</a></li>
 <li><a href="#Plugins">Plugins</a></li>
 <li><a href="#Directory-Structure">Directory Structure</a></li>
 <li><a href="#Documentation">Documentation</a></li>
 <li><a href="#Source-Code">Source Code</a></li>
 <li><a href="#Source-Code-Management">Source Code Management</a></li>
 <li><a href="#Adding-New-Modules">Adding New Modules</a></li>
 <li><a href="#Compiling">Compiling</a></li>
 <li><a href="#Configuration">Configuration</a></li>
 <li><a href="#Test-Suite">Test Suite</a></li>
 <li><a href="#Adding-Files-and-Directories">Adding Files and Directories</a></li>
 <li><a href="#Tricks-of-the-Trade">Tricks of the Trade</a></li>
 </ul>

 <h2 id="Overview">Overview<a class="slurm_link" href="#Overview"></a></h2>

 <p>Slurm is an open source, fault-tolerant,
 and highly scalable cluster management and job scheduling system for large and
 small Linux clusters. Components include machine status, partition management,
 job management, scheduling, and stream copy modules. Slurm requires no kernel
 modifications for it operation and is relatively self-contained.

 <p>Slurm is written in the C language and uses a GNU <b>autoconf</b> configuration
 engine. While initially written for Linux, other UNIX-like operating systems should
 be easy porting targets. Code should adhere to the <a href="coding_style.pdf">
 Linux kernel coding style</a>. <i>(Some components of Slurm have been taken from
 various sources. Some of these components do not conform to the Linux kernel
 coding style. However, new code written for Slurm should follow these standards.)</i>

 <h2 id="Plugins">Plugins<a class="slurm_link" href="#Plugins"></a></h2>

 <p>To make the use of different infrastructures possible, Slurm uses a general
 purpose plugin mechanism. A Slurm plugin is a dynamically linked code object that
 is loaded explicitly at run time by the Slurm libraries. It provides a customized
 implementation of a well-defined API connected to tasks such as authentication,
 interconnect fabric, task scheduling, etc. A set of functions is defined for use
 by all of the different infrastructures of a particular variety. When a Slurm
 daemon is initiated, it reads the configuration file to determine which of the
 available plugins should be used. A <a href="plugins.html">plugin developer's
 guide</a> is available with general information about plugins.


 <h2 id="Directory-Structure">Directory Structure
 <a class="slurm_link" href="#Directory-Structure"></a>
 </h2>

 <p>The contents of the Slurm directory structure will be described below in increasing
 detail as the structure is descended. The top level directory contains the scripts
 and tools required to build the entire Slurm system. It also contains a variety
 of subdirectories for each type of file.</p>
 <p>General build tools/files include: <b>acinclude.m4</b>,
 <b>configure.ac</b>, <b>Makefile.am</b>, <b>Make-rpm.mk</b>, <b>META</b>, <b>README</b>,
 <b>slurm.spec.in</b>, and the contents of the <b>auxdir</b> directory. <span class="commandline">autoconf</span>
 and <span class="commandline">make</span> commands are used to build and install
 Slurm in an automated fashion.
 <b>NOTE</b>: <span class="commandline">autoconf</span>
 version 2.52 or higher is required to build Slurm. Execute
 <span class="commandline">autoconf -V</span> to check your version number.
 The build process is described in the README file.

 <p>Copyright and disclaimer information are in the files COPYING and DISCLAIMER.
 All of the top-level subdirectories are described below.</p>

 <p style="margin-left:.2in"><b>auxdir</b> &mdash; Used for building Slurm.<br>
 <b>contribs</b> &mdash; Various contributed tools.<br>
 <b>doc</b> &mdash; Documentation including man pages. <br>
 <b>etc</b> &mdash; Sample configuration files.<br>
 <b>slurm</b> &mdash; Header files for API use. These files must be installed. Placing
 these header files in this location makes for better code portability.<br>
 <b>src</b> &mdash; Contains all source code and header files not in the "slurm" subdirectory
 described above.<br>
 <b>testsuite</b> &mdash; Check, Expect and Pytest tests are here.</p>


 <h2 id="Documentation">Documentation
 <a class="slurm_link" href="#Documentation"></a>
 </h2>
 <p>All of the documentation is in the subdirectory <b>doc</b>.
 Two directories are of particular interest:</p>

 <p style="margin-left:.2in">
 <b>doc/man</b> &mdash; contains the man pages for the APIs,
 configuration file, commands, and daemons.<br>
 <b>doc/html</b> &mdash; contains the web pages.</p>

 <h2 id="Source-Code">Source Code
 <a class="slurm_link" href="#Source-Code"></a>
 </h2>

 <p>Functions are divided into several categories, each in its own subdirectory.
 The details of each directory's contents are provided below. The directories are
 as follows: </p>

 <p style="margin-left:.2in">
 <b>api</b> &mdash; Application Program Interfaces into
 the Slurm code. Used to send and get Slurm information from the central manager.
 These are the functions user applications might utilize.<br>
 <b>common</b> &mdash; General purpose functions for widespread use throughout
 Slurm.<br>
 <b>database</b> &mdash; Various database files that support the accounting
  storage plugin.<br>
 <b>plugins</b> &mdash; Plugin functions for various infrastructures or optional
 behavior. A separate subdirectory is used for each plugin class:<br>
 <ul>
 <li><b>accounting_storage</b> for specifying the type of storage for accounting,<br>
 <li><b>auth</b> for user authentication,<br>
 <li><b>cred</b> for job credential functions,<br>
 <li><b>jobacct_gather</b> for job accounting,<br>
 <li><b>jobcomp</b> for job completion logging,<br>
 <li><b>mpi</b> for MPI support,<br>
 <li><b>priority</b> calculates job priority based on a number of factors
 including fair-share,<br>
 <li><b>proctrack</b> for process tracking,<br>
 <li><b>sched</b> for job scheduler,<br>
 <li><b>select</b> for a job's node selection,<br>
 <li><b>switch</b> for switch (interconnect) specific functions,<br>
 <li><b>task</b> for task affinity to processors,<br>
 <li><b>topology</b> methods for assigning nodes to jobs based on node
 topology.<br>
 </ul>
 <p style="margin-left:.2in">
 <b>sacct</b> &mdash; User command to view accounting information about jobs.<br>
 <b>sacctmgr</b> &mdash; User and administrator tool to manage accounting.<br>
 <b>salloc</b> &mdash; User command to allocate resources for a job.<br>
 <b>sattach</b> &mdash; User command to attach standard input, output and error
 files to a running job or job step.<br>
 <b>sbatch</b> &mdash; User command to submit a batch job (script for later execution).<br>
 <b>sbcast</b> &mdash; User command to broadcast a file to all nodes associated
 with an existing Slurm job.<br>
 <b>scancel</b> &mdash; User command to cancel (or signal) a job or job step.<br>
 <b>scontrol</b> &mdash; Administrator tool to manage Slurm.<br>
 <b>sinfo</b> &mdash; User command to get information on Slurm nodes and partitions.<br>
 <b>slurmctld</b> &mdash; Slurm central manager daemon code.<br>
 <b>slurmd</b> &mdash; Slurm daemon code to manage the compute server nodes including
 the execution of user applications.<br>
 <b>slurmdbd</b> &mdash; Slurm database daemon managing access to the accounting
 storage database.<br>
 <b>sprio</b> &mdash; User command to see the breakdown of a job's priority
 calculation when the Multifactor Job Priority plugin is installed.<br>
 <b>squeue</b> &mdash; User command to get information on Slurm jobs and job steps.<br>
 <b>sreport</b> &mdash; User command to view various reports about past
 usage across the enterprise.<br>
 <b>srun</b> &mdash; User command to submit a job, get an allocation, and/or
 initiation a parallel job step.<br>
 <b>sshare</b> &mdash; User command to view shares and usage when the Multifactor
 Job Priority plugin is installed.<br>
 <b>sstat</b> &mdash; User command to view detailed statistics about running
 jobs when a Job Accounting Gather plugin is installed.<br>
 <b>strigger</b> &mdash; User and administrator tool to manage event triggers.<br>
 <b>sview</b> &mdash; User command to view and update node, partition, and
 job state information.<br>


 <h2 id="Source-Code-Management">Source Code Management
 <a class="slurm_link" href="#Source-Code-Management"></a>
 </h2>
 <p>The latest code is in github:
 <a href="https://github.com/SchedMD/slurm">https://github.com/SchedMD/slurm</a>.
 Creating your own branch will make it easier to keep it synchronized
 with our work.</p>

 <h2 id="Adding-New-Modules">Adding New Modules
 <a class="slurm_link" href="#Adding-New-Modules"></a>
 </h2>
 <p>Add the new file name to the Makefile.am file in the appropriate directory.
 Then execute autoreconf (at the top level of the Slurm source directory).
 Note that a relatively current version of automake is required.
 The autoreconf program will build Makefile.in files from the Makefile.am files.
 If any new files need to be installed, update the slurm.spec file to identify
 the RPM in which the new files should be placed</p>

 <p>If new directories need to be added, add to the configure.ac file the path
 to the Makefile to be built in the new directory. In summary:<br>
 <u>autoreconf</u> translates <u>.am</u> files into <u>.in</u> files<br>
 <u>configure</u> translates <u>.in</u> files, adding paths and version numbers.</p>

 <h2 id="Compiling">Compiling<a class="slurm_link" href="#Compiling"></a></h2>
 <p>Sending the standard output of "make" to a file makes it easier to see any
 warning or error messages:<br>
 <i>"make -j install &gt;make.out"</i></p>

 <h2 id="Configuration">Configuration
 <a class="slurm_link" href="#Configuration"></a>
 </h2>
 <p>Sample configuration files are included in the <b>etc</b> subdirectory.
 The <b>slurm.conf</b> can be built using a <a href="configurator.html">configuration tool</a>.
 See <b>doc/man/man5/slurm.conf.5</b> and the man pages for other configuration files
 for more details.
 <b>init.d.slurm</b> is a script that determines which
 Slurm daemon(s) should execute on any node based upon the configuration file contents.
 It will also manage these daemons: starting, signaling, restarting, and stopping them.</p>

 <h2 id="Test-Suite">Test Suite<a class="slurm_link" href="#Test-Suite"></a></h2>
 <p>The <b>testsuite</b> files use Check, Expect and Pytest for testing Slurm in
 different ways.</p>

 <p>The Check tests are designed to unit test C code. Only with
 <i>make check</i>, without Slurm installed, they will validate that key C
 functions work correctly.</p>

 <p>We also have a set of Expect Slurm tests available under the <b>testsuite/expect</b>
 directory.  These tests are executed after Slurm has been installed
 and the daemons initiated. These tests exercise all Slurm commands
 and options including stress tests.  The file <b>testsuite/expect/globals</b>
 contains the Expect test framework for all of the individual tests.  At
 the very least, you will need to set the <i>slurm_dir</i> variable to the correct
 value.  To avoid conflicts with other developers, you can override variable settings
 in a separate file named <b>testsuite/expect/globals.local</b>.</p>

 <p>Set your working directory to <b>testsuite/expect</b> before
 starting these tests.  Tests may be executed individually by name
 (e.g.  <i>test1.1</i>)
 or the full test suite may be executed with the single command
 <i>regression.py</i>.
 See <b>testsuite/expect/README</b> for more information.</p>

 <p>Slurm also has a Pytest environment that can work like the Expect one, but it
 also works together with an external QA framework to improve the overall QA
 of Slurm.</p>

 <h2 id="Adding-Files-and-Directories">Adding Files and Directories
 <a class="slurm_link" href="#Adding-Files-and-Directories"></a>
 </h2>
 <p>If you are adding files and directories to Slurm, it will be necessary to
 re-build configuration files before executing the <b>configure</b> command.
 Update <b>Makefile.am</b> files as needed then execute
 <b>autoreconf</b> before executing <b>configure</b>.

 <h2 id="Tricks-of-the-Trade">Tricks of the Trade
 <a class="slurm_link" href="#Tricks-of-the-Trade"></a>
 </h2>
 <h3 id="multiple_slurmd_support">Multiple slurmd support
 <a class="slurm_link" href="#multiple_slurmd_support"></a>
 </h3>
 <p>It is possible to run multiple slurmd daemons on a single node, each using
 a different port number and NodeName alias.  This is very useful for testing
 networking and protocol changes, or anytime you want to simulate a larger
 cluster than you really have.  The author uses this on his desktop to simulate
 multiple nodes.  However, it is important to note that not all slurm functions
 will work with multiple slurmd support enabled (e.g. many switch plugins will
 not work, it is best not to use any).</p>

 <p>Multiple support is enabled at configure-time with the
 "--enable-multiple-slurmd" parameter.  This enables a new parameter in the
 slurm.conf file on the NodeName line, "Port=<port number>", and adds a new
 command line parameter to slurmd, "-N".</p>

 <p>Each slurmd needs to have its own NodeName, and its own TCP port number. Here
 is an example of the NodeName lines for running three slurmd daemons on each
 of ten nodes:</p>

 <pre>
 NodeName=foo[1-10] NodeHostname=host[1-10]  Port=17001
 NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002
 NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003
 </pre>

 <p>
 It is likely that you will also want to use the "%n" symbol in any slurmd
 related paths in the slurm.conf file, for instance SlurmdLogFile,
 SlurmdPidFile, and especially SlurmdSpoolDir.  Each slurmd replaces the "%n"
 with its own NodeName.  Here is an example:</p>

 <pre>
 SlurmdLogFile=/var/log/slurm/slurmd.%n.log
 SlurmdPidFile=/var/run/slurmd.%n.pid
 SlurmdSpoolDir=/var/spool/slurmd.%n
 </pre>

 <p>
 You can manually start each slurmd daemon with the proper NodeName.
 For example, to start the slurmd daemons for host1 from the
 above slurm.conf example:</p>

 <pre>
 host1> slurmd -N foo1
 host1> slurmd -N foo11
 host1> slurmd -N foo21
 </pre>

 <p>
 If you have SysV init scripts, slurmd daemons will automatically be started by
 whenever MULTIPLE_SLURMD is set to yes in /etc/sysconfig/slurm.
 If your distribution uses systemd, you may want to use templating feature to
 define one slurmd.service file and registering each of your virtual nodes
 within it, for example:</p>

 <pre>
 [Unit]
 Description=Slurm node daemon
 After=network.target munge.service remote-fs.target
 ConditionPathExists=/etc/slurm.conf

 [Service]
 Type=forking
 EnvironmentFile=-/etc/sysconfig/slurmd
 ExecStart=/usr/sbin/slurmd -N%i $SLURMD_OPTIONS
 ExecReload=/bin/kill -HUP $MAINPID
 PIDFile=/var/run/slurmd-%i.pid
 KillMode=process
 LimitNOFILE=51200
 LimitMEMLOCK=infinity
 LimitSTACK=infinity
 Delegate=yes

 [Install]
 WantedBy=multi-user.target
 </pre>

 <p>Then, enabling/managing a service like this (what is <i>%i</i> in the file
 will be replaced by what is after the <i>@</i> in the command line):</p>

 <pre>
 systemctl enable slurmd@nodeXYZ
 systemctl start/stop/restart slurmd@nodeXYZ
 </pre>

 </p>

 <p style="text-align:center;">Last modified 21 May 2025</p>

 <!--#include virtual="footer.txt"-->
	<!--#include virtual="header.txt"-->

	<h1><a name="top">Slurm Programmer's Guide</a></h1>

	<h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2>
	<ul>
	<li><a href="#Overview">Overview</a></li>
	<li><a href="#Plugins">Plugins</a></li>
	<li><a href="#Directory-Structure">Directory Structure</a></li>
	<li><a href="#Documentation">Documentation</a></li>
	<li><a href="#Source-Code">Source Code</a></li>
	<li><a href="#Source-Code-Management">Source Code Management</a></li>
	<li><a href="#Adding-New-Modules">Adding New Modules</a></li>
	<li><a href="#Compiling">Compiling</a></li>
	<li><a href="#Configuration">Configuration</a></li>
	<li><a href="#Test-Suite">Test Suite</a></li>
	<li><a href="#Adding-Files-and-Directories">Adding Files and Directories</a></li>
	<li><a href="#Tricks-of-the-Trade">Tricks of the Trade</a></li>
	</ul>

	<h2 id="Overview">Overview<a class="slurm_link" href="#Overview"></a></h2>

	<p>Slurm is an open source, fault-tolerant,
	and highly scalable cluster management and job scheduling system for large and
	small Linux clusters. Components include machine status, partition management,
	job management, scheduling, and stream copy modules. Slurm requires no kernel
	modifications for it operation and is relatively self-contained.

	<p>Slurm is written in the C language and uses a GNU <b>autoconf</b> configuration
	engine. While initially written for Linux, other UNIX-like operating systems should
	be easy porting targets. Code should adhere to the <a href="coding_style.pdf">
	Linux kernel coding style</a>. <i>(Some components of Slurm have been taken from
	various sources. Some of these components do not conform to the Linux kernel
	coding style. However, new code written for Slurm should follow these standards.)</i>

	<h2 id="Plugins">Plugins<a class="slurm_link" href="#Plugins"></a></h2>

	<p>To make the use of different infrastructures possible, Slurm uses a general
	purpose plugin mechanism. A Slurm plugin is a dynamically linked code object that
	is loaded explicitly at run time by the Slurm libraries. It provides a customized
	implementation of a well-defined API connected to tasks such as authentication,
	interconnect fabric, task scheduling, etc. A set of functions is defined for use
	by all of the different infrastructures of a particular variety. When a Slurm
	daemon is initiated, it reads the configuration file to determine which of the
	available plugins should be used. A <a href="plugins.html">plugin developer's
	guide</a> is available with general information about plugins.


	<h2 id="Directory-Structure">Directory Structure
	<a class="slurm_link" href="#Directory-Structure"></a>
	</h2>

	<p>The contents of the Slurm directory structure will be described below in increasing
	detail as the structure is descended. The top level directory contains the scripts
	and tools required to build the entire Slurm system. It also contains a variety
	of subdirectories for each type of file.</p>
	<p>General build tools/files include: <b>acinclude.m4</b>,
	<b>configure.ac</b>, <b>Makefile.am</b>, <b>Make-rpm.mk</b>, <b>META</b>, <b>README</b>,
	<b>slurm.spec.in</b>, and the contents of the <b>auxdir</b> directory. <span class="commandline">autoconf</span>
	and <span class="commandline">make</span> commands are used to build and install
	Slurm in an automated fashion.
	<b>NOTE</b>: <span class="commandline">autoconf</span>
	version 2.52 or higher is required to build Slurm. Execute
	<span class="commandline">autoconf -V</span> to check your version number.
	The build process is described in the README file.

	<p>Copyright and disclaimer information are in the files COPYING and DISCLAIMER.
	All of the top-level subdirectories are described below.</p>

	<p style="margin-left:.2in"><b>auxdir</b> — Used for building Slurm.<br>
	<b>contribs</b> — Various contributed tools.<br>
	<b>doc</b> — Documentation including man pages. <br>
	<b>etc</b> — Sample configuration files.<br>
	<b>slurm</b> — Header files for API use. These files must be installed. Placing
	these header files in this location makes for better code portability.<br>
	<b>src</b> — Contains all source code and header files not in the "slurm" subdirectory
	described above.<br>
	<b>testsuite</b> — Check, Expect and Pytest tests are here.</p>


	<h2 id="Documentation">Documentation
	<a class="slurm_link" href="#Documentation"></a>
	</h2>
	<p>All of the documentation is in the subdirectory <b>doc</b>.
	Two directories are of particular interest:</p>

	<p style="margin-left:.2in">
	<b>doc/man</b> — contains the man pages for the APIs,
	configuration file, commands, and daemons.<br>
	<b>doc/html</b> — contains the web pages.</p>

	<h2 id="Source-Code">Source Code
	<a class="slurm_link" href="#Source-Code"></a>
	</h2>

	<p>Functions are divided into several categories, each in its own subdirectory.
	The details of each directory's contents are provided below. The directories are
	as follows: </p>

	<p style="margin-left:.2in">
	<b>api</b> — Application Program Interfaces into
	the Slurm code. Used to send and get Slurm information from the central manager.
	These are the functions user applications might utilize.<br>
	<b>common</b> — General purpose functions for widespread use throughout
	Slurm.<br>
	<b>database</b> — Various database files that support the accounting
	storage plugin.<br>
	<b>plugins</b> — Plugin functions for various infrastructures or optional
	behavior. A separate subdirectory is used for each plugin class:<br>
	<ul>
	<li><b>accounting_storage</b> for specifying the type of storage for accounting,<br>
	<li><b>auth</b> for user authentication,<br>
	<li><b>cred</b> for job credential functions,<br>
	<li><b>jobacct_gather</b> for job accounting,<br>
	<li><b>jobcomp</b> for job completion logging,<br>
	<li><b>mpi</b> for MPI support,<br>
	<li><b>priority</b> calculates job priority based on a number of factors
	including fair-share,<br>
	<li><b>proctrack</b> for process tracking,<br>
	<li><b>sched</b> for job scheduler,<br>
	<li><b>select</b> for a job's node selection,<br>
	<li><b>switch</b> for switch (interconnect) specific functions,<br>
	<li><b>task</b> for task affinity to processors,<br>
	<li><b>topology</b> methods for assigning nodes to jobs based on node
	topology.<br>
	</ul>
	<p style="margin-left:.2in">
	<b>sacct</b> — User command to view accounting information about jobs.<br>
	<b>sacctmgr</b> — User and administrator tool to manage accounting.<br>
	<b>salloc</b> — User command to allocate resources for a job.<br>
	<b>sattach</b> — User command to attach standard input, output and error
	files to a running job or job step.<br>
	<b>sbatch</b> — User command to submit a batch job (script for later execution).<br>
	<b>sbcast</b> — User command to broadcast a file to all nodes associated
	with an existing Slurm job.<br>
	<b>scancel</b> — User command to cancel (or signal) a job or job step.<br>
	<b>scontrol</b> — Administrator tool to manage Slurm.<br>
	<b>sinfo</b> — User command to get information on Slurm nodes and partitions.<br>
	<b>slurmctld</b> — Slurm central manager daemon code.<br>
	<b>slurmd</b> — Slurm daemon code to manage the compute server nodes including
	the execution of user applications.<br>
	<b>slurmdbd</b> — Slurm database daemon managing access to the accounting
	storage database.<br>
	<b>sprio</b> — User command to see the breakdown of a job's priority
	calculation when the Multifactor Job Priority plugin is installed.<br>
	<b>squeue</b> — User command to get information on Slurm jobs and job steps.<br>
	<b>sreport</b> — User command to view various reports about past
	usage across the enterprise.<br>
	<b>srun</b> — User command to submit a job, get an allocation, and/or
	initiation a parallel job step.<br>
	<b>sshare</b> — User command to view shares and usage when the Multifactor
	Job Priority plugin is installed.<br>
	<b>sstat</b> — User command to view detailed statistics about running
	jobs when a Job Accounting Gather plugin is installed.<br>
	<b>strigger</b> — User and administrator tool to manage event triggers.<br>
	<b>sview</b> — User command to view and update node, partition, and
	job state information.<br>


	<h2 id="Source-Code-Management">Source Code Management
	<a class="slurm_link" href="#Source-Code-Management"></a>
	</h2>
	<p>The latest code is in github:
	<a href="https://github.com/SchedMD/slurm">https://github.com/SchedMD/slurm</a>.
	Creating your own branch will make it easier to keep it synchronized
	with our work.</p>

	<h2 id="Adding-New-Modules">Adding New Modules
	<a class="slurm_link" href="#Adding-New-Modules"></a>
	</h2>
	<p>Add the new file name to the Makefile.am file in the appropriate directory.
	Then execute autoreconf (at the top level of the Slurm source directory).
	Note that a relatively current version of automake is required.
	The autoreconf program will build Makefile.in files from the Makefile.am files.
	If any new files need to be installed, update the slurm.spec file to identify
	the RPM in which the new files should be placed</p>

	<p>If new directories need to be added, add to the configure.ac file the path
	to the Makefile to be built in the new directory. In summary:<br>
	<u>autoreconf</u> translates <u>.am</u> files into <u>.in</u> files<br>
	<u>configure</u> translates <u>.in</u> files, adding paths and version numbers.</p>

	<h2 id="Compiling">Compiling<a class="slurm_link" href="#Compiling"></a></h2>
	<p>Sending the standard output of "make" to a file makes it easier to see any
	warning or error messages:<br>
	<i>"make -j install >make.out"</i></p>

	<h2 id="Configuration">Configuration
	<a class="slurm_link" href="#Configuration"></a>
	</h2>
	<p>Sample configuration files are included in the <b>etc</b> subdirectory.
	The <b>slurm.conf</b> can be built using a <a href="configurator.html">configuration tool</a>.
	See <b>doc/man/man5/slurm.conf.5</b> and the man pages for other configuration files
	for more details.
	<b>init.d.slurm</b> is a script that determines which
	Slurm daemon(s) should execute on any node based upon the configuration file contents.
	It will also manage these daemons: starting, signaling, restarting, and stopping them.</p>

	<h2 id="Test-Suite">Test Suite<a class="slurm_link" href="#Test-Suite"></a></h2>
	<p>The <b>testsuite</b> files use Check, Expect and Pytest for testing Slurm in
	different ways.</p>

	<p>The Check tests are designed to unit test C code. Only with
	<i>make check</i>, without Slurm installed, they will validate that key C
	functions work correctly.</p>

	<p>We also have a set of Expect Slurm tests available under the <b>testsuite/expect</b>
	directory. These tests are executed after Slurm has been installed
	and the daemons initiated. These tests exercise all Slurm commands
	and options including stress tests. The file <b>testsuite/expect/globals</b>
	contains the Expect test framework for all of the individual tests. At
	the very least, you will need to set the <i>slurm_dir</i> variable to the correct
	value. To avoid conflicts with other developers, you can override variable settings
	in a separate file named <b>testsuite/expect/globals.local</b>.</p>

	<p>Set your working directory to <b>testsuite/expect</b> before
	starting these tests. Tests may be executed individually by name
	(e.g. <i>test1.1</i>)
	or the full test suite may be executed with the single command
	<i>regression.py</i>.
	See <b>testsuite/expect/README</b> for more information.</p>

	<p>Slurm also has a Pytest environment that can work like the Expect one, but it
	also works together with an external QA framework to improve the overall QA
	of Slurm.</p>

	<h2 id="Adding-Files-and-Directories">Adding Files and Directories
	<a class="slurm_link" href="#Adding-Files-and-Directories"></a>
	</h2>
	<p>If you are adding files and directories to Slurm, it will be necessary to
	re-build configuration files before executing the <b>configure</b> command.
	Update <b>Makefile.am</b> files as needed then execute
	<b>autoreconf</b> before executing <b>configure</b>.

	<h2 id="Tricks-of-the-Trade">Tricks of the Trade
	<a class="slurm_link" href="#Tricks-of-the-Trade"></a>
	</h2>
	<h3 id="multiple_slurmd_support">Multiple slurmd support
	<a class="slurm_link" href="#multiple_slurmd_support"></a>
	</h3>
	<p>It is possible to run multiple slurmd daemons on a single node, each using
	a different port number and NodeName alias. This is very useful for testing
	networking and protocol changes, or anytime you want to simulate a larger
	cluster than you really have. The author uses this on his desktop to simulate
	multiple nodes. However, it is important to note that not all slurm functions
	will work with multiple slurmd support enabled (e.g. many switch plugins will
	not work, it is best not to use any).</p>

	<p>Multiple support is enabled at configure-time with the
	"--enable-multiple-slurmd" parameter. This enables a new parameter in the
	slurm.conf file on the NodeName line, "Port=<port number>", and adds a new
	command line parameter to slurmd, "-N".</p>

	<p>Each slurmd needs to have its own NodeName, and its own TCP port number. Here
	is an example of the NodeName lines for running three slurmd daemons on each
	of ten nodes:</p>

	<pre>
	NodeName=foo[1-10] NodeHostname=host[1-10] Port=17001
	NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002
	NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003
	</pre>

	<p>
	It is likely that you will also want to use the "%n" symbol in any slurmd
	related paths in the slurm.conf file, for instance SlurmdLogFile,
	SlurmdPidFile, and especially SlurmdSpoolDir. Each slurmd replaces the "%n"
	with its own NodeName. Here is an example:</p>

	<pre>
	SlurmdLogFile=/var/log/slurm/slurmd.%n.log
	SlurmdPidFile=/var/run/slurmd.%n.pid
	SlurmdSpoolDir=/var/spool/slurmd.%n
	</pre>

	<p>
	You can manually start each slurmd daemon with the proper NodeName.
	For example, to start the slurmd daemons for host1 from the
	above slurm.conf example:</p>

	<pre>
	host1> slurmd -N foo1
	host1> slurmd -N foo11
	host1> slurmd -N foo21
	</pre>

	<p>
	If you have SysV init scripts, slurmd daemons will automatically be started by
	whenever MULTIPLE_SLURMD is set to yes in /etc/sysconfig/slurm.
	If your distribution uses systemd, you may want to use templating feature to
	define one slurmd.service file and registering each of your virtual nodes
	within it, for example:</p>

	<pre>
	[Unit]
	Description=Slurm node daemon
	After=network.target munge.service remote-fs.target
	ConditionPathExists=/etc/slurm.conf

	[Service]
	Type=forking
	EnvironmentFile=-/etc/sysconfig/slurmd
	ExecStart=/usr/sbin/slurmd -N%i $SLURMD_OPTIONS
	ExecReload=/bin/kill -HUP $MAINPID
	PIDFile=/var/run/slurmd-%i.pid
	KillMode=process
	LimitNOFILE=51200
	LimitMEMLOCK=infinity
	LimitSTACK=infinity
	Delegate=yes

	[Install]
	WantedBy=multi-user.target
	</pre>

	<p>Then, enabling/managing a service like this (what is <i>%i</i> in the file
	will be replaced by what is after the <i>@</i> in the command line):</p>

	<pre>
	systemctl enable slurmd@nodeXYZ
	systemctl start/stop/restart slurmd@nodeXYZ
	</pre>

	</p>

	<p style="text-align:center;">Last modified 21 May 2025</p>

	<!--#include virtual="footer.txt"-->