<!--#include virtual="header.txt"-->
<h1><a name="top">SLURM Programmer's Guide</a></h1>
<h2>Overview</h2>
<p>Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant,
and highly scalable cluster management and job scheduling system for large and
small Linux clusters. Components include machine status, partition management,
job management, scheduling, and stream copy modules. SLURM requires no kernel
modifications for its operation and is relatively self-contained.</p>
<p>An overview of the components and their interactions is available in
a separate document, <a href="slurm_design.pdf"> SLURM: Simple Linux Utility for
Resource Management</a> [PDF].
<p>SLURM is written in the C language and uses a GNU <b>autoconf</b> configuration
engine. While initially written for Linux, other UNIX-like operating systems should
be easy porting targets. Code should adhere to the <a href="coding_style.pdf">
Linux kernel coding style</a>. <i>(Some components of SLURM have been taken from
various sources. Some of these components do not conform to the Linux kernel
coding style. However, new code written for SLURM should follow these standards.)</i>
<p>Many of these modules have been built and tested on a variety of Unix computers
including Red Hat Linux, IBM's AIX, Sun's Solaris, and Compaq's Tru-64. The only
module at this time that is operating system dependent is <span class="commandline">src/slurmd/read_proc.c</span>.
We will be porting and testing on additional platforms in future releases.
<h2>Plugins</h2>
<p>To make the use of different infrastructures possible, SLURM uses a general
purpose plugin mechanism. A SLURM plugin is a dynamically linked code object that
is loaded explicitly at run time by the SLURM libraries. It provides a customized
implementation of a well-defined API connected to tasks such as authentication,
interconnect fabric, task scheduling, etc. Each plugin type defines a set of
functions that every plugin of that variety implements. When a SLURM
daemon is initiated, it reads the configuration file to determine which of the
available plugins should be used. A <a href="plugins.html">plugin developer's
guide</a> is available with general information about plugins. Most plugin
types also have their own documentation available, such as
<a href="authplugins.html">SLURM Authentication Plugin API</a> and
<a href="jobcompplugins.html">SLURM Job Completion Logging API</a>.</p>
<p class="footer"><a href="#top">top</a></p>
<h2>Directory Structure</h2>
<p>The contents of the SLURM directory structure will be described below in increasing
detail as the structure is descended. The top level directory contains the scripts
and tools required to build the entire SLURM system. It also contains a variety
of subdirectories for each type of file.</p>
<p>General build tools/files include: <b>acinclude.m4</b>, <b>autogen.sh</b>,
<b>configure.ac</b>, <b>Makefile.am</b>, <b>Make-rpm.mk</b>, <b>META</b>, <b>README</b>,
<b>slurm.spec.in</b>, and the contents of the <b>auxdir</b> directory. <span class="commandline">autoconf</span>
and <span class="commandline">make</span> commands are used to build and install
SLURM in an automated fashion. NOTE: <span class="commandline">autoconf</span>
version 2.52 or higher is required to build SLURM. Execute
<span class="commandline">autoconf -V</span> to check your version number.
The build process is described in the README file.
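<p>For reference, building and installing from a source checkout typically looks
like the following (the exact steps and available configure options are described
in the README; the installation prefix is only an example):</p>
<pre>
> ./autogen.sh                     # regenerate configure and the Makefile.in files
> ./configure --prefix=/usr/local  # example prefix
> make
> make install
</pre>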
<p>Copyright and disclaimer information are in the files COPYING and DISCLAIMER.
All of the top-level subdirectories are described below.</p>
<p style="margin-left:.2in"><b>auxdir</b>&#151;Used for building SLURM.<br>
<b>contribs</b>&#151;Various contributed tools.<br>
<b>doc</b>&#151;Documentation including man pages. <br>
<b>etc</b>&#151;Sample configuration files.<br>
<b>slurm</b>&#151;Header files for API use. These files must be installed. Placing
these header files in this location makes for better code portability.<br>
<b>src</b>&#151;Contains all source code and header files not in the "slurm" subdirectory
described above.<br>
<b>testsuite</b>&#151;DejaGnu and Expect are used for testing; all of the test
files are here.</p>
<p class="footer"><a href="#top">top</a></p>
<h2>Documentation</h2>
<p>All of the documentation is in the subdirectory <b>doc</b>.
Two directories are of particular interest:</p>
<p style="margin-left:.2in">
<b>doc/man</b>&#151;Contains the man pages for the APIs,
configuration file, commands, and daemons.<br>
<b>doc/html</b>&#151;Contains the web pages.</p>
<h2>Source Code</h2>
<p>Functions are divided into several categories, each in its own subdirectory.
The details of each directory's contents are provided below. The directories are
as follows: </p>
<p style="margin-left:.2in">
<b>api</b>&#151;Application Program Interfaces into
the SLURM code. Used to send requests to and retrieve SLURM information from the
central manager. These are the functions user applications might utilize
(see the example following the command list below).<br>
<b>common</b>&#151;General purpose functions for widespread use throughout
SLURM.<br>
<b>database</b>&#151;Various database files that support the accounting
storage plugin.<br>
<b>plugins</b>&#151;Plugin functions for various infrastructures or optional
behavior. A separate subdirectory is used for each plugin class:<br>
<ul>
<li><b>accounting_storage</b> for specifying the type of storage for accounting,<br>
<li><b>auth</b> for user authentication,<br>
<li><b>checkpoint</b> for system-initiated checkpoint and restart of user jobs,<br>
<li><b>crypto</b> for cryptographic functions,<br>
<li><b>jobacct_gather</b> for job accounting,<br>
<li><b>jobcomp</b> for job completion logging,<br>
<li><b>mpi</b> for MPI support,<br>
<li><b>priority</b> for calculating job priority based on a number of factors
including fair-share,<br>
<li><b>proctrack</b> for process tracking,<br>
<li><b>sched</b> for job scheduling,<br>
<li><b>select</b> for a job's node selection,<br>
<li><b>switch</b> for switch (interconnect) specific functions,<br>
<li><b>task</b> for task affinity to processors,<br>
<li><b>topology</b> for assigning nodes to jobs based on node
topology.<br>
</ul>
<p style="margin-left:.2in">
<b>sacct</b>&#151;User command to view accounting information about jobs.<br>
<b>sacctmgr</b>&#151;User and administrator tool to manage accounting.<br>
<b>salloc</b>&#151;User command to allocate resources for a job.<br>
<b>sattach</b>&#151;User command to attach standard input, output and error
files to a running job or job step.<br>
<b>sbatch</b>&#151;User command to submit a batch job (script for later execution).<br>
<b>sbcast</b>&#151;User command to broadcast a file to all nodes associated
with an existing SLURM job.<br>
<b>scancel</b>&#151;User command to cancel (or signal) a job or job step.<br>
<b>scontrol</b>&#151;Administrator tool to manage SLURM.<br>
<b>sinfo</b>&#151;User command to get information on SLURM nodes and partitions.<br>
<b>slurmctld</b>&#151;SLURM central manager daemon code.<br>
<b>slurmd</b>&#151;SLURM daemon code to manage the compute server nodes including
the execution of user applications.<br>
<b>slurmdbd</b>&#151;SLURM database daemon managing access to the accounting
storage database.<br>
<b>smap</b>&#151;User command to view layout of nodes, partitions, and jobs.
This is particularly valuable on systems like Bluegene, which has a
three-dimensional torus topology.<br>
<b>sprio</b>&#151;User command to see the breakdown of a job's priority
calculation when the Multifactor Job Priority plugin is installed.<br>
<b>squeue</b>&#151;User command to get information on SLURM jobs and job steps.<br>
<b>sreport</b>&#151;User command to view various reports about past
usage across the enterprise.<br>
<b>srun</b>&#151;User command to submit a job, get an allocation, and/or
initiate a parallel job step.<br>
<b>srun_cr</b>&#151;Checkpoint/Restart wrapper for srun.<br>
<b>sshare</b>&#151;User command to view shares and usage when the Multifactor
Job Priority plugin is installed.<br>
<b>sstat</b>&#151;User command to view detailed statistics about running
jobs when a Job Accounting Gather plugin is installed.<br>
<b>strigger</b>&#151;User and administrator tool to manage event triggers.<br>
<b>sview</b>&#151;User command to view and update node, partition, and job
state information.</p>
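<p>As an example of how the functions under <b>src/api</b> are used, the sketch
below asks the central manager for the current job table and prints it. Exact
prototypes are defined in the installed <b>slurm/slurm.h</b> header and the API
man pages; the program would be linked against the SLURM library
(e.g. <span class="commandline">-lslurm</span>).</p>
<pre>
/* Sketch: query slurmctld for all jobs and print one line per job. */
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;
#include &lt;slurm/slurm.h&gt;
#include &lt;slurm/slurm_errno.h&gt;

int main(void)
{
	job_info_msg_t *jobs = NULL;

	/* Request every job record (an update time of zero means "all"). */
	if (slurm_load_jobs((time_t) 0, &amp;jobs, SHOW_ALL) != SLURM_SUCCESS) {
		slurm_perror("slurm_load_jobs");
		return 1;
	}

	/* Print the records, then free the message buffer. */
	slurm_print_job_info_msg(stdout, jobs, 1);
	slurm_free_job_info_msg(jobs);
	return 0;
}
</pre>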
<p class="footer"><a href="#top">top</a></p>
<h2>Configuration</h2>
<p>Sample configuration files are included in the <b>etc</b> subdirectory.
The <b>slurm.conf</b> file can be built using a <a href="configurator.html">configuration tool</a>.
See <b>doc/man/man5/slurm.conf.5</b> and the man pages for other configuration files
for more details.
<b>init.d.slurm</b> is a script that determines which
SLURM daemon(s) should execute on any node based upon the configuration file contents.
It can also be used to start, signal, restart, and stop these daemons.</p>
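<p>For orientation, a stripped-down <b>slurm.conf</b> might contain lines like
those below. The host, node, and partition names are only placeholders; see the
man page and the sample files in <b>etc</b> for the full set of parameters.</p>
<pre>
ControlMachine=linux0
AuthType=auth/munge
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmctld
NodeName=linux[1-32] Procs=2 State=UNKNOWN
PartitionName=debug Nodes=linux[1-32] Default=YES State=UP
</pre>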
<h2>Test Suite</h2>
<p>The <b>testsuite</b> files use a DejaGnu framework for testing. These tests
are very limited in scope.</p>
<p>We also have a set of Expect SLURM tests available under the <b>testsuite/expect</b>
directory. These tests are executed after SLURM has been installed
and the daemons initiated. About 250 test scripts exercise all SLURM commands
and options including stress tests. The file <b>testsuite/expect/globals</b>
contains default paths and procedures for all of the individual tests. At
the very least, you will need to set the <i>slurm_dir</i> variable to the correct
value. To avoid conflicts with other developers, you can override variable settings
in a separate file named <b>testsuite/expect/globals.local</b>.</p>
<p>Set your working directory to <b>testsuite/expect</b> before
starting these tests. Tests may be executed individually by name
(e.g. <i>test1.1</i>)
or the full test suite may be executed with the single command <i>regression</i>.
See <b>testsuite/expect/README</b> for more information.</p>
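<p>For example, after setting <i>slurm_dir</i> (typically in
<b>globals.local</b>), an individual test or the full suite can be run roughly
as follows:</p>
<pre>
> cd testsuite/expect
> ./test1.1      # run a single test by name
> ./regression   # run the complete test suite
</pre>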
<h2>Adding Files and Directories</h2>
<p>If you are adding files and directories to SLURM, it will be necessary to
re-build configuration files before executing the <b>configure</b> command.
Update <b>Makefile.am</b> files as needed, then execute
<b>autogen.sh</b> before executing <b>configure</b>.</p>
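<p>For example, if a hypothetical new file <b>my_new_file.c</b> were added to one
of the command directories, the corresponding <b>Makefile.am</b> would need a line
such as the one below before <b>autogen.sh</b> and <b>configure</b> are re-run
(the program and file names here are placeholders):</p>
<pre>
# Fragment of a Makefile.am; "mycommand" and the file names are hypothetical.
mycommand_SOURCES = mycommand.c my_new_file.c
</pre>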
<h2>Tricks of the Trade</h2>
<h3>HAVE_FRONT_END</h3>
<p>You can make a single node appear to SLURM as a Linux cluster by running
<i>configure</i> with the <i>--enable-front-end</i> option. This
defines <b>HAVE_FRONT_END</b> with a non-zero value in the file <b>config.h</b>.
All (fake) nodes should be defined in the <b>slurm.conf</b> file.
These nodes should be configured with a single <b>NodeAddr</b> value
indicating the node on which the single <span class="commandline">slurmd</span> daemon
executes. Initiate one <span class="commandline">slurmd</span> and one
<span class="commandline">slurmctld</span> daemon. Do not initiate too many
simultaneous job steps to avoid overloading the
<span class="commandline">slurmd</span> daemon executing them all.</p>
<h3><a name="multiple_slurmd_support">Multiple slurmd support</a></h3>
<p>It is possible to run multiple slurmd daemons on a single node, each using
a different port number and NodeName alias. This is very useful for testing
networking and protocol changes, or anytime you want to simulate a larger
cluster than you really have. The author uses this on his desktop to simulate
multiple nodes. However, not all SLURM functions will work with multiple
slurmd support enabled (e.g., many switch plugins will not work; it is best
to use switch/none).</p>
<p>Multiple slurmd support is enabled at configure time with the
"--enable-multiple-slurmd" parameter. This enables a new parameter on the
NodeName line of the slurm.conf file, "Port=&lt;port number&gt;", and adds a new
command line parameter to slurmd, "-N".</p>
<p>Each slurmd needs to have its own NodeName, and its own TCP port number. Here
is an example of the NodeName lines for running three slurmd daemons on each
of ten nodes:</p>
<pre>
NodeName=foo[1-10] NodeHostname=host[1-10] Port=17001
NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002
NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003
</pre>
<p>
It is likely that you will also want to use the "%n" symbol in any slurmd
related paths in the slurm.conf file, for instance SlurmdLogFile,
SlurmdPidFile, and especially SlurmdSpoolDir. Each slurmd replaces the "%n"
with its own NodeName. Here is an example:</p>
<pre>
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdSpoolDir=/var/spool/slurmd.%n
</pre>
<p>
It is up to you to start each slurmd daemon with the proper NodeName.
For example, to start the slurmd daemons for host1 from the
above slurm.conf example:</p>
<pre>
host1> slurmd -N foo1
host1> slurmd -N foo11
host1> slurmd -N foo21
</pre>
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 27 March 2009</p>
<!--#include virtual="footer.txt"-->