<!--#include virtual="header.txt"-->
<h1>Slurm User and Administrator Guide for Cray XC Systems</h1>
<ul>
<li><a href="#user_guide">User Guide</a></li>
<li><a href="#features">Cray Specific Features</a></li>
<li><a href="#admin_guide">Administrator Guide</a></li>
<li><a href="#setup">Cray System Setup</a></li>
<li><a href="#ha">High Availability</a></li>
<li><a href="http://www.cray.com">Cray</a></li>
</ul>
<HR SIZE=4>
<h2 id="user_guide">User Guide<a class="slurm_link" href="#user_guide"></a></h2>
<p>This document describes the unique features of Slurm on Cray XC
computers operating natively, that is, without the use of Cray's Application
Level Placement Scheduler (ALPS). You should be familiar with Slurm's
mode of operation on Linux clusters before studying the differences
in Cray system operation described in this document. When running
Slurm in native mode, a Cray system functions very much like a
Linux cluster.
</p>
<p>Slurm is designed to operate as a workload manager on Cray XC systems
(Cascade) without the use of ALPS. In addition to providing the same look and
feel as a regular Linux cluster, this also enables functionality
such as:
<ul>
<li>Ability to run multiple jobs per node</li>
<li>Ability to query the status of running jobs with sstat</li>
<li>Full accounting support for job steps</li>
<li>Ability to run multiple jobs/steps in the background from the same
session</li>
</ul>
</p>
<h2 id="features">Cray Specific Features
<a class="slurm_link" href="#features"></a>
</h2>
<ul>
<li>Network Performance Counters</li>
<p>
To access Cray's Network Performance Counters (NPC), use
the <i>--network</i> option in sbatch/salloc/srun to request them.
There are two different types of counters: system and blade.
</p>
<p>
For the system option (--network=system), only one job can use the
system-wide counters at a time. Only the nodes requested will be marked in
use for the job allocation. If the job does not fill the entire system, the
remaining nodes cannot be used by other jobs using NPC; when idle, their
state will appear as PerfCnts. These nodes are still
available for other jobs not using NPC.
</p>
<p>
For the blade option (--network=blade), only the nodes requested
will be marked in use for the job allocation. If the job does not
fill the entire blade(s) allocated to the job, those blade(s) cannot
be used by other jobs using NPC; when idle, their state will appear as
PerfCnts. These nodes are still available for other jobs not using NPC.
</p>
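<p>
For illustration, requests for each counter type might look like the
following; the node count and job script name are placeholders:
</p>
<pre>
# Request blade-level network performance counters for an interactive allocation
salloc -N 2 --network=blade

# Request the system-wide counters for a batch job (only one such job at a time)
sbatch --network=system job.sh
</pre>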
<li>Core Specialization</li>
<p>
To use this feature, set <b><i>CoreSpecPlugin=core_spec/cray_aries</i></b>.
This makes it possible to reserve a number of cores in the job allocation for
system operations rather than for the application. The application will not
use these cores, but will be charged for their allocation.
</p>
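<p>
Assuming the plugin is configured, a job could then request specialized cores
with the <i>--core-spec</i> option; the count shown is only an example:
</p>
<pre>
# Reserve 2 cores per node for system operations (example value)
sbatch --core-spec=2 job.sh
</pre>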
</ul>
<h2 id="admin_guide">Admin Guide
<a class="slurm_link" href="#admin_guide"></a>
</h2>
<p>
Many new plugins were added to utilize the Cray system without
ALPS. These should be set up in your slurm.conf in addition to your
normal configuration.
<ul>
<li>AcctGatherEnergyType</li>
<p>
Set <b><i>AcctGatherEnergyType=acct_gather_energy/pm_counters</i></b> to
have the Cray XC baseboard management controller report energy usage
data to Slurm.
</p>
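<p>
A minimal slurm.conf excerpt might look like the following; the sampling
interval shown is an example value, not a requirement:
</p>
<pre>
AcctGatherEnergyType=acct_gather_energy/pm_counters
# Sample the energy counters every 30 seconds (example value)
AcctGatherNodeFreq=30
</pre>
<p>
Collected energy data can then be reported, for example, in the
ConsumedEnergy field of sacct output.
</p>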
<li>BurstBuffer</li>
<p>
Set <b><i>BurstBufferType=burst_buffer/datawarp</i></b> to use.
The burst buffer capability on Cray systems is also known by the name
<i>DataWarp</i>.
For more information, see
<a href="burst_buffer.html">Slurm Burst Buffer Guide</a>.
</p>
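<p>
As a sketch of how a user would then request a burst buffer, a batch script
might include #DW directives such as the following; the capacity, access mode
and application name are placeholders:
</p>
<pre>
#!/bin/bash
#SBATCH -N 1
#DW jobdw type=scratch access_mode=striped capacity=100GiB
# The allocated buffer is exported to the job, e.g. as $DW_JOB_STRIPED
srun my_app $DW_JOB_STRIPED
</pre>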
<li>CoreSpec</li>
<p>
Set <b><i>CoreSpecPlugin=core_spec/cray_aries</i></b> to use.
</p>
<li>JobSubmit</li>
<p>
Set <b><i>JobSubmitPlugins=job_submit/cray_aries</i></b> to use.
This plugin is primarily used to set a gres=craynetwork value, which
limits the number of applications that can run on a node
at once. That number can be at most 4. This craynetwork gres needs
to be set up in your slurm.conf to ensure proper functionality.
For example:
<pre>
...
GresTypes=craynetwork
NodeName=nid000[00-10] Gres=craynetwork:4
...
</pre>
</p>
<li>Power</li>
<p>
Set <b><i>PowerPlugin=power/cray_aries</i></b> to use.
<b><i>PowerParameters</i></b> is also typically configured.
For more information, see
<a href="power_mgmt.html">Slurm Power Management Guide</a>.
</p>
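<p>
A sketch of one possible configuration follows; the capmc path and the power
cap and interval values are illustrative assumptions that must be adjusted
for the system:
</p>
<pre>
PowerPlugin=power/cray_aries
# Example only: system-wide power cap in watts and rebalance interval in seconds
PowerParameters=capmc_path=/opt/cray/capmc/default/bin/capmc,cap_watts=2000000,balance_interval=60
</pre>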
<li>Proctrack</li>
<p>
Set <b><i>ProctrackType=proctrack/cray_aries</i></b> to use.
</p>
<li>Select</li>
<p>
Set <b><i>SelectType=select/cray_aries</i></b> to use. This plugin is
a layered plugin, meaning it enhances a lower-level select
plugin. By default it is layered on top of the <i>select/linear</i>
plugin. It can also be layered on top of the <i>select/cons_tres</i> plugin
by setting <b><i>SelectTypeParameters=other_cons_tres</i></b>,
which allows you to run multiple jobs on a Cray node just
like on a normal Linux cluster.
Use additional <b><i>SelectTypeParameters</i></b> to identify the resources
to allocate (e.g. cores, sockets, memory, etc.). See the slurm.conf man
page for details.
</p>
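<p>
For example, layering on top of <i>select/cons_tres</i> and allocating by core
and memory might look like this; the parameter combination shown is one
reasonable choice, not the only one:
</p>
<pre>
SelectType=select/cray_aries
SelectTypeParameters=other_cons_tres,CR_Core_Memory
</pre>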
<li>SlurmctldPort, SlurmdPort, SrunPortRange</li>
<p>
Realm-Specific IP Addressing (RSIP) will automatically try to interact with
anything opened on ports 8192 to 60000. Configure SlurmctldPort, SlurmdPort,
and SrunPortRange to use ports above 60000. In the case of SrunPortRange,
making 1000 or more ports available is recommended.
</p>
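<p>
One possible layout, keeping everything above port 60000 and reserving well
over 1000 ports for srun (the specific numbers are only an example):
</p>
<pre>
SlurmctldPort=61000
SlurmdPort=61001
SrunPortRange=62000-63999
</pre>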
<li>Switch</li>
<p>
Set <b><i>SwitchType=switch/cray_aries</i></b> to use.
</p>
<li>Task</li>
<p>
Set <b><i>TaskPlugin=cray_aries,cgroup</i></b> to use. Use of the
<b><i>task/cgroup</i></b> plugin is required alongside <i>task/cray_aries</i>.
You may also use the <i>task/affinity</i> plugin along with
<i>task/cray_aries,task/cgroup</i> if desired (i.e.
<b><i>TaskPlugin=cray_aries,affinity,cgroup</i></b>). Note that plugins
are used in the order they are defined in the comma-separated list,
and that <i>task/cray_aries</i> must be listed before <i>task/cgroup</i>
due to internal dependencies between the two plugins.
</p>
</ul>
<h2 id="setup">Cray system setup<a class="slurm_link" href="#setup"></a></h2>
<p>Some Slurm plugins (burst_buffer/datawarp and power/cray_aries)
parse JSON-formatted data.
These plugins are designed to make use of the JSON-C library for this purpose.
See <a href="download.html#json">JSON-C installation instructions</a> for
details.</p>
<p>
Some services on the system need to be set up to run correctly with
Slurm. Below are the services, the nodes they run on, and how to restart
them. It is probably a good idea to set this up to happen automatically.
<ul>
<li>boot node</li>
<ul>
<li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/aeld restart</li>
</ul>
<li>sdb node</li>
<ul>
<li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/ncmd restart</li>
<li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/apptermd restart</li>
</ul>
</ul>
<p>
As with Linux clusters, you will need to start a slurmd on each of your
compute nodes. If you choose to use munge authentication, which is advised,
you will also need munge installed and a munged daemon running on each of
your compute nodes as well. See the <a href="quickstart_admin.html">
quick start guide</a> for more information. Apart from the differences
described in this document, that guide can be used to set up your Cray
system to run Slurm natively.
</p>
<p>
On larger systems, you may wish to set the PMI_MMAP_SYNC_WAIT_TIME
environment variable in your users' profiles to a larger value than
the default (180 seconds) to prevent PMI from falsely detecting job
launch failures.
</p>
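<p>
For example, a system-wide profile script could export a larger timeout; the
file name and value below are only illustrative:
</p>
<pre>
# e.g. in /etc/profile.d/pmi.sh (example location and value)
export PMI_MMAP_SYNC_WAIT_TIME=300
</pre>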
<h2 id="ha">High Availability<a class="slurm_link" href="#ha"></a></h2>
<p>
A backup controller can be set up either inside or outside the Cray. However,
when the backup is within the Cray, both the primary and the backup controllers
will go down when the Cray is rebooted. It is best to set up the backup
controller on a node external to the Cray so that the controller can still
receive new jobs when the Cray is down. When the backup is configured on an
external node, the <b><i>no_backup_scheduling</i></b> option of
<b><i>SchedulerParameters</i></b> should be specified in the slurm.conf.
This allows new jobs to be submitted while the Cray is down, but prevents the
backup controller from starting any new jobs.
</p>
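<p>
A corresponding slurm.conf fragment might look like the following; the host
names are placeholders for the internal (sdb) and external controller nodes:
</p>
<pre>
# Primary controller inside the Cray, backup on an external node (example names)
SlurmctldHost=sdb
SlurmctldHost=external-ctl
SchedulerParameters=no_backup_scheduling
</pre>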
<p style="text-align:center;">Last modified 27 July 2023</p>
<!--#include virtual="footer.txt"-->