| <!--#include virtual="header.txt"--> |
| |
| <h1>Slurm User and Administrator Guide for Cray XC Systems</h1> |
| |
| <ul> |
| <li><a href="#user_guide">User Guide</a></li> |
| <li><a href="#features">Cray Specific Features</a></li> |
| <li><a href="#admin_guide">Administrator Guide</a></li> |
| <li><a href="#setup">Cray System Setup</a></li> |
| <li><a href="#ha">High Availability</a></li> |
| <li><a href="http://www.cray.com">Cray</a></li> |
| </ul> |
| |
| <HR SIZE=4> |
| |
| <h2 id="user_guide">User Guide<a class="slurm_link" href="#user_guide"></a></h2> |
| |
| <p>This document describes the unique features of Slurm on Cray XC |
| computers natively, or without the use of Cray's Application Level |
Placement Scheduler (ALPS). You should be familiar with Slurm's
mode of operation on Linux clusters before studying the differences
in Cray system operation described in this document. When running
Slurm in native mode, a Cray system will function very similarly to a
Linux cluster.
| </p> |
| |
| <p>Slurm is designed to operate as a workload manager on Cray XC systems |
(Cascade) without the use of ALPS. In addition to providing the same look and
feel as a regular Linux cluster, this also provides capabilities
such as:
| <ul> |
| <li>Ability to run multiple jobs per node</li> |
<li>Ability to query the status of running jobs with sstat</li>
| <li>Full accounting support for job steps</li> |
| <li>Ability to run multiple jobs/steps in background from the same |
| session</li> |
| </ul> |
| </p> |
| |
| <h2 id="features">Cray Specific Features |
| <a class="slurm_link" href="#features"></a> |
| </h2> |
| <ul> |
| <li>Network Performance Counters</li> |
<p>
To access Cray's Network Performance Counters (NPC), use the
<i>--network</i> option in sbatch/salloc/srun to request them.
There are two different types of counters: system and blade.
</p>
<p>
For the system option (--network=system), only one job at a time can use
the system-wide counters. Only the nodes requested will be marked in use
for the job allocation. If the job does not fill up the entire system,
the remaining nodes cannot be used by other jobs using NPC; if idle,
their state will appear as PerfCnts. These nodes are still
available for other jobs not using NPC.
</p>
<p>
For the blade option (--network=blade), only the nodes requested will
be marked in use for the job allocation. If the job does not fill up
the entire blade(s) allocated to the job, those blade(s) cannot be used
by other jobs using NPC; if idle, their state will appear as PerfCnts.
These nodes are still available for other jobs not using NPC.
</p>
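<p>
For example, a job requesting blade counters might be submitted as
follows (the node count and application name here are placeholders):
</p>
<pre>
$ srun --network=blade -N2 ./my_app
</pre>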
| |
| <li>Core Specialization</li> |
| <p> |
To use this feature, set <b><i>CoreSpecPlugin=core_spec/cray_aries</i></b>.
Core specialization reserves a number of the cores allocated to the job
for system operations rather than for the application. The application
will not use these cores, but will be charged for their allocation.
| </p> |
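<p>
As an illustrative sketch, assuming core specialization is enabled as
described above, a job could reserve two cores per node for system
operations using the <i>--core-spec</i> option (the script name and
values here are placeholders):
</p>
<pre>
$ sbatch --core-spec=2 -N4 my_batch_script.sh
</pre>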
| </ul> |
| |
| <h2 id="admin_guide">Admin Guide |
| <a class="slurm_link" href="#admin_guide"></a> |
| </h2> |
| <p> |
Many new plugins were added to utilize the Cray system without
ALPS. These should be set up in your slurm.conf in addition to your
normal configuration.
| <ul> |
| |
| <li>AcctGatherEnergyType</li> |
| <p> |
| Set <b><i>AcctGatherEnergyType=acct_gather_energy/pm_counters</i></b> to |
| have the Cray XC baseboard management controller report energy usage |
| data to Slurm. |
| </p> |
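<p>
A minimal slurm.conf sketch; the sampling interval shown is an
illustrative value, not a requirement:
</p>
<pre>
AcctGatherEnergyType=acct_gather_energy/pm_counters
AcctGatherNodeFreq=30
</pre>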
| |
| <li>BurstBuffer</li> |
| <p> |
Set <b><i>BurstBufferType=burst_buffer/datawarp</i></b> to use this plugin.
| The burst buffer capability on Cray systems is also known by the name |
| <i>DataWarp</i>. |
| For more information, see |
| <a href="burst_buffer.html">Slurm Burst Buffer Guide</a>. |
| </p> |
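<p>
As a sketch of how a user might then request a burst buffer in a batch
script (the capacity and mode values are illustrative only; see the
Burst Buffer Guide for the full directive syntax):
</p>
<pre>
#!/bin/bash
#DW jobdw type=scratch access_mode=striped capacity=1TiB
srun ./my_app
</pre>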
| |
| <li>CoreSpec</li> |
| <p> |
Set <b><i>CoreSpecPlugin=core_spec/cray_aries</i></b> to use this plugin.
| </p> |
| |
| <li>JobSubmit</li> |
| <p> |
| Set <b><i>JobSubmitPlugins=job_submit/cray_aries</i></b> to use. |
| This plugin is primarily used to set a gres=craynetwork value which |
| is used to limit the number of applications that can run on a node |
| at once. That number can be at most 4. This craynetwork gres needs |
| to be set up in your slurm.conf to ensure proper functionality. |
For example:
<pre>
...
GresTypes=craynetwork
NodeName=nid000[00-10] Gres=craynetwork:4
...
</pre>
| </p> |
| |
| <li>Power</li> |
| <p> |
| Set <b><i>PowerPlugin=power/cray_aries</i></b> to use. |
| <b><i>PowerParameters</i></b> is also typically configured. |
| For more information, see |
| <a href="power_mgmt.html">Slurm Power Management Guide</a>. |
| </p> |
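<p>
An illustrative configuration sketch; the parameter values shown are
examples only, not recommendations:
</p>
<pre>
PowerPlugin=power/cray_aries
PowerParameters=capmc_path=/opt/cray/capmc/default/bin/capmc,cap_watts=0
</pre>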
| |
| <li>Proctrack</li> |
| <p> |
| Set <b><i>ProctrackType=proctrack/cray_aries</i></b> to use. |
| </p> |
| |
| <li>Select</li> |
| <p> |
Set <b><i>SelectType=select/cray_aries</i></b> to use. This plugin is
a layered plugin, meaning it enhances a lower-level select
plugin. By default it is layered on top of the <i>select/linear</i>
plugin. It can also be layered on top of the <i>select/cons_tres</i> plugin
by setting <b><i>SelectTypeParameters=other_cons_tres</i></b>;
doing this allows you to run multiple jobs on a Cray node just
like on a normal Linux cluster.
| Use additional <b><i>SelectTypeParameters</i></b> to identify the resources |
| to allocate (e.g. cores, sockets, memory, etc.). See the slurm.conf man |
| page for details. |
| </p> |
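<p>
A minimal sketch of the cons_tres layering described above; the
CR_Core_Memory setting is just one possible choice of resources to
track:
</p>
<pre>
SelectType=select/cray_aries
SelectTypeParameters=other_cons_tres,CR_Core_Memory
</pre>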
| |
| <li>SlurmctldPort, SlurmdPort, SrunPortRange</li> |
| <p> |
| Realm-Specific IP Addressing (RSIP) will automatically try to interact with |
| anything opened on ports 8192 to 60000. Configure SlurmctldPort, SlurmdPort, |
| and SrunPortRange to use ports above 60000. In the case of SrunPortRange, |
| making 1000 or more ports available is recommended. |
| </p> |
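<p>
For example, one possible port layout above the RSIP range (these
specific port numbers are illustrative, not required):
</p>
<pre>
SlurmctldPort=61000
SlurmdPort=61001
SrunPortRange=62000-63000
</pre>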
| |
| <li>Switch</li> |
| <p> |
| Set <b><i>SwitchType=switch/cray_aries</i></b> to use. |
| </p> |
| |
| <li>Task</li> |
| <p> |
| Set <b><i>TaskPlugin=cray_aries,cgroup</i></b> to use. Use of the |
| <b><i>task/cgroup</i></b> plugin is required alongside <i>task/cray_aries</i>. |
| You may also use the <i>task/affinity</i> plugin along with |
| <i>task/cray_aries,task/cgroup</i> if desired (i.e. |
| <b><i>TaskPlugin=cray_aries,affinity,cgroup</i></b>). Note that plugins |
| are used in the order they are defined in the comma separated list, |
| and that <i>task/cray_aries</i> must be listed before <i>task/cgroup</i> |
| due to internal dependencies between the two plugins. |
| </p> |
| </ul> |
| |
| <h2 id="setup">Cray system setup<a class="slurm_link" href="#setup"></a></h2> |
<p>Some Slurm plugins (burst_buffer/datawarp and power/cray_aries)
parse JSON format data.
| These plugins are designed to make use of the JSON-C library for this purpose. |
| See <a href="download.html#json">JSON-C installation instructions</a> for |
| details.</p> |
| |
| <p> |
Some services on the system need to be set up to run correctly with
Slurm. Below are the commands to restart those services and the nodes they
run on. It is probably a good idea to set this up to happen automatically.
| <ul> |
| <li>boot node</li> |
| <ul> |
| <li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/aeld restart</li> |
| </ul> |
| <li>sdb node</li> |
| <ul> |
| <li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/ncmd restart</li> |
<li>WLM_DETECT_ACTIVE=SLURM /etc/init.d/apptermd restart</li>
| </ul> |
| </ul> |
| <p> |
As with Linux clusters, you will need to start a slurmd on each of your
compute nodes. If you choose to use MUNGE authentication, which is
advised, you will also need MUNGE installed and a munged daemon running
on each of your compute nodes as well. See the <a href="quickstart_admin.html">
quick start guide</a> for more information. Apart from the differences
described in this document, that guide can be used to set up your Cray
system to run Slurm natively.
| </p> |
| <p> |
| On larger systems, you may wish to set the PMI_MMAP_SYNC_WAIT_TIME |
| environment variable in your users' profiles to a larger value than |
| the default (180 seconds) to prevent PMI from falsely detecting job |
| launch failures. |
| </p> |
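<p>
For example, a line such as the following could be added to a shared
profile script (the 300 second value is an arbitrary illustration):
</p>
<pre>
export PMI_MMAP_SYNC_WAIT_TIME=300
</pre>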
| <h2 id="ha">High Availability<a class="slurm_link" href="#ha"></a></h2> |
| <p> |
A backup controller can be set up inside or outside the Cray. However, when the
backup is within the Cray, both the primary and the backup controllers will go
down when the Cray is rebooted. It is best to set up the backup controller on a
node external to the Cray so that the controller can still receive new jobs when
the Cray is down. When the backup is configured on an external node, the
<b><i>no_backup_scheduling</i></b> <b><i>SchedulerParameter</i></b> should be
specified in slurm.conf. This allows new jobs to be submitted while the Cray
is down, but prevents the backup controller from starting any new jobs.
| </p> |
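<p>
A minimal sketch of such a configuration; the hostnames are
placeholders:
</p>
<pre>
SlurmctldHost=primary-controller-host
SlurmctldHost=external-backup-host
SchedulerParameters=no_backup_scheduling
</pre>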
| |
| |
| <p style="text-align:center;">Last modified 27 July 2023</p> |
| |
| <!--#include virtual="footer.txt"--> |