| <!--#include virtual="header.txt"--> |
| |
| <h1>Large Cluster Administration Guide</h1> |
| |
| <p>This document contains SLURM administrator information specifically |
| for clusters containing 1,024 nodes or more. |
| Virtually all SLURM components have been validated (through emulation) |
| for clusters containing up to 16,384 compute nodes. |
Getting good performance at that scale does require some tuning, and
this document should help get you off to a good start.
| A working knowledge of SLURM should be considered a prerequisite |
| for this material.</p> |
| |
| <h2>Node Selection Plugin (SelectType)</h2> |
| |
<p>While allocating individual processors within a node works well
for smaller clusters, keeping track of the individual processors and
memory within each node adds significant overhead.
For best scalability, avoid the consumable resource plugin
(<i>select/cons_res</i>) and allocate whole nodes to jobs.</p>
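
<p>For example, the following <i>slurm.conf</i> setting selects whole-node
allocation. This is a minimal sketch; it assumes the <i>select/linear</i>
plugin is available and appropriate for your system.</p>
<pre>
# Allocate whole nodes to jobs rather than tracking individual
# processors and memory (assumes select/linear is available)
SelectType=select/linear
</pre>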
| |
| <h2>Job Accounting Plugin (JobAcctType)</h2> |
| |
| <p>Job accounting relies upon the <i>slurmstepd</i> daemon on each compute |
| node periodically sampling data. |
| This data collection will take compute cycles away from the application |
| inducing what is known as <i>system noise</i>. |
For large parallel applications, this system noise can detract from
application scalability.
For optimal application performance, it is best to disable job accounting
(<i>jobacct/none</i>).
| Consider use of job completion records (<i>JobCompType</i>) for accounting |
| purposes as this entails far less overhead. |
If job accounting is required, configure the sampling interval
to be relatively long (e.g. <i>JobAcctFrequency=300</i>).
| Some experimentation may also be required to deal with collisions |
| on data transmission.</p> |
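
<p>For example, the relevant <i>slurm.conf</i> settings might look like
the following (the values are illustrative only):</p>
<pre>
# Disable job accounting entirely for minimal system noise
JobAcctType=jobacct/none

# Or, if accounting is required, sample relatively infrequently
#JobAcctFrequency=300
</pre>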
| |
| <h2>Node Configuration</h2> |
| |
| <p>While SLURM can track the amount of memory and disk space actually found |
| on each compute node and use it for scheduling purposes, this entails |
| extra overhead. |
| Optimize performance by specifying the expected configuration using |
| the available parameters (<i>RealMemory</i>, <i>Procs</i>, and |
| <i>TmpDisk</i>). |
If a node is found to contain fewer resources than configured,
it will be marked DOWN and not used.
Also set <i>FastSchedule=1</i> so that scheduling decisions are based
upon the configured values rather than those reported by each node.
| While SLURM can easily handle a heterogeneous cluster, configuring |
| the nodes using the minimal number of lines in <i>slurm.conf</i> |
| will both make for easier administration and better performance.</p> |
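
<p>For example, a cluster of 1,024 identical nodes might be described
with a single <i>NodeName</i> line. The node names and resource values
below are purely illustrative:</p>
<pre>
FastSchedule=1
NodeName=tux[0-1023] Procs=4 RealMemory=2048 TmpDisk=16384
</pre>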
| |
| <h2>Timers</h2> |
| |
| <p>The configuration parameter <i>SlurmdTimeout</i> determines the interval |
| at which <i>slurmctld</i> routinely communicates with <i>slurmd</i>. |
| Communications occur at half the <i>SlurmdTimeout</i> value. |
| The purpose of this is to determine when a compute node fails |
| and thus should not be allocated work. |
| Longer intervals decrease system noise on compute nodes (we do |
| synchronize these requests across the cluster, but there will |
| be some impact upon applications). |
For very large clusters, <i>SlurmdTimeout</i> values of
| 120 seconds or more are reasonable.</p> |
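
<p>For example (a starting point only; some experimentation may be needed):</p>
<pre>
SlurmdTimeout=120
</pre>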
| |
| <p style="text-align:center;">Last modified 28 January 2006</p> |
| |
| <!--#include virtual="footer.txt"--> |