| <!--#include virtual="header.txt"--> |
| |
| <h1>High Throughput Computing Administration Guide</h1> |
| |
| <p>This document contains SLURM administrator information specifically |
| for high throughput computing, namely the execution of many short jobs. |
| Getting optimal performance for high throughput computing requires |
| some tuning, and this document should help you get off to a good start. |
| A working knowledge of SLURM should be considered a prerequisite |
| for this material.</p> |
| |
| <h2>Performance Results</h2> |
| |
| <p>SLURM has been validated to process 100,000 jobs and job steps per hour |
| on a sustained basis with short bursts of activity at a much higher level. |
| Actual performance depends upon the jobs to be executed plus the hardware and |
| configuration used.</p> |
| |
| <h2>System configuration</h2> |
| |
| <p>Three system configuration parameters must be set to support a large number |
| of open files and TCP connections with large bursts of messages. To preserve |
| changes after reboot, set them in the <b>/etc/rc.d/rc.local</b> script or the |
| <b>/etc/sysctl.conf</b> file. To change a value on a running system, write it |
| directly into the corresponding file under <b>/proc/sys</b> |
| (e.g. <i>"echo 32832 &gt; /proc/sys/fs/file-max"</i>).</p> |
| <ul> |
| <li><b>/proc/sys/fs/file-max</b>: |
| The maximum number of concurrently open files. |
| We recommend a limit of at least 32,832.</li> |
| <li><b>/proc/sys/net/ipv4/tcp_max_syn_backlog</b>: |
| The maximum number of remembered connection requests that have not yet |
| received an acknowledgment from the connecting client. |
| The default value is 1024 for systems with more than 128 MB of memory, and 128 |
| for low-memory machines. If the server suffers from overload, try increasing |
| this number.</li> |
| <li><b>/proc/sys/net/core/somaxconn</b>: |
| The limit on the socket listen() backlog, known in userspace as SOMAXCONN. |
| Defaults to 128. The value should be raised substantially to support bursts of |
| requests. For example, to support a burst of 1024 requests, set somaxconn to |
| 1024.</li> |
| </ul> |
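<p>As a minimal sketch, the three settings above could be expressed as
<b>/etc/sysctl.conf</b> entries, where each key is the path under
<b>/proc/sys</b> with slashes replaced by dots. Only the 32832 value for
<i>file-max</i> and the 1024 value for <i>somaxconn</i> come from this guide;
the <i>tcp_max_syn_backlog</i> value is an illustrative assumption, and the
function names are hypothetical helpers.</p>

```shell
#!/bin/sh
# Print /etc/sysctl.conf lines for the three parameters discussed above.
# 32832 and the somaxconn value of 1024 are taken from this guide;
# the tcp_max_syn_backlog value of 4096 is an ASSUMED example, tune it per site.
sysctl_fragment() {
    cat <<'EOF'
fs.file-max = 32832
net.ipv4.tcp_max_syn_backlog = 4096
net.core.somaxconn = 1024
EOF
}

# Report the value currently in effect for one /proc/sys entry.
# $1 is a path such as /proc/sys/fs/file-max.
current_value() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

# Show the proposed fragment, then what the running kernel uses now.
sysctl_fragment
for f in /proc/sys/fs/file-max \
         /proc/sys/net/ipv4/tcp_max_syn_backlog \
         /proc/sys/net/core/somaxconn; do
    echo "$f: $(current_value "$f")"
done
```

<p>After the lines are added to <b>/etc/sysctl.conf</b>, running
<i>"sysctl -p"</i> as root loads them without a reboot.</p>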
| |
| <p>The transmit queue length (<b>txqueuelen</b>) may also need to be modified |
| using the ifconfig command. A value of 4096 has been found to work well for one |
| site with a very large cluster |
| (e.g. <i>"ifconfig &lt;interface&gt; txqueuelen 4096"</i>).</p> |
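<p>On systems where <i>ifconfig</i> is not installed, the same change can
usually be made with the <i>ip</i> command from iproute2. The sketch below only
prints the command rather than running it; the interface name <i>eth0</i> and
the helper function names are assumptions for illustration.</p>

```shell
#!/bin/sh
# Print (do not execute) the iproute2 command that sets the transmit
# queue length. $1 is the interface name; $2 is the length and defaults
# to 4096, the value this guide reports as working well at one site.
txqueuelen_cmd() {
    iface=$1
    len=${2:-4096}
    echo "ip link set dev $iface txqueuelen $len"
}

# Show the queue length currently in effect for an interface, read from
# sysfs; prints "unknown" if the interface does not exist.
show_txqueuelen() {
    f="/sys/class/net/$1/tx_queue_len"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
    fi
}

# eth0 is a placeholder interface name.
txqueuelen_cmd eth0
echo "current: $(show_txqueuelen eth0)"
```

<p>The printed command must be run as root, and like the <i>ifconfig</i> form
it does not persist across reboots unless added to a startup script.</p>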
| |
| <h2>User limits</h2> |
| |
| <p>The <b>ulimit</b> values in effect for the <b>slurmctld</b> daemon should |
| be set quite high for memory size, open file count and stack size.</p> |
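<p>One way to check which limits are actually in effect is to read them from
<b>/proc</b> for the running daemon. The following is a sketch that assumes a
Linux system with <i>pgrep</i> available; if no <b>slurmctld</b> process is
found it falls back to the limits of the current shell. The function name is a
hypothetical helper.</p>

```shell
#!/bin/sh
# Show the open file, stack size and address space limits of the running
# slurmctld daemon, or of the current shell if the daemon is not running.
show_limits() {
    # -o picks the oldest matching process, -x requires an exact name match.
    pid=$(pgrep -o -x slurmctld 2>/dev/null || true)
    if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
        grep -E 'Max (open files|stack size|address space)' "/proc/$pid/limits"
    else
        echo "open files: $(ulimit -n)"
        echo "stack size (KB): $(ulimit -s)"
        echo "virtual memory (KB): $(ulimit -v)"
    fi
}

show_limits
```

<p>Limits are inherited from the process that starts the daemon, so any
<i>ulimit</i> changes need to be made in the environment or script that
launches <b>slurmctld</b>.</p>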
| |
| <h2>SLURM Configuration</h2> |
| |
| <p>Several SLURM configuration parameters should be adjusted to |
| reflect the needs of high throughput computing.</p> |
| |
| <ul> |
| <li><b>MaxJobCount</b>: |
| Controls how many jobs may be in the <b>slurmctld</b> daemon records at any |
| point in time (pending, running, suspended or, temporarily, completed). |
| The default value is 10,000.</li> |
| <li><b>MessageTimeout</b>: |
| Controls how long to wait for a response to messages. |
| The default value is 10 seconds. |
| While the <b>slurmctld</b> daemon is highly threaded, its responsiveness |
| is load dependent. This value might need to be increased somewhat.</li> |
| <li><b>MinJobAge</b>: |
| Controls how soon the record of a completed job can be purged from the |
| <b>slurmctld</b> memory, after which it is no longer visible using the |
| <b>squeue</b> command. |
| The record of jobs run will be preserved in accounting records and logs. |
| The default value is 300 seconds. The value should be reduced to a few |
| seconds if possible.</li> |
| <li><b>SchedulerParameters</b>: |
| Several scheduling parameters are available. |
| <ul> |
| <li>Setting option <b>defer</b> will avoid attempting to schedule each job |
| individually at job submit time, but defer it until a later time when |
| scheduling multiple jobs simultaneously may be possible. |
| This option may improve system responsiveness when large numbers of jobs |
| (many hundreds) are submitted at the same time, but it will delay the |
| initiation time of individual jobs.</li> |
| <li>A variation of <b>defer</b> would be to configure <b>default_queue_depth</b> |
| to a relatively small number to avoid attempting to schedule large numbers of |
| jobs every time some job completes or another routine action occurs. (NOTE: |
| the default value of <b>default_queue_depth</b> should be fine in most |
| cases).</li> |
| <li>The <i>sched/backfill</i> plugin has relatively high overhead if used with |
| large numbers of jobs. Configuring <b>max_job_bf</b> to a modest size (say 100 |
| jobs or less) and <b>interval</b> to 30 seconds or more will limit the |
| overhead of backfill scheduling (NOTE: the default values are fine for both |
| of these parameters).</li> |
| </ul></li> |
| <li><b>SlurmctldPort</b>: |
| It is desirable to configure the <b>slurmctld</b> daemon to accept incoming |
| messages on more than one port in order to avoid having incoming messages |
| discarded by the operating system due to exceeding the SOMAXCONN limit |
| described above. Using between two and ten ports is suggested when large |
| numbers of simultaneous requests are to be supported.</li> |
| <li>Other: Configure logging, accounting and other overhead to a minimum |
| appropriate for your environment.</li> |
| </ul> |
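<p>Taken together, the parameters above might appear in <b>slurm.conf</b>
roughly as in the sketch below, which prints a candidate fragment for review.
The <b>max_job_bf</b> and <b>interval</b> values follow the suggestions in this
guide; the <b>MaxJobCount</b>, <b>MessageTimeout</b>, <b>MinJobAge</b> and port
range values are illustrative assumptions, not recommendations, and the
function name is hypothetical.</p>

```shell
#!/bin/sh
# Print a candidate slurm.conf fragment for high throughput computing.
# ASSUMED example values: MaxJobCount, MessageTimeout, MinJobAge and the
# four-port SlurmctldPort range. max_job_bf=100 and interval=30 follow
# the suggestions in the list above.
slurm_conf_fragment() {
    cat <<'EOF'
MaxJobCount=100000
MessageTimeout=30
MinJobAge=10
SchedulerParameters=defer,max_job_bf=100,interval=30
SlurmctldPort=6817-6820
EOF
}

slurm_conf_fragment
```

<p>Once a configuration like this is in place, <i>"scontrol show config"</i>
reports the values the running <b>slurmctld</b> daemon is using, which is a
convenient way to confirm the changes took effect.</p>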
| |
| <p style="text-align:center;">Last modified 30 August 2010</p> |
| |
| <!--#include virtual="footer.txt"--> |