<!--#include virtual="header.txt"-->
<h1>High Throughput Computing Administration Guide</h1>
<p>This document contains SLURM administrator information specifically
for high throughput computing, namely the execution of many short jobs.
Getting optimal performance for high throughput computing does require
some tuning; this document should help get you off to a good start.
A working knowledge of SLURM should be considered a prerequisite
for this material.</p>
<h2>Performance Results</h2>
<p>SLURM has been validated to process 100,000 jobs and job steps per hour
on a sustained basis with short bursts of activity at a much higher level.
Actual performance depends upon the jobs to be executed plus the hardware and
configuration used.</p>
<h2>System configuration</h2>
<p>Three system configuration parameters must be set to support a large number
of open files and TCP connections with large bursts of messages. Changes can
be made using the <b>/etc/rc.d/rc.local</b> or <b>/etc/sysctl.conf</b>
script to preserve the changes after a reboot. In either case, the values can
also be written directly into the <i>/proc</i> files to take effect immediately
(e.g. <i>"echo 32832 &gt; /proc/sys/fs/file-max"</i>); a sample
<i>sysctl.conf</i> excerpt follows the list below.</p>
<ul>
<li><b>/proc/sys/fs/file-max</b>:
The maximum number of concurrently open files.
We recommend a limit of at least 32,832.</li>
<li><b>/proc/sys/net/ipv4/tcp_max_syn_backlog</b>:
The maximum number of remembered connection requests that have not yet
received an acknowledgment from the connecting client.
The default value is 1024 for systems with more than 128 MB of memory, and 128
for low memory machines. If the server suffers from overload, try increasing
this number.</li>
<li><b>/proc/sys/net/core/somaxconn</b>:
Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to
128. The value should be raised substantially to support bursts of requests.
For example, to support a burst of 1024 requests, set somaxconn to 1024.</li>
</ul>
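<p>For reference, a minimal <i>/etc/sysctl.conf</i> excerpt implementing these
settings might look like the sketch below. The values shown are only
illustrative starting points drawn from the recommendations above and should
be tuned for your workload; they can be applied without a reboot using
<i>"sysctl -p"</i>.</p>
<pre>
# /etc/sysctl.conf excerpt: kernel limits for high throughput operation
# (illustrative values only; apply with "sysctl -p" or at boot)
fs.file-max = 32832
net.ipv4.tcp_max_syn_backlog = 2048
net.core.somaxconn = 1024
</pre>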
<p>The transmit queue length (<b>txqueuelen</b>) may also need to be modified
using the ifconfig command. A value of 4096 has been found to work well for one
site with a very large cluster
(e.g. <i>"ifconfig &lt;interface&gt; txqueuelen 4096"</i>).</p>
<h2>User limits</h2>
<p>The <b>ulimit</b> values in effect for the <b>slurmctld</b> daemon should
be set quite high for memory size, open file count and stack size.</p>
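<p>A minimal sketch of one way to do this, assuming <b>slurmctld</b> is
launched from an init script, is to raise the limits in that script
immediately before starting the daemon; the exact limits, path and startup
mechanism will vary by site.</p>
<pre>
# Hypothetical init script fragment: raise limits before launching slurmctld
ulimit -n 32832        # open file count
ulimit -s unlimited    # stack size
ulimit -v unlimited    # memory (address space) size
/usr/sbin/slurmctld    # path may differ on your system
</pre>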
<h2>SLURM Configuration</h2>
<p>NOTE: Substantial changes were made in SLURM version 2.4 to support higher
throughput rates. Version 2.5 includes more enhancements.</p>
<p>Several SLURM configuration parameters should be adjusted to
reflect the needs of high throughput computing; a sample <i>slurm.conf</i>
excerpt follows the list below.</p>
<ul>
<li><b>MaxJobCount</b>:
Controls how many jobs may be in the <b>slurmctld</b> daemon records at any
point in time (pending, running, suspended, or recently completed and
temporarily retained). The default value is 10,000.</li>
<li><b>MessageTimeout</b>:
Controls how long to wait for a response to messages.
The default value is 10 seconds.
While the <b>slurmctld</b> daemon is highly threaded, its responsiveness
is load dependent. This value might need to be increased somewhat.</li>
<li><b>MinJobAge</b>:
Controls how soon the record of a completed job can be purged from
<b>slurmctld</b> memory, after which it is no longer visible with the <b>squeue</b> command.
The record of jobs run will be preserved in accounting records and logs.
The default value is 300 seconds. The value should be reduced to a few
seconds if possible.</li>
<li><b>PriorityType</b>:
The <b>priority/builtin</b> is considerably faster than other options, but
schedules jobs only on a First In First Out (FIFO) basis.</li>
<li><b>SchedulerParameters</b>:
Several scheduling parameters are available.
<ul>
<li>Setting option <b>defer</b> will avoid attempting to schedule each job
individually at job submit time, but defer it until a later time when
scheduling multiple jobs simultaneously may be possible.
This option may improve system responsiveness when large numbers of jobs
(many hundreds) are submitted at the same time, but it will delay the
initiation time of individual jobs.</li>
<li>A variation of <b>defer</b> would be to configure <b>default_queue_depth</b>
to a relatively small number to avoid attempting to schedule large numbers of
jobs every time some job completes or another routine action occurs. (NOTE:
the default value of <b>default_queue_depth</b> should be fine in most
cases).</li>
<li>The <i>sched/backfill</i> plugin has relatively high overhead if used with
large numbers of jobs. Configuring <b>max_job_bf</b> to a modest size (say 100
jobs or less) and <b>interval</b> to 30 seconds or more will limit the
overhead of backfill scheduling (NOTE: the default values are fine for both
of these parameters).</li>
</ul></li>
<li><b>SelectType</b>:
The <b>select/serial</b> plugin is highly optimized if executing only serial
(single CPU) jobs.</li>
<li><b>SlurmctldPort</b>:
It is desirable to configure the <b>slurmctld</b> daemon to accept incoming
messages on more than one port in order to avoid having incoming messages
discarded by the operating system due to exceeding the SOMAXCONN limit
described above. Using between two and ten ports is suggested when large
numbers of simultaneous requests are to be supported.</li>
<li><b>SlurmctldDebug</b>:
More detailed logging will decrease system throughput. Set to 2 (log errors
only) or 3 (general information logging). Each increment in the logging level
will increase the number of messages by a factor of about 3.</li>
<li><b>SlurmdDebug</b>:
More detailed logging will decrease system throughput. Set to 2 (log errors
only) or 3 (general information logging). Each increment in the logging level
will increase the number of messages by a factor of about 3.</li>
<li><b>SlurmdLogFile</b>:
Writing to local storage is recommended.</li>
<li>Other: Configure logging, accounting and other overhead to a minimum
appropriate for your environment.</li>
</ul>
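<p>Putting these together, a <i>slurm.conf</i> excerpt tuned for high
throughput might resemble the sketch below. The specific values are
illustrative starting points based on the guidance above rather than required
settings, and the port range and file path are assumptions that must match
your site's configuration.</p>
<pre>
# slurm.conf excerpt: illustrative high throughput settings (adjust for your site)
MaxJobCount=100000             # retain many more job records than the 10,000 default
MessageTimeout=30              # allow extra time for responses under heavy load
MinJobAge=10                   # purge completed job records after a few seconds
PriorityType=priority/builtin  # FIFO scheduling, lowest overhead
SchedulerParameters=defer      # batch scheduling decisions rather than per-submission
SelectType=select/serial       # only if all jobs are single-CPU
SlurmctldPort=6820-6825        # multiple ports to absorb bursts of incoming messages
SlurmctldDebug=3               # general information logging
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log   # local storage assumed
</pre>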
<p style="text-align:center;">Last modified 12 July 2012</p>
<!--#include virtual="footer.txt"-->