<!--#include virtual="header.txt"-->

<h1>Resource Limits</h1>

<p>SLURM scheduling policy support was significantly changed
in version 2.0 in order to take advantage of the database
integration used for storing accounting information.
This document describes the capabilities available in
SLURM version 2.0.
New features are under active development.
Familiarity with SLURM's <a href="accounting.html">Accounting</a> web page
is strongly recommended before using this document.</p>

<p>Note for users of Maui or Moab schedulers: <br>
Maui and Moab are not integrated with SLURM's resource limits
and should use their own resource limit mechanisms instead.</p>

<h2>Configuration</h2>

<p>Scheduling policy information must be stored in a database
as specified by the <b>AccountingStorageType</b> configuration parameter
in the <b>slurm.conf</b> configuration file.
Information can be recorded in either <a href="http://www.mysql.com/">MySQL</a>
or <a href="http://www.postgresql.org/">PostgreSQL</a>.
For security and performance reasons, the use of
SlurmDBD (SLURM Database Daemon) as a front-end to the
database is strongly recommended.
SlurmDBD uses a SLURM authentication plugin (e.g. MUNGE).
SlurmDBD also uses an existing SLURM accounting storage plugin
to maximize code reuse.
SlurmDBD uses data caching and prioritization of pending requests
in order to optimize performance.
While SlurmDBD relies upon existing SLURM plugins for authentication
and database use, the other SLURM commands and daemons are not required
on the host where SlurmDBD is installed.
Only the <i>slurmdbd</i> and <i>slurm-plugins</i> RPMs are required
for SlurmDBD execution.</p>
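
<p>A minimal <b>slurm.conf</b> excerpt for this arrangement might look like
the following; the host name is a placeholder for the machine on which
<i>slurmdbd</i> runs.</p>

<pre>
# slurm.conf (excerpt): route accounting records through SlurmDBD
AuthType=auth/munge
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd.example.com    # placeholder SlurmDBD host
</pre>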

<p>Both accounting and scheduling policy are configured based upon
an <i>association</i>. An <i>association</i> is a 4-tuple consisting
of the cluster name, bank account, user and (optionally) the SLURM
partition.
In order to enforce scheduling policy, set the value of
<b>AccountingStorageEnforce</b> in <b>slurm.conf</b>.
This option contains a comma-separated list of the options you want to
enforce (a configuration example follows the list below).
The valid options are:</p>
<ul>
<li>associations - This will prevent users from running jobs if
their <i>association</i> is not in the database. This option will
prevent users from accessing invalid accounts.
</li>
<li>limits - This will enforce the limits set for associations. Setting
this option also sets the 'associations' option.
</li>
<li>qos - This will require all jobs to specify (either explicitly or by
default) a valid qos (Quality of Service). QOS values are defined for
each association in the database. Setting this option also sets the
'associations' option.
</li>
<li>wckeys - This will prevent users from running jobs under a wckey
that they do not have access to. Setting this option also sets the
'associations' option and sets 'TrackWCKey' to true.
</li>
</ul>
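
<p>For example, to reject jobs whose user lacks a valid association and to
enforce both the configured limits and QOS values, <b>slurm.conf</b> could
contain:</p>

<pre>
# 'limits' and 'qos' each imply 'associations'
AccountingStorageEnforce=associations,limits,qos
</pre>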

<p>(NOTE: The association is a combination of cluster, account,
user names and optional partition name.)
<br>
Without AccountingStorageEnforce being set (the default behavior)
jobs will be executed based upon policies configured in SLURM on each
cluster.
<br>
It is advisable to run without the 'limits' option set when running a
scheduler on top of SLURM, such as Moab, that does not update its
per-association limits in real time.
</p>

<h2>Tools</h2>

<p>The tool used to manage accounting policy is <i>sacctmgr</i>.
It can be used to create and delete cluster, user, bank account,
and partition records plus their combined <i>association</i> record.
See <i>man sacctmgr</i> for details on this tool and examples of
its use.</p>
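
<p>As a brief sketch, the records that make up an <i>association</i> can be
created as shown below; the cluster, account, and user names are placeholders.</p>

<pre>
# Define a cluster, an account on that cluster, and a user in the account
sacctmgr add cluster snowflake
sacctmgr add account physics Cluster=snowflake \
         Description="physics department" Organization="science"
sacctmgr add user brian Account=physics
</pre>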

<p>A web interface with graphical output is currently under development.</p>

<p>Changes made to the scheduling policy are uploaded to
the SLURM control daemons on the various clusters and take effect
immediately. When an association is deleted, all running or pending
jobs which belong to that association are immediately canceled.
When limits are lowered, running jobs will not be canceled to
satisfy the new limits, but the new lower limits will be enforced.</p>

<h2>Policies supported</h2>

<p>A limited subset of scheduling policy options is currently
supported.
The available options are expected to increase as development
continues.
Most of these scheduling policy options are available not only
for a user association, but also for each cluster and account.
If a new association is created for some user and a scheduling
policy option is not specified, the default is the option set for
the cluster/account pair; if that is not specified, the option
set for the cluster; and if that is also not specified, no limit
will apply.</p>
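
<p>As an illustration of that precedence, a value set on an account serves
as the default for user associations beneath it that set no explicit value
of their own; the account, user, and limit values below are placeholders.</p>

<pre>
# Default for every user association under the physics account
sacctmgr modify account where name=physics set MaxJobs=20

# Give one user an explicit value of its own
sacctmgr modify user where name=brian account=physics set MaxJobs=5

# Inspect the resulting hierarchy
sacctmgr list associations format=Cluster,Account,User,MaxJobs
</pre>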

<p>Currently available scheduling policy options are listed below;
examples of setting them with <i>sacctmgr</i> follow the list:</p>
<ul>
<li><b>Fairshare=</b> Integer value used for determining priority.
Essentially this is the amount of claim this association and its
children have to the above system. It can also be set to the string
"parent", in which case the parent association is used for fairshare.
</li>

<li><b>GrpCPUMins=</b> A hard limit of cpu minutes to be used by jobs
running from this association and its children. If this limit is
reached all jobs running in this group will be killed, and no new
jobs will be allowed to run.
</li>

<li><b>GrpCPUs=</b> The total count of cpus able to be used at any given
time from jobs running from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after resources have been relinquished from this group.
</li>

<li><b>GrpJobs=</b> The total number of jobs able to run at any given
time from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after previous jobs complete from this group.
</li>

<li><b>GrpNodes=</b> The total count of nodes able to be used at any given
time from jobs running from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after resources have been relinquished from this group.
</li>

<li><b>GrpSubmitJobs=</b> The total number of jobs able to be submitted
to the system at any given time from this association and its children. If
this limit is reached new submission requests will be denied until
previous jobs complete from this group.
</li>

<li><b>GrpWall=</b> The maximum wall clock time any job submitted to
this group can run for. If this limit is reached submission requests
will be denied.
</li>

<li><b>MaxCPUsPerJob=</b> The maximum size in cpus any given job can
have from this association. If this limit is reached the job will
be denied at submission.
</li>

<li><b>MaxJobs=</b> The total number of jobs able to run at any given
time from this association. If this limit is reached new jobs will
be queued but only allowed to run after previous jobs complete from
this association.
</li>

<li><b>MaxNodesPerJob=</b> The maximum size in nodes any given job can
have from this association. If this limit is reached the job will
be denied at submission.
</li>

<li><b>MaxSubmitJobs=</b> The maximum number of jobs able to be submitted
to the system at any given time from this association. If
this limit is reached new submission requests will be denied until
previous jobs complete from this association.
</li>

<li><b>MaxWallDurationPerJob=</b> The maximum wall clock time any job
submitted to this association can run for. If this limit is reached
the job will be denied at submission.
</li>

<li><b>QOS=</b> Comma-separated list of QOS's this association is
able to use.
</li>

<!-- For future use
<li><b>MaxCPUMinsPerJob=</b> A limit of cpu minutes to be used by jobs
running from this association. If this limit is reached the job will
be killed.
</li>
-->

</ul>
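
<p>These options are set per association with <i>sacctmgr</i>. A few
illustrative commands follow; the account, user, limit, and QOS names are
placeholders, and <i>sacctmgr</i> spells the per-job node and wall limits
as <b>MaxNodes</b> and <b>MaxWall</b>.</p>

<pre>
# Cap the CPUs usable at any one time by the physics account and its children
sacctmgr modify account where name=physics set GrpCPUs=80

# Limit one user to 4 running jobs, 10 nodes per job and 2 hours per job
sacctmgr modify user where name=brian set MaxJobs=4 MaxNodes=10 MaxWall=02:00:00

# Assign a fair-share value and the QOS's the user may request
sacctmgr modify user where name=brian set Fairshare=10 QOS=normal,standby
</pre>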

<p>The <b>MaxNodes</b> and <b>MaxWall</b> options already exist in
SLURM's configuration on a per-partition basis, but the above options
provide the ability to impose limits on a per-user basis. The
<b>MaxJobs</b> option provides an entirely new mechanism for SLURM to
control the workload any individual may place on a cluster in order to
achieve some balance between users.</p>

<p>Fair-share scheduling is based upon the hierarchical bank account
data maintained in the SLURM database. More information can be found
in the <a href="priority_multifactor.html">priority/multifactor</a>
plugin description.</p>

<p style="text-align: center;">Last modified 10 June 2011</p>

</body></html>