<!--#include virtual="header.txt"-->

<h1>Resource Limits</h1>

<p>SLURM scheduling policy support was significantly changed
in version 2.0 in order to take advantage of the database
integration used for storing accounting information.
This document describes the capabilities available in
SLURM version 2.0.
New features are under active development.
Familiarity with SLURM's <a href="accounting.html">Accounting</a> web page
is strongly recommended before using this document.</p>

<p>Note for users of Maui or Moab schedulers: <br>
Maui and Moab are not integrated with SLURM's resource limits
and should use their own resource limit mechanisms instead.</p>

<h2>Configuration</h2>

<p>Scheduling policy information must be stored in a database
as specified by the <b>AccountingStorageType</b> configuration parameter
in the <b>slurm.conf</b> configuration file.
Information can be recorded in either <a href="http://www.mysql.com/">MySQL</a>
or <a href="http://www.postgresql.org/">PostgreSQL</a>.
For security and performance reasons, the use of
SlurmDBD (SLURM Database Daemon) as a front-end to the
database is strongly recommended.
SlurmDBD uses a SLURM authentication plugin (e.g. MUNGE).
SlurmDBD also uses an existing SLURM accounting storage plugin
to maximize code reuse.
SlurmDBD uses data caching and prioritization of pending requests
in order to optimize performance.
While SlurmDBD relies upon existing SLURM plugins for authentication
and database use, the other SLURM commands and daemons are not required
on the host where SlurmDBD is installed.
Only the <i>slurmdbd</i> and <i>slurm-plugins</i> RPMs are required
for SlurmDBD execution.</p>
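
<p>A minimal <b>slurm.conf</b> excerpt for this arrangement might look like
the following; the host name is a placeholder for the machine on which
<i>slurmdbd</i> runs.</p>

<pre>
# slurm.conf (excerpt): route accounting records through SlurmDBD
AuthType=auth/munge
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd.example.com    # placeholder SlurmDBD host
</pre>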

<p>Both accounting and scheduling policy are configured based upon
an <i>association</i>. An <i>association</i> is a 4-tuple consisting
of the cluster name, bank account, user and (optionally) the SLURM
partition.
In order to enforce scheduling policy, set the value of
<b>AccountingStorageEnforce</b> in <b>slurm.conf</b>.
This option contains a comma-separated list of the options you want to
enforce (a configuration example follows the list below).
The valid options are:</p>
<ul>
<li>associations - This will prevent users from running jobs if
their <i>association</i> is not in the database. This option will
prevent users from accessing invalid accounts.
</li>
<li>limits - This will enforce the limits set for associations. Setting
this option also sets the 'associations' option.
</li>
<li>qos - This will require all jobs to specify (either explicitly or by
default) a valid qos (Quality of Service). QOS values are defined for
each association in the database. Setting this option also sets the
'associations' option.
</li>
<li>wckeys - This will prevent users from running jobs under a wckey
that they do not have access to. Setting this option also sets the
'associations' option and sets 'TrackWCKey' to true.
</li>
</ul>
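
<p>For example, to reject jobs whose user lacks a valid association and to
enforce both the configured limits and QOS values, <b>slurm.conf</b> could
contain:</p>

<pre>
# 'limits' and 'qos' each imply 'associations'
AccountingStorageEnforce=associations,limits,qos
</pre>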

<p>(NOTE: The association is a combination of cluster, account,
user names and optional partition name.)
<br>
Without AccountingStorageEnforce being set (the default behavior)
jobs will be executed based upon policies configured in SLURM on each
cluster.
<br>
It is advisable to run without the 'limits' option set when running a
scheduler on top of SLURM, such as Moab, that does not update its
per-association limits in real time.
</p>

<h2>Tools</h2>

<p>The tool used to manage accounting policy is <i>sacctmgr</i>.
It can be used to create and delete cluster, user, bank account,
and partition records plus their combined <i>association</i> record.
See <i>man sacctmgr</i> for details on this tool and examples of
its use.</p>
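
<p>As a brief sketch, the records that make up an <i>association</i> can be
created as shown below; the cluster, account, and user names are placeholders.</p>

<pre>
# Define a cluster, an account on that cluster, and a user in the account
sacctmgr add cluster snowflake
sacctmgr add account physics Cluster=snowflake \
         Description="physics department" Organization="science"
sacctmgr add user brian Account=physics
</pre>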

<p>A web interface with graphical output is currently under development.</p>

<p>Changes made to the scheduling policy are uploaded to
the SLURM control daemons on the various clusters and take effect
immediately. When an association is deleted, all running or pending
jobs which belong to that association are immediately canceled.
When limits are lowered, running jobs will not be canceled to
satisfy the new limits, but the new lower limits will be enforced.</p>

<h2>Policies supported</h2>

<p>A limited subset of scheduling policy options is currently
supported.
The available options are expected to increase as development
continues.
Most of these scheduling policy options are available not only
for a user association, but also for each cluster and account.
If a new association is created for some user and a scheduling
policy option is not specified, the default is the option set for
the cluster/account pair; if that is not specified, the option
set for the cluster; and if that is also not specified, no limit
will apply.</p>
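
<p>As an illustration of that precedence, a value set on an account serves
as the default for user associations beneath it that set no explicit value
of their own; the account, user, and limit values below are placeholders.</p>

<pre>
# Default for every user association under the physics account
sacctmgr modify account where name=physics set MaxJobs=20

# Give one user an explicit value of its own
sacctmgr modify user where name=brian account=physics set MaxJobs=5

# Inspect the resulting hierarchy
sacctmgr list associations format=Cluster,Account,User,MaxJobs
</pre>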

<p>Currently available scheduling policy options are listed below;
examples of setting them with <i>sacctmgr</i> follow the list:</p>
<ul>
<li><b>Fairshare=</b> Integer value used for determining priority.
Essentially this is the amount of claim this association and its
children have to the above system. It can also be set to the string
"parent", in which case the parent association is used for fairshare.
</li>

<li><b>GrpCPUMins=</b> A hard limit of cpu minutes to be used by jobs
running from this association and its children. If this limit is
reached all jobs running in this group will be killed, and no new
jobs will be allowed to run.
</li>

<li><b>GrpCPUs=</b> The total count of cpus able to be used at any given
time from jobs running from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after resources have been relinquished from this group.
</li>

<li><b>GrpJobs=</b> The total number of jobs able to run at any given
time from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after previous jobs complete from this group.
</li>

<li><b>GrpNodes=</b> The total count of nodes able to be used at any given
time from jobs running from this association and its children. If
this limit is reached new jobs will be queued but only allowed to
run after resources have been relinquished from this group.
</li>

<li><b>GrpSubmitJobs=</b> The total number of jobs able to be submitted
to the system at any given time from this association and its children. If
this limit is reached new submission requests will be denied until
previous jobs complete from this group.
</li>

<li><b>GrpWall=</b> The maximum wall clock time any job submitted to
this group can run for. If this limit is reached submission requests
will be denied.
</li>

<li><b>MaxCPUsPerJob=</b> The maximum size in cpus any given job can
have from this association. If this limit is reached the job will
be denied at submission.
</li>

<li><b>MaxJobs=</b> The total number of jobs able to run at any given
time from this association. If this limit is reached new jobs will
be queued but only allowed to run after previous jobs complete from
this association.
</li>

<li><b>MaxNodesPerJob=</b> The maximum size in nodes any given job can
have from this association. If this limit is reached the job will
be denied at submission.
</li>

<li><b>MaxSubmitJobs=</b> The maximum number of jobs able to be submitted
to the system at any given time from this association. If
this limit is reached new submission requests will be denied until
previous jobs complete from this association.
</li>

<li><b>MaxWallDurationPerJob=</b> The maximum wall clock time any job
submitted to this association can run for. If this limit is reached
the job will be denied at submission.
</li>

<li><b>QOS=</b> Comma-separated list of QOS's this association is
able to use.
</li>

<!-- For future use
<li><b>MaxCPUMinsPerJob=</b> A limit of cpu minutes to be used by jobs
running from this association. If this limit is reached the job will
be killed.
</li>
-->

</ul>
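
<p>These options are set per association with <i>sacctmgr</i>. A few
illustrative commands follow; the account, user, limit, and QOS names are
placeholders, and <i>sacctmgr</i> spells the per-job node and wall limits
as <b>MaxNodes</b> and <b>MaxWall</b>.</p>

<pre>
# Cap the CPUs usable at any one time by the physics account and its children
sacctmgr modify account where name=physics set GrpCPUs=80

# Limit one user to 4 running jobs, 10 nodes per job and 2 hours per job
sacctmgr modify user where name=brian set MaxJobs=4 MaxNodes=10 MaxWall=02:00:00

# Assign a fair-share value and the QOS's the user may request
sacctmgr modify user where name=brian set Fairshare=10 QOS=normal,standby
</pre>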

<p>The <b>MaxNodes</b> and <b>MaxWall</b> options already exist in
SLURM's configuration on a per-partition basis, but the above options
provide the ability to impose limits on a per-user basis. The
<b>MaxJobs</b> option provides an entirely new mechanism for SLURM to
control the workload any individual may place on a cluster in order to
achieve some balance between users.</p>

<p>Fair-share scheduling is based upon the hierarchical bank account
data maintained in the SLURM database. More information can be found
in the <a href="priority_multifactor.html">priority/multifactor</a>
plugin description.</p>

<p style="text-align: center;">Last modified 10 June 2011</p>

</body></html>