| <!--#include virtual="header.txt"--> |
| |
| <h1>Extra Constraints</h1> |
| |
| <h2>Contents</h2> |
| <ul> |
| <li><a href="#Overview">Overview</a></li> |
| <li><a href="#Configuration">Configuration</a></li> |
| <li><a href="#Node_Extra_Data">Node Extra Data</a></li> |
| <li><a href="#Job_Submission">Job Submission</a> |
| <ul> |
| <li><a href="#Syntax">Syntax</a></li> |
| <li><a href="#Warnings">Warnings</a></li> |
| <li><a href="#Valid">Valid and Invalid Requests</a></li> |
| </ul> |
| </li> |
| <li><a href="#Examples">Examples</a></li> |
| </ul> |
| |
| <h2 id="Overview">Overview |
| <a class="slurm_link" href="#Overview"></a> |
| </h2> |
| <p> |
| Extra data may be added to a node, and jobs may request extra constraints |
| to filter nodes based on their extra data. This is disabled by |
| default, but may be enabled in slurm.conf. <b>Warning</b>: Slurm's backfill |
| scheduler cannot accurately plan nodes for jobs whose requested extra constraints |
| are not immediately satisfied. This means that the more often extra data for |
| nodes is changed, the less accurate the backfill scheduler will be. |
| </p> |
| |
| <h2 id="Configuration">Configuration |
| <a class="slurm_link" href="#Configuration"></a> |
| </h2> |
| |
| <ul> |
| <li>In slurm.conf, configure |
| <code>SchedulerParameters=extra_constraints</code></li> |
| </ul> |
| |
| <h2 id="Node_Extra_Data">Node Extra Data |
| <a class="slurm_link" href="#Node_Extra_Data"></a> |
| </h2> |
| <p> |
| A node's extra data is a JSON-formatted string. It may be initialized when |
| slurmd starts by passing the --extra flag. For example: |
| </p> |
| <pre> |
| slurmd --extra '{ "a": 1.23, "b": true, "c": 0, "foo": "bar", "zed": 23 }' |
| </pre> |
| <p> |
| Or, it may be updated with scontrol. For example: |
| </p> |
| <pre> |
| scontrol update nodename=node123 extra='{ "a": 1.23, "b": true, "c": 0, "foo": "bar", "zed": 23 }' |
| </pre> |
| <p> |
| This defines the features that may be requested by the --extra option in |
| salloc, sbatch, and srun. Values may be any string, number, or boolean value. |
| </p> |
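| |
| <p> |
| As a minimal sketch of the format described above, the following Python snippet |
| builds the JSON string and a shell-safe scontrol command line. The node name and |
| the keys are the illustrative values from the example above; the use of Python |
| and its json and shlex modules is an assumption made for illustration, not part of Slurm. |
| </p> |

```python
import json
import shlex

# Illustrative node name and extra data, mirroring the example above.
node_name = "node123"
extra = {"a": 1.23, "b": True, "c": 0, "foo": "bar", "zed": 23}

# json.dumps renders Python's True as the JSON literal true.
extra_json = json.dumps(extra)

# shlex.quote wraps the JSON in single quotes so that a POSIX shell
# passes it to scontrol as a single argument.
command = f"scontrol update nodename={node_name} extra={shlex.quote(extra_json)}"
print(command)
```

| <p> |
| Generating the string with a JSON library avoids hand-written quoting mistakes. |
| </p> |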
| |
| <h2 id="Job_Submission">Job Submission |
| <a class="slurm_link" href="#Job_Submission"></a> |
| </h2> |
| |
| <h3 id="Syntax">Syntax |
| <a class="slurm_link" href="#Syntax"></a> |
| </h3> |
| <p> |
| The --extra option of salloc, sbatch, and srun takes an arbitrary string. Enclose it |
| in single or double quotes if it contains spaces or some special characters. |
| </p> |
| |
| <p> |
| If <b>SchedulerParameters=extra_constraints</b> is enabled, this string is used |
| for node filtering based on the <i>Extra</i> field in each node. |
| </p> |
| |
| <p> |
| The most basic request is structured like this: |
| </p> |
| |
| <pre> |
| <key><comparison_operator><value> |
| </pre> |
| |
| <p> |
| Key and value are arbitrary, non-empty strings that cannot contain any |
| characters that are part of operators and cannot contain parentheses. Thus, |
| the following characters are not allowed in a key or value: |
| </p> |
| |
| <pre> |
| ,&|<>=!() |
| </pre> |
| |
| <p> |
| The following comparison operators are allowed: |
| </p> |
| <ul> |
| <li><code>= (equal to)</code></li> |
| <li><code>!= (not equal to)</code></li> |
| <li><code>> (greater than)</code></li> |
| <li><code>>= (greater than or equal to)</code></li> |
| <li><code>< (less than)</code></li> |
| <li><code><= (less than or equal to)</code></li> |
| </ul> |
| |
| <p> |
| Two numbers are equal if their difference is less than 0.00001. |
| Numerical suffixes (such as kb or mb) are not supported. If letters are |
| interspersed with numbers, then the key or value is considered a string. |
| </p> |
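| |
| <p> |
| The equality rule above can be sketched in Python as follows. The function names |
| are hypothetical and this is not Slurm's implementation; it only illustrates how a |
| tolerance of 0.00001 affects the = and >= comparisons. |
| </p> |

```python
TOLERANCE = 0.00001  # tolerance stated in the text above


def numbers_equal(x: float, y: float) -> bool:
    """Two numbers are treated as equal if their difference is below the tolerance."""
    return abs(x - y) < TOLERANCE


def greater_or_equal(x: float, y: float) -> bool:
    """Assumed reading of >=: strictly greater, or equal within the tolerance."""
    return x > y or numbers_equal(x, y)


print(numbers_equal(0, 0.00000001))    # difference is below the tolerance
print(numbers_equal(0, 0.00001))       # difference is not below the tolerance
print(greater_or_equal(0, 0.00000001))
```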
| |
| <p> |
| Requests can be joined together with boolean operators. |
| </p> |
| |
| <pre> |
| <request><boolean_operator><request> |
| </pre> |
| |
| <p> |
| The following boolean operators are allowed: |
| </p> |
| |
| <pre> |
| & (AND) |
| , (AND) |
| | (OR) |
| </pre> |
| |
| <p> |
| Any number of parentheses may be used to group requests together. |
| All boolean operators at any given level of parentheses must be identical. |
| Boolean operators at different levels of parentheses may be different. |
| For example, this is not allowed: |
| </p> |
| |
| <pre> |
| a=1&b=2|c=foobar |
| </pre> |
| |
| <p> |
| But this is allowed: |
| </p> |
| |
| <pre> |
| (a=1&b=2)|c=foobar |
| </pre> |
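| |
| <p> |
| The rule about identical boolean operators at each level of parentheses can be |
| checked with a short scan over the request string. The sketch below is a |
| hypothetical helper, not Slurm's parser: it checks only this one rule, and it |
| assumes that <code>&</code> and <code>,</code> count as the same operator because both mean AND. |
| </p> |

```python
def operators_consistent(request: str) -> bool:
    """Return True if every parenthesis level uses a single kind of boolean operator."""
    # Assumption: "&" and "," are interchangeable because both mean AND.
    kinds = {"&": "AND", ",": "AND", "|": "OR"}
    stack = [None]  # operator kind seen so far at each currently open level
    for ch in request:
        if ch == "(":
            stack.append(None)  # entering a deeper level
        elif ch == ")":
            if len(stack) == 1:
                return False  # closing parenthesis without a matching open
            stack.pop()
        elif ch in kinds:
            kind = kinds[ch]
            if stack[-1] is None:
                stack[-1] = kind  # first operator seen at this level
            elif stack[-1] != kind:
                return False  # mixed AND and OR at the same level
    return len(stack) == 1  # every opened parenthesis was closed


print(operators_consistent("a=1&b=2|c=foobar"))
print(operators_consistent("(a=1&b=2)|c=foobar"))
```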
| |
| <h3 id="Warnings">Warnings |
| <a class="slurm_link" href="#Warnings"></a> |
| </h3> |
| |
| <p> |
| Whitespace characters are not treated specially. Any whitespace characters will |
| be considered part of a key or value. This means that the following is invalid: |
| </p> |
| |
| <pre> |
| --extra " (a=b)" |
| </pre> |
| |
| <p> |
| The space at the beginning is parsed as a key of a request. Then the opening |
| parenthesis character is recognized as an invalid character for either a key |
| or a comparison operator. This request would result in the job being rejected. |
| However, this is valid: |
| </p> |
| |
| <pre> |
| --extra "( a=b)" |
| </pre> |
| |
| <p> |
| This has a single request. The key is " a", the comparison operator is "=", and |
| the value is "b". |
| </p> |
| |
| <p> |
| This same warning applies to single and double quotes. These are not considered |
| special characters, and thus are part of a string. Thus, bar and "bar" are not |
| equal. |
| </p> |
| |
| <h3 id="Valid">Valid and Invalid Requests |
| <a class="slurm_link" href="#Valid"></a> |
| </h3> |
| |
| <p> |
| Here are some examples of <b>valid</b> requests: |
| </p> |
| |
| <pre> |
| a=1.23 |
| a= b |
| a!=1.24 |
| a!=1.23|foo!=blah |
| b=200 |
| b=true |
| foo<baz |
| (c<=0.0001&a=1.25)|zed=23.0 |
| ((c<=0.0001&a=1.25)|zed=23.0)&(a<1|b=false|c>=0.00000001) |
| ((c<=0.0001&a=1.25)|zed=23.0)&(a<1|b=true|c>=0.1) |
| </pre> |
| |
| <p> |
| Here are some examples of <b>invalid</b> requests: |
| </p> |
| |
| <p> |
| Invalid comparison operator: |
| </p> |
| <pre> |
| a,<=6 |
| </pre> |
| |
| <p> |
| Trailing operator: |
| </p> |
| <pre> |
| a<=6<= |
| </pre> |
| |
| <p> |
| Multiple boolean operators in a row: |
| </p> |
| <pre> |
| a=5&&&b=5 |
| a=5|||b=5 |
| </pre> |
| |
| <p> |
| Multiple comparison operators in a row: |
| </p> |
| <pre> |
| a====5 |
| b<=<=5 |
| </pre> |
| |
| <p> |
| Parentheses without anything inside: |
| </p> |
| <pre> |
| a=5&() |
| </pre> |
| |
| <p> |
| Different boolean operators at a single level of parentheses: |
| </p> |
| <pre> |
| a=5&b=5|c=5 |
| (a=1)&(b=2)|(c=3) |
| </pre> |
| |
| <p> |
| No boolean operator between individual requests: |
| </p> |
| <pre> |
| a=1(b=2) |
| (a=1)(b=2) |
| (((a=1)b=2)) |
| </pre> |
| |
| <h2 id="Examples">Examples |
| <a class="slurm_link" href="#Examples"></a> |
| </h2> |
| <p> |
| Given a node with the following extra data: |
| </p> |
| |
| <pre> |
| Extra={ "a": 1.23, "b": true, "c": 0, "foo": "bar", "zed": 23 } |
| </pre> |
| |
| <p> |
| The following --extra requests are fulfilled by this node: |
| </p> |
| |
| <pre> |
| a=1.23 |
| a!=1.24 |
| a!=1.23|foo!=blah |
| b=200 |
| b=true |
| foo<baz |
| (c<=0.0001&a=1.25)|zed=23.0 |
| ((c<=0.0001&a=1.25)|zed=23.0)&(a<1|b=false|c>=0.00000001) |
| ((c<=0.0001&a=1.25)|zed=23.0)&(a<1|b=true|c>=0.1) |
| </pre> |
| |
| <p> |
| The following --extra requests are not fulfilled by this node: |
| </p> |
| |
| <pre> |
| a!=1.23 |
| b=0 |
| b=false |
| foo>baz |
| ((c<=0.0001&a=1.25)|zed=23.0)&(a<1|b=false|c>=0.00001) |
| </pre> |
| |
| <p> |
| Reminder: in order for two numbers to be considered equal, their difference |
| must be less than 0.00001. This is why 0.00001 is not considered equal to 0 and |
| thus the request <code>c>=0.00001</code> is not fulfilled, |
| but 0.00000001 is considered equal to 0 and thus the request |
| <code>c>=0.00000001</code> is fulfilled. |
| </p> |
| |
| <p> |
| A practical example might be to have a script that looks at the load average |
| of each node and updates the extra attribute for each node with the current |
| value. This would allow users to restrict their jobs to nodes whose load |
| average is below a certain threshold. |
| </p> |
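| |
| <p> |
| A minimal sketch of such a script is shown below, assuming it runs on each node |
| with permission to update that node. The function names and the "load" key are |
| hypothetical; only the <code>scontrol update nodename=... extra=...</code> form comes from |
| this page. |
| </p> |

```python
import json
import os
import subprocess


def build_update_command(node_name: str, load: float) -> list:
    """Build the scontrol argument list that stores the load in the node's extra data."""
    extra_json = json.dumps({"load": round(load, 2)})
    # Passing a list to subprocess.run avoids shell quoting entirely.
    return ["scontrol", "update", f"nodename={node_name}", f"extra={extra_json}"]


def update_node_load(node_name: str) -> None:
    """Read the local 1-minute load average and push it to Slurm (not called here)."""
    one_minute_load = os.getloadavg()[0]
    subprocess.run(build_update_command(node_name, one_minute_load), check=True)


print(build_update_command("node01", 0.994))
```

| <p> |
| Something like <code>update_node_load</code> could be run periodically, for example from cron. |
| Keep in mind the warning in the Overview: frequent changes to extra data reduce the |
| accuracy of the backfill scheduler. |
| </p> |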
| |
| <p> |
| In this simple example, the three nodes in a cluster are being monitored and |
| the extra attribute is being populated with their load average. |
| <pre> |
| $ scontrol show nodes node[01-03] | grep -E 'NodeName|Extra' |
| NodeName=node01 Arch=x86_64 CoresPerSocket=6 |
| Extra={ "load": 0.99 } |
| NodeName=node02 Arch=x86_64 CoresPerSocket=6 |
| Extra={ "load": 0.75 } |
| NodeName=node03 Arch=x86_64 CoresPerSocket=6 |
| Extra={ "load": 0.45 } |
| </pre> |
| </p> |
| |
| <p> |
| A job can request to run on a node whose load average is below 0.5. |
| <pre> |
| $ sbatch -n12 --extra "load<0.5" --wrap='srun sleep 10' |
| Submitted batch job 11206 |
| |
| $ squeue |
| JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
| 11206 debug wrap ben R 0:03 1 node03 |
| </pre> |
| </p> |
| |
| <p> |
| A job can also request to run on a node whose load falls within a range of acceptable values. |
| <pre> |
| $ sbatch -n12 --extra "(load<0.9&load>0.5)" --wrap='srun sleep 10' |
| Submitted batch job 11207 |
| |
| $ squeue |
| JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
| 11207 debug wrap ben R 0:01 1 node02 |
| </pre> |
| </p> |
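| |
| <p> |
| The node selection in the two submissions above can be reproduced by hand. The |
| Python sketch below hard-codes the two requests as plain comparisons rather than |
| parsing the --extra string, and it ignores the equality tolerance because no load |
| value is close to a threshold. |
| </p> |

```python
# Load values copied from the scontrol output above.
node_loads = {"node01": 0.99, "node02": 0.75, "node03": 0.45}

# Equivalent of --extra "load<0.5"
below_half = [name for name, load in node_loads.items() if load < 0.5]

# Equivalent of --extra "(load<0.9&load>0.5)"
in_range = [name for name, load in node_loads.items() if 0.5 < load < 0.9]

print(below_half)  # ['node03']
print(in_range)    # ['node02']
```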
| |
| <p style="text-align: center;">Last modified 08 November 2024</p> |
| |
| <!--#include virtual="footer.txt"--> |