blob: c19da3147a7dd6d4ba8f1f574081809add78c28f [file] [log] [blame]
<!--#include virtual="header.txt"-->
<h1>Job Array Support</h1>
<h2 id="overview">Overview<a class="slurm_link" href="#overview"></a></h2>
<p>Job arrays offer a mechanism for submitting and managing collections of
similar jobs quickly and easily; job arrays with millions of tasks can be
submitted in milliseconds (subject to configured size limits).
All jobs must have the same initial options (e.g. size, time limit, etc.),
however it is possible to change some of these options after the job has begun
execution using the scontrol command specifying the <i>JobID</i> of the array or individual
<i>ArrayJobID</i>.</p>
<pre>
$ scontrol update job=101 ...
$ scontrol update job=101_1 ...
</pre>
<p>Job arrays are only supported for batch jobs and the array index values are
specified using the <i>--array</i> or <i>-a</i> option of the <i>sbatch</i>
command. The option argument can be specific array index values, a range of
index values, and an optional step size as shown in the examples below.
Note that the minimum index value is zero and the maximum value is a Slurm
configuration parameter (<i>MaxArraySize</i> minus one).
Jobs which are part of a job array will have the environment variable
<i>SLURM_ARRAY_TASK_ID</i> set to its array index value.</p>
<pre>
# Submit a job array with index values between 0 and 31
$ sbatch --array=0-31 -N1 tmp
# Submit a job array with index values of 1, 3, 5 and 7
$ sbatch --array=1,3,5,7 -N1 tmp
# Submit a job array with index values between 1 and 7
# with a step size of 2 (i.e. 1, 3, 5 and 7)
$ sbatch --array=1-7:2 -N1 tmp
</pre>
<p>A maximum number of simultaneously running tasks from the job array may be
specified using a "%" separator.
For example "--array=0-15%4" will limit the number of simultaneously
running tasks from this job array to 4.</p>
<h2 id="env_vars">Job ID and Environment Variables
<a class="slurm_link" href="#env_vars"></a>
</h2>
<p>Job arrays will have additional environment variables set.<br>
<b>SLURM_ARRAY_JOB_ID</b> will be set to the first job ID of the array.<br>
<b>SLURM_ARRAY_TASK_ID</b> will be set to the job array index value.<br>
<b>SLURM_ARRAY_TASK_COUNT</b> will be set to the number of tasks in the job
array.<br>
<b>SLURM_ARRAY_TASK_MAX</b> will be set to the highest job array index
value.<br>
<b>SLURM_ARRAY_TASK_MIN</b> will be set to the lowest job array index value.</p>
<p>Under normal circumstances, array jobs will have the first task of the array
be a place holder for the rest of the array, causing it to be the last to run.
As a result, the task with the lowest <i>SLURM_JOB_ID</i> will have the highest
<i>SLURM_ARRAY_TASK_ID</i>.
For example a job submission of this sort:<br>
<code>sbatch --array=1-3 -N1 tmp</code><br>
will generate a job array containing three jobs. If the sbatch command
responds with:<br>
<code>Submitted batch job 36</code><br>
then the environment variables will be set as follows:</p>
<pre>
SLURM_JOB_ID=36
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=37
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
SLURM_JOB_ID=38
SLURM_ARRAY_JOB_ID=36
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
</pre>
<p>Ordering of the tasks as shown above is not guaranteed. For example, there
can be cases where individual tasks are created out of order when tasks are
requeued. The task with the lowest JOB_ID may not have the highest TASK_ID if
the tasks are not created sequentially due to the tasks being updated/modified
before they start. Other edge cases may cause similar behavior.</p>
<p>All Slurm commands and APIs recognize the SLURM_JOB_ID value.
Most commands also recognize the SLURM_ARRAY_JOB_ID plus SLURM_ARRAY_TASK_ID
values separated by an underscore as identifying an element of a job array.
Using the example above, "37" or "36_1" would be equivalent ways to identify
the second array element of job 36.
A set of APIs has been developed to operate on an entire job array or select
tasks of a job array in a single function call.
The function response consists of an array identifying the various error codes
for various tasks of a job ID.
For example the <i>job_resume2()</i> function might return an array of error
codes indicating that tasks 1 and 2 have already completed; tasks 3 through 5
are resumed successfully, and tasks 6 through 99 have not yet started.</p>
<h2 id="file_names">File Names<a class="slurm_link" href="#file_names"></a></h2>
<p>Two additional options are available to specify a job's stdin, stdout, and
stderr file names:
<b>%A</b> will be replaced by the value of SLURM_ARRAY_JOB_ID (as defined above)
and
<b>%a</b> will be replaced by the value of SLURM_ARRAY_TASK_ID (as defined above).
The default output file format for a job array is "slurm-%A_%a.out".
An example of explicit use of the formatting is:<br>
<i>sbatch -o slurm-%A_%a.out --array=1-3 -N1 tmp</i><br>
which would generate output files names of this sort
"slurm-36_1.out", "slurm-36_2.out" and "slurm-36_3.out".
If these file name options are used without being part of a job array then
"%A" will be replaced by the current job ID and "%a" will be replaced by
4,294,967,294 (equivalent to 0xfffffffe or NO_VAL).</p>
<h2 id="scancel">Scancel Command Use
<a class="slurm_link" href="#scancel"></a>
</h2>
<p>If the job ID of a job array is specified as input to the scancel command
then all elements of that job array will be cancelled.
Alternately an array ID, optionally using regular expressions, may be specified
for job cancellation.</p>
<pre>
# Cancel array ID 1 to 3 from job array 20
$ scancel 20_[1-3]
# Cancel array ID 4 and 5 from job array 20
$ scancel 20_4 20_5
# Cancel all elements from job array 20
$ scancel 20
# Cancel the current job or job array element (if job array)
if [[-z $SLURM_ARRAY_JOB_ID]]; then
scancel $SLURM_JOB_ID
else
scancel ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
</pre>
<h2 id="squeue">Squeue Command Use<a class="slurm_link" href="#squeue"></a></h2>
<p>When a job array is submitted to Slurm, only one job record is created.
Additional job records will only be created when the state of a task in the
job array changes, typically when a task is allocated resources or its state
is modified using the scontrol command.
By default, the squeue command will report all of the tasks associated with
a single job record on one line and use a regular expression to indicate the
"array_task_id" values as shown below.</p>
<pre>
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1080_[5-1024] debug tmp mac PD 0:00 1 (Resources)
1080_1 debug tmp mac R 0:17 1 tux0
1080_2 debug tmp mac R 0:16 1 tux1
1080_3 debug tmp mac R 0:03 1 tux2
1080_4 debug tmp mac R 0:03 1 tux3
</pre>
<p>An option of "--array" or "-r" has also been added to the squeue command
to print one job array element per line as shown below.
The environment variable "SQUEUE_ARRAY" is equivalent to including the "--array"
option on the squeue command line.</p>
<pre>
$ squeue -r
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1082_3 debug tmp mac PD 0:00 1 (Resources)
1082_4 debug tmp mac PD 0:00 1 (Priority)
1080 debug tmp mac R 0:17 1 tux0
1081 debug tmp mac R 0:16 1 tux1
1082_1 debug tmp mac R 0:03 1 tux2
1082_2 debug tmp mac R 0:03 1 tux3
</pre>
<p>The squeue --step/-s and --job/-j options can accept job or step
specifications of the same format.</p>
<pre>
$ squeue -j 1234_2,1234_3
...
$ squeue -s 1234_2.0,1234_3.0
...
</pre>
<p>Two additional job output format field options have been added to squeue:<br>
<b>%F</b> prints the array_job_id value<br>
<b>%K</b> prints the array_task_id value<br>
(all of the obvious letters to use were already assigned to other job fields).</p>
<h2 id="scontrol">Scontrol Command Use
<a class="slurm_link" href="#scontrol"></a>
</h2>
<p>Use of the <i>scontrol show job</i> option shows two new fields related to
job array support.
The <i>JobID</i> is a unique identifier for the job.
The <i>ArrayJobID</i> is the <i>JobID</i> of the first element of the job
array.
The <i>ArrayTaskID</i> is the array index of this particular entry, either a
single number of an expression identifying the entries represented by this
job record (e.g. "5-1024").
Neither field is displayed if the job is not part of a job array.
The optional job ID specified with the <i>scontrol show job</i> or
<i>scontrol show step</i> commands can identify job array elements by
specifying <i>ArrayJobId</i> and <i>ArrayTaskId</i> with an underscore between
them (e.g. &lt;ArrayJobID&gt;_&lt;ArrayTaskId&gt;).
<p>The scontrol command will operate on all elements of a job array if the
job ID specified is <i>ArrayJobID</i>.
Individual job array tasks can be modified using the
<i>ArrayJobID</i>_<i>ArrayTaskID</i> as shown below.</p>
<pre>
$ sbatch --array=1-4 -J array ./sleepme 86400
Submitted batch job 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 0:13 1 dario
21845_2 canopo array david R 0:13 1 dario
21845_3 canopo array david R 0:13 1 dario
21845_4 canopo array david R 0:13 1 dario
$ scontrol update JobID=21845_2 name=arturo
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 17:03 1 dario
21845_2 canopo arturo david R 17:03 1 dario
21845_3 canopo array david R 17:03 1 dario
21845_4 canopo array david R 17:03 1 dario
</pre>
<p>
The scontrol hold, holdu, release, requeue, requeuehold, suspend and resume
commands can also either operate on all elements of a job array or individual
elements as shown below.
</p>
<pre>
$ scontrol suspend 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david S 25:12 1 dario
21845_2 canopo arturo david S 25:12 1 dario
21845_3 canopo array david S 25:12 1 dario
21845_4 canopo array david S 25:12 1 dario
$ scontrol resume 21845
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
scontrol suspend 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david S 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
scontrol resume 21845_3
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST
21845_1 canopo array david R 25:14 1 dario
21845_2 canopo arturo david R 25:14 1 dario
21845_3 canopo array david R 25:14 1 dario
21845_4 canopo array david R 25:14 1 dario
</pre>
<h2 id="dependencies">Job Dependencies
<a class="slurm_link" href="#dependencies"></a>
</h2>
<p>A job which is to be dependent upon an entire job array should specify
itself dependent upon the ArrayJobID.
Since each array element can have a different exit code, the interpretation of
the <i>afterok</i> and <i>afternotok</i> clauses will be based upon the highest
exit code from any task in the job array.</p>
<p>When a job dependency specifies the job ID of a job array:<br>
The <i>after</i> clause is satisfied after all tasks in the job array start.<br>
The <i>afterany</i> clause is satisfied after all tasks in the job array
complete.<br>
The <i>aftercorr</i> clause is satisfied after the corresponding task ID in the
specified job has completed successfully (ran to completion with an exit code of
zero).<br>
The <i>afterok</i> clause is satisfied after all tasks in the job array
complete successfully.<br>
The <i>afternotok</i> clause is satisfied after all tasks in the job array
complete with at least one tasks not completing successfully.</p>
<p>Examples of use are shown below:</p>
<pre>
# Wait for specific job array elements
sbatch --depend=after:123_4 my.job
sbatch --depend=afterok:123_4:123_8 my.job2
# Wait for entire job array to complete
sbatch --depend=afterany:123 my.job
# Wait for corresponding job array elements
sbatch --depend=aftercorr:123 my.job
# Wait for entire job array to complete successfully
sbatch --depend=afterok:123 my.job
# Wait for entire job array to complete and at least one task fails
sbatch --depend=afternotok:123 my.job
</pre>
<h2 id="other_commands">Other Command Use
<a class="slurm_link" href="#other_commands"></a>
</h2>
<p>The following Slurm commands do not currently recognize job arrays and their
use requires the use of Slurm job IDs, which are unique for each array element:
sbcast, sprio, sreport, sshare and sstat.
The sacct, sattach and strigger commands have been modified to permit
specification of either job IDs or job array elements.
The sview command has been modified to permit display of a job's ArrayJobId
and ArrayTaskId fields. Both fields are displayed with a value of "N/A" if
the job is not part of a job array.</p>
<h2 id="administration">System Administration
<a class="slurm_link" href="#administration"></a>
</h2>
<p>A new configuration parameter has been added to control the maximum
job array size: <b>MaxArraySize</b>. The smallest index that can be specified
by a user is zero and the maximum index is MaxArraySize minus one.
The default value of MaxArraySize is 1001.
The maximum MaxArraySize supported in Slurm is 4000001.
Be mindful about the value of MaxArraySize as job arrays offer an easy way
for users to submit large numbers of jobs very quickly.</p>
<p>The sched/backfill plugin has been modified to improve performance with
job arrays. Once one element of a job array is discovered to not be runnable
or impact the scheduling of pending jobs, the remaining elements of that job
array will be quickly skipped.</p>
<p>Slurm creates a single job record when a job array is submitted.
Additional job records are only created as needed, typically when a task
of a job array is started, which provides a very scalable mechanism to
manage large job counts.
Each task of the job array will share the same ArrayJobId but will have their
own unique ArrayTaskId. In addition to the ArrayJobId, each job will have a
unique JobId that gets assigned as the tasks are started.</p>
<p style="text-align:center;">Last modified 01 January 2024</p>
<!--#include virtual="footer.txt"-->