<!--#include virtual="header.txt"-->
<h1>Resource Reservation Guide</h1>
<p>SLURM version 2.0 has the ability to reserve resources for jobs
being executed by select users and/or select bank accounts.
A resource reservation identifies the reserved resources (such as nodes
or licenses) and the time period during which the reservation is available.
Note that resource reservations are not compatible with SLURM's
gang scheduler plugin because the termination time of running jobs
cannot be accurately predicted.</p>
<p>Reservations can be created, updated, or destroyed only by user root
or the configured <i>SlurmUser</i> using the <i>scontrol</i> command.
The <i>scontrol</i>, <i>smap</i> and <i>sview</i> commands can be used
to view reservations.
The man pages for the various commands contain details.</p>
<p>Note for users of Maui or Moab schedulers: <br>
Maui and Moab are not integrated with SLURM's resource reservation system;
sites running those schedulers should use their advanced reservation systems
instead.</p>
<h2>Reservation Creation</h2>
<p>One common mode of operation for a reservation would be to reserve
an entire computer at a particular time for a system down time.
The example below shows the creation of a full-system reservation
at 16:00 hours on 6 February and lasting for 120 minutes.
The "maint" flag is used to identify the reservation for accounting
purposes as system maintenance.
The "ignore_jobs" flag is used to indicate that we can ignore currently
running jobs when creating this reservation.
By default, only nodes which are not expected to have a running job
at the start time can be reserved (that is, every job running on them
will have reached its time limit by then).
In this case we can manually cancel the running jobs as needed
to perform system maintenance.
As the reservation time approaches,
only jobs that can complete by the reservation time will be
initiated.</p>
<pre>
$ scontrol create reservation starttime=2009-02-06T16:00:00 \
duration=120 user=root flags=maint,ignore_jobs nodes=ALL
Reservation created: root_3
$ scontrol show reservation
ReservationName=root_3 StartTime=2009-02-06T16:00:00
EndTime=2009-02-06T18:00:00 Duration=120
Nodes=ALL NodeCnt=20
Features=(null) PartitionName=(null)
Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=(null)
Users=root Accounts=(null)
</pre>
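<p>The rule that, as a reservation approaches, only jobs able to complete by
its start time are initiated can be illustrated with a minimal Python sketch.
This is not SLURM's scheduler code: the function name
<i>can_start_before_reservation</i> is hypothetical, and treating a job that
ends exactly at the reservation start as acceptable is an assumption.</p>

```python
from datetime import datetime, timedelta


def can_start_before_reservation(now, time_limit_minutes, reservation_start):
    """Return True if a job started at `now` would reach its time limit
    no later than the reservation's start time.

    Assumption: a job ending exactly at the start time is allowed.
    """
    expected_end = now + timedelta(minutes=time_limit_minutes)
    return expected_end <= reservation_start


# Reservation root_3 from the example above starts at 16:00 on 6 February.
resv_start = datetime(2009, 2, 6, 16, 0, 0)
now = datetime(2009, 2, 6, 15, 0, 0)

print(can_start_before_reservation(now, 30, resv_start))  # True
print(can_start_before_reservation(now, 90, resv_start))  # False
```

<p>In this sketch a job with a 30 minute time limit submitted at 15:00 could
be initiated on the reserved nodes, while one with a 90 minute limit would
have to wait until the reservation ends.</p>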
<p>A variation of this would be to configure licenses to
represent system resources, such as a global file system.
One could create a reservation for all of those licenses
in order to perform maintenance on that resource.</p>
<pre>
$ scontrol create reservation starttime=2009-04-06T16:00:00 \
duration=120 user=root flags=maint,ignore_jobs \
licenses=lustre*1000
Reservation created: root_4
$ scontrol show reservation
ReservationName=root_4 StartTime=2009-04-06T16:00:00
EndTime=2009-04-06T18:00:00 Duration=120
Nodes= NodeCnt=0
Features=(null) PartitionName=(null)
Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=lustre*1000
Users=root Accounts=(null)
</pre>
<p>Another mode of operation would be to reserve specific nodes
for an indefinite period in order to study problems on those
nodes. This could also be accomplished using a SLURM partition
specifically for this purpose, but that would fail to capture
the maintenance nature of their use.</p>
<pre>
$ scontrol create reservation user=root starttime=now \
duration=infinite flags=maint nodes=sun000
Reservation created: root_5
$ scontrol show res
ReservationName=root_5 StartTime=2009-02-04T16:22:57
EndTime=2009-02-04T16:21:57 Duration=4294967295
Nodes=sun000 NodeCnt=1
Features=(null) PartitionName=(null)
Flags=MAINT,SPEC_NODES Licenses=(null)
Users=root Accounts=(null)
</pre>
<p>Our final example is to reserve ten nodes in the default
SLURM partition starting at noon and with a duration of 60
minutes occurring daily. The reservation will be available
only to users alan and brenda.</p>
<pre>
$ scontrol create reservation user=alan,brenda \
starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6
$ scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
EndTime=2009-02-05T13:00:00 Duration=60
Nodes=sun[000-003,007,010-013,017] NodeCnt=10
Features=(null) PartitionName=pdebug
Flags=DAILY Licenses=(null)
Users=alan,brenda Accounts=(null)
</pre>
<p>Note that the specific nodes to be associated with the reservation are
selected immediately after creation of the reservation. This permits
users to stage files to the nodes in preparation for use during the
reservation. Note that the reservation creation request can also
identify the partition from which to select the nodes or <i>one</i>
feature that every selected node must contain.</p>
<h2>Reservation Use</h2>
<p>The reservation create response includes the reservation's name.
This name is automatically generated by SLURM based upon the first
user or account name and a numeric suffix. In order to use the
reservation, the job submit request must explicitly specify that
reservation name. The job must be contained completely within the
named reservation. The job will be canceled after the reservation
reaches its EndTime. To let jobs continue execution after
the reservation's EndTime, the configuration option <i>ResvOverRun</i>
can be set to control how long a job can continue execution.</p>
<pre>
$ sbatch --reservation=alan_6 -N4 my.script
sbatch: Submitted batch job 65540
</pre>
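<p>The containment and cancellation rules described above can be sketched in
Python as follows. This is an illustration only, not SLURM code: the function
names are hypothetical, and expressing the <i>ResvOverRun</i> value in minutes
is an assumption that should be checked against the slurm.conf man page.</p>

```python
from datetime import datetime, timedelta


def fits_in_reservation(job_start, time_limit_minutes, resv_start, resv_end):
    """Return True if the job's whole time limit lies inside the reservation."""
    job_end = job_start + timedelta(minutes=time_limit_minutes)
    return resv_start <= job_start and job_end <= resv_end


def cancel_time(resv_end, overrun_minutes=0):
    """Return the time at which a job still running in the reservation is
    canceled. `overrun_minutes` stands in for ResvOverRun (unit assumed)."""
    return resv_end + timedelta(minutes=overrun_minutes)


# Reservation alan_6 from the example above: noon until 13:00.
resv_start = datetime(2009, 2, 5, 12, 0, 0)
resv_end = datetime(2009, 2, 5, 13, 0, 0)

print(fits_in_reservation(datetime(2009, 2, 5, 12, 15), 30, resv_start, resv_end))  # True
print(fits_in_reservation(datetime(2009, 2, 5, 12, 45), 30, resv_start, resv_end))  # False
print(cancel_time(resv_end, overrun_minutes=10))  # 2009-02-05 13:10:00
```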
<h2>Reservation Modification</h2>
<p>Reservations can be modified by user root as desired.
For example, their duration could be altered or the users
granted access changed as shown below:</p>
<pre>
$ scontrol update ReservationName=root_3 \
duration=150 users=admin
Reservation updated.
$ scontrol show ReservationName=root_3
ReservationName=root_3 StartTime=2009-02-06T16:00:00
EndTime=2009-02-06T18:30:00 Duration=150
Nodes=ALL NodeCnt=20 Features=(null)
PartitionName=(null) Flags=MAINT,SPEC_NODES Licenses=(null)
Users=admin Accounts=(null)
</pre>
<h2>Reservation Deletion</h2>
<p>Reservations are automatically purged after their end time.
They may also be manually deleted as shown below.
Note that a reservation cannot be deleted while there are
jobs running in it.</p>
<pre>
$ scontrol delete ReservationName=alan_6
</pre>
<h2>Overlapping Reservations</h2>
<p>By default, reservations must not overlap. They must either include
different nodes or operate at different times. If specific nodes
are not specified when a reservation is created, SLURM will
automatically select nodes to avoid overlap and ensure that
the selected nodes are available when the reservation begins.</p>
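<p>The default rule, that two reservations must differ in nodes or in time,
can be expressed as a short Python check. This is a sketch rather than
SLURM's implementation: the function name is hypothetical, and treating time
periods as half-open (a reservation ending at 13:00 does not collide with one
starting at 13:00) is an assumption.</p>

```python
from datetime import datetime


def reservations_conflict(nodes_a, start_a, end_a, nodes_b, start_b, end_b):
    """Return True if two reservations share at least one node and their
    time periods overlap (half-open intervals, an assumption)."""
    share_nodes = bool(set(nodes_a) & set(nodes_b))
    share_time = start_a < end_b and start_b < end_a
    return share_nodes and share_time


noon = datetime(2009, 2, 5, 12, 0)
half_past = datetime(2009, 2, 5, 12, 30)
one_pm = datetime(2009, 2, 5, 13, 0)
two_pm = datetime(2009, 2, 5, 14, 0)

# Shared node sun001 and overlapping times: conflict.
print(reservations_conflict(["sun000", "sun001"], noon, one_pm,
                            ["sun001", "sun002"], half_past, two_pm))  # True
# Shared node, but the second starts when the first ends: no conflict.
print(reservations_conflict(["sun000", "sun001"], noon, one_pm,
                            ["sun001", "sun002"], one_pm, two_pm))  # False
# Same time period, different nodes: no conflict.
print(reservations_conflict(["sun000", "sun001"], noon, one_pm,
                            ["sun005"], noon, one_pm))  # False
```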
<p>There is very limited support for overlapping reservations
with two specific modes of operation available.
For ease of system maintenance, you can create a reservation
with the "maint" flag that overlaps existing reservations.
This permits an administrator to easily create a maintenance
reservation for an entire cluster without needing to remove
or reschedule pre-existing reservations. Users requesting access
to one of these pre-existing reservations will be prevented from
using resources that are also in this maintenance reservation.
For example, users alan and brenda might have a reservation for
some nodes daily from noon until 1PM. If there is a maintenance
reservation for all nodes starting at 12:30PM, the only jobs they
may start in their reservation would have to be completed by 12:30PM,
when the maintenance reservation begins.</p>
<p>The second exception operates in the same manner as a maintenance
reservation except that it is not logged in the accounting system as nodes
reserved for maintenance.
It requires the use of the "overlap" flag when creating the second
reservation.
This might be used to ensure availability of resources for a specific
user within a group having a reservation.
Using the previous example of alan and brenda having a 10 node reservation
for 60 minutes, we might want to reserve 4 nodes of that for brenda
during the first 30 minutes of the time period.
In this case, the creation of one overlapping reservation (for a total of
two reservations) may be simpler than creating the three separate reservations
listed below, partly since the use of any reservation requires the job
specification of the reservation name.</p>
<ol>
<li>A six node reservation for both alan and brenda that lasts the full
60 minutes</li>
<li>A four node reservation for brenda for the first 30 minutes</li>
<li>A four node reservation for both alan and brenda that lasts for the
final 30 minutes</li>
</ol>
<p>If the "maint" or "overlap" flag is used when creating reservations,
one could create a reservation within a reservation within a third
reservation.
Note that a reservation having a "maint" or "overlap" flag will not have
resources removed from it by a subsequent reservation also having a
"maint" or "overlap" flag, so nesting of reservations only works to a
depth of two.</p>
<h2>Reservation Accounting</h2>
<p>Jobs executed within a reservation are accounted for using the appropriate
user and bank account. If resources within a reservation are not used, those
resources will be accounted for as being used by all users or bank accounts
associated with the reservation on an equal basis (e.g. if two users are
eligible to use a reservation and neither does, each user will be reported
to have used half of the reserved resources).</p>
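<p>The equal split of unused reserved resources can be worked through with a
small Python sketch. The function name and the use of node-minutes as the
accounting unit are assumptions made for illustration; this is not the
accounting plugin's code.</p>

```python
def unused_charge_per_entity(reserved_node_minutes, used_node_minutes, entities):
    """Split the unused part of a reservation equally among the users or
    bank accounts associated with it. Units (node-minutes) are assumed."""
    if not entities:
        raise ValueError("a reservation needs at least one user or account")
    unused = reserved_node_minutes - used_node_minutes
    share = unused / len(entities)
    return {name: share for name in entities}


# Ten nodes reserved for 60 minutes, and neither eligible user runs a job.
reserved = 10 * 60
print(unused_charge_per_entity(reserved, 0, ["alan", "brenda"]))
# {'alan': 300.0, 'brenda': 300.0}
```

<p>Here each of the two users is reported as having used half of the 600
reserved node-minutes, matching the example in the text.</p>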
<h2>Future Work</h2>
<p>Several enhancements are anticipated at some point in the future
to address the following limitations:</p>
<ol>
<li>The automatic selection of nodes for a reservation create request may be
sub-optimal in terms of locality (for optimized application
communication).</li>
<li>Reservations made within a partition having gang scheduling assume
the highest level rather than the actual level of time-slicing when
considering the initiation of jobs.
This prevents the initiation of some jobs that would complete execution
before a reservation begins, given fewer jobs to time-slice with.</li>
</ol>
<p style="text-align: center;">Last modified 25 August 2009</p>
<!--#include virtual="footer.txt"-->