<!--#include virtual="header.txt"-->

<h1>Resource Reservation Guide</h1>

<p>SLURM version 2.0 has the ability to reserve resources for jobs
being executed by select users and/or select bank accounts.
A resource reservation identifies the nodes to be reserved
and a time period during which the reservation is available.
Note that resource reservations are not compatible with SLURM's
gang scheduler plugin, since the termination time of running jobs
cannot be accurately predicted.</p>

<p>Reservations can be created, updated, or destroyed only by user root
or the configured <i>SlurmUser</i> using the <i>scontrol</i> command.
The <i>scontrol</i>, <i>smap</i> and <i>sview</i> commands can be used
to view reservations.
The man pages for the various commands contain details.</p>

<p>Note for users of Maui or Moab schedulers: <br>
Maui and Moab are not integrated with SLURM's resource reservation system
and should instead use their own advance reservation systems.</p>

<h2>Reservation Creation</h2>

<p>One common mode of operation for a reservation would be to reserve
an entire computer at a particular time for system maintenance.
The example below shows the creation of a full-system reservation
at 16:00 hours on 6 February lasting 120 minutes.
The "maint" flag identifies the reservation for accounting
purposes as system maintenance.
The "ignore_jobs" flag indicates that currently running jobs can be
ignored when creating this reservation.
By default, only nodes which are not expected to have a running job
at the reservation's start time can be reserved (i.e. the time limit
of every running job will have been reached by then).
In this case we can manually cancel the running jobs as needed
to perform system maintenance.
As the reservation time approaches,
only jobs that can complete by the reservation time will be
initiated.</p>
<pre>
$ scontrol create reservation starttime=2009-02-06T16:00:00 \
   duration=120 user=root flags=maint,ignore_jobs nodes=ALL
Reservation created: root_3

$ scontrol show reservation
ReservationName=root_3 StartTime=2009-02-06T16:00:00
   EndTime=2009-02-06T18:00:00 Duration=120
   Nodes=ALL NodeCnt=20
   Features=(null) PartitionName=(null)
   Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=(null)
   Users=root Accounts=(null)
</pre>
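
<p>As a sketch of manually clearing work before the maintenance window,
the remaining running jobs can be listed with <i>squeue</i> and cancelled
individually with <i>scancel</i> (the job ID below is hypothetical):</p>
<pre>
$ squeue --state=RUNNING
$ scancel 65539
</pre>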

<p>A variation of this would be to configure licenses to
represent system resources, such as a global file system.
One could create a reservation for all of those licenses
in order to perform maintenance on that resource.</p>
<pre>
$ scontrol create reservation starttime=2009-04-06T16:00:00 \
   duration=120 user=root flags=maint,ignore_jobs \
   licenses=lustre*1000
Reservation created: root_4

$ scontrol show reservation
ReservationName=root_4 StartTime=2009-04-06T16:00:00
   EndTime=2009-04-06T18:00:00 Duration=120
   Nodes= NodeCnt=0
   Features=(null) PartitionName=(null)
   Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=lustre*1000
   Users=root Accounts=(null)
</pre>
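
<p>For reference, such licenses would first be defined by the cluster
administrator in <i>slurm.conf</i>; the license name and count below
match the example above but are site-specific:</p>
<pre>
# slurm.conf: define 1000 licenses named "lustre"
Licenses=lustre*1000
</pre>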

<p>Another mode of operation would be to reserve specific nodes
for an indefinite period in order to study problems on those
nodes. This could also be accomplished using a SLURM partition
specifically for this purpose, but that would fail to capture
the maintenance nature of their use.</p>
<pre>
$ scontrol create reservation user=root starttime=now \
   duration=infinite flags=maint nodes=sun000
Reservation created: root_5

$ scontrol show res
ReservationName=root_5 StartTime=2009-02-04T16:22:57
   EndTime=2009-02-04T16:21:57 Duration=4294967295
   Nodes=sun000 NodeCnt=1
   Features=(null) PartitionName=(null)
   Flags=MAINT,SPEC_NODES Licenses=(null)
   Users=root Accounts=(null)
</pre>

<p>Our final example is to reserve ten nodes in the default
SLURM partition starting at noon and with a duration of 60
minutes occurring daily. The reservation will be available
only to users alan and brenda.</p>
<pre>
$ scontrol create reservation user=alan,brenda \
   starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6

$ scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
   EndTime=2009-02-05T13:00:00 Duration=60
   Nodes=sun[000-003,007,010-013,017] NodeCnt=10
   Features=(null) PartitionName=pdebug
   Flags=DAILY Licenses=(null)
   Users=alan,brenda Accounts=(null)
</pre>

<p>Note that the specific nodes associated with a reservation are
identified immediately after the reservation is created. This permits
users to stage files to the nodes in preparation for use during the
reservation. Note that the reservation creation request can also
identify the partition from which to select the nodes or one
feature that every selected node must contain.</p>
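
<p>A sketch of that last point follows; the partition and feature names
here are illustrative assumptions, not values defined elsewhere in
this guide:</p>
<pre>
$ scontrol create reservation user=alan,brenda \
   starttime=noon duration=60 flags=daily nodecnt=10 \
   partitionname=pdebug features=bigmem
</pre>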

<h2>Reservation Use</h2>

<p>The reservation create response includes the reservation's name.
This name is automatically generated by SLURM based upon the first
user or account name plus a numeric suffix. In order to use the
reservation, the job submit request must explicitly specify that
reservation name. The job must be contained completely within the
named reservation, and will be canceled after the reservation
reaches its EndTime. To let jobs continue execution beyond the
reservation's EndTime, the configuration option <i>ResvOverRun</i>
can be set to control how long they may continue to run.</p>
<pre>
$ sbatch --reservation=alan_6 -N4 my.script
sbatch: Submitted batch job 65540
</pre>
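
<p>Other job submission commands accept the same option; for example,
an interactive allocation within the reservation might be requested
as follows (a sketch reusing the reservation name from above):</p>
<pre>
$ salloc --reservation=alan_6 -N2
</pre>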

<h2>Reservation Modification</h2>

<p>Reservations can be modified by user root as desired.
For example, their duration could be altered or the users
granted access could be changed, as shown below:</p>
<pre>
$ scontrol update ReservationName=root_3 \
   duration=150 users=admin
Reservation updated.

$ scontrol show ReservationName=root_3
ReservationName=root_3 StartTime=2009-02-06T16:00:00
   EndTime=2009-02-06T18:30:00 Duration=150
   Nodes=ALL NodeCnt=20 Features=(null)
   PartitionName=(null) Flags=MAINT,SPEC_NODES Licenses=(null)
   Users=admin Accounts=(null)
</pre>

<h2>Reservation Deletion</h2>

<p>Reservations are automatically purged after their end time.
They may also be manually deleted as shown below.
Note that a reservation cannot be deleted while there are
jobs running in it.</p>
<pre>
$ scontrol delete ReservationName=alan_6
</pre>

<h2>Overlapping Reservations</h2>

<p>By default, reservations must not overlap. They must either include
different nodes or operate at different times. If specific nodes
are not specified when a reservation is created, SLURM will
automatically select nodes to avoid overlap and ensure that
the selected nodes are available when the reservation begins.</p>

<p>There is very limited support for overlapping reservations,
with two specific modes of operation available.
For ease of system maintenance, you can create a reservation
with the "maint" flag that overlaps existing reservations.
This permits an administrator to easily create a maintenance
reservation for an entire cluster without needing to remove
or reschedule pre-existing reservations. Users requesting access
to one of these pre-existing reservations will be prevented from
using resources that are also in the maintenance reservation.
For example, users alan and brenda might have a reservation for
some nodes daily from noon until 1PM. If there is a maintenance
reservation for all nodes starting at 12:30PM, the only jobs they
may start in their reservation would be ones able to complete by
12:30PM, when the maintenance reservation begins.</p>

<p>The second exception operates in the same manner as a maintenance
reservation except that it is not logged in the accounting system as nodes
reserved for maintenance.
It requires the use of the "overlap" flag when creating the second
reservation.
This might be used to ensure availability of resources for a specific
user within a group having a reservation.
Using the previous example of alan and brenda having a 10-node reservation
for 60 minutes, we might want to reserve 4 of those nodes for brenda
during the first 30 minutes of the time period.
In this case, the creation of one overlapping reservation (for a total of
two reservations) may be simpler than creating the three separate
reservations listed below, partly because the use of any reservation
requires the job to specify the reservation name.
<ol>
<li>A six node reservation for both alan and brenda that lasts the full
60 minutes</li>
<li>A four node reservation for brenda for the first 30 minutes</li>
<li>A four node reservation for both alan and brenda that lasts for the
final 30 minutes</li>
</ol></p>
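
<p>A sketch of creating the single overlapping reservation for brenda
described above (the start time mirrors the earlier daily example and
is an assumption):</p>
<pre>
$ scontrol create reservation user=brenda \
   starttime=noon duration=30 nodecnt=4 flags=overlap
</pre>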

<p>If the "maint" or "overlap" flag is used when creating reservations,
one could create a reservation within a reservation within a third
reservation.
Note that a reservation having a "maint" or "overlap" flag will not have
resources removed from it by a subsequent reservation also having a
"maint" or "overlap" flag, so nesting of reservations only works to a
depth of two.</p>

<h2>Reservation Accounting</h2>

<p>Jobs executed within a reservation are accounted for using the appropriate
user and bank account. If resources within a reservation are not used, those
resources will be accounted for as being used by all users or bank accounts
associated with the reservation on an equal basis (e.g. if two users are
eligible to use a reservation and neither does, each user will be reported
to have used half of the reserved resources).</p>

<h2>Future Work</h2>

<p>Several enhancements are anticipated at some point in the future.
<ol>
<li>The automatic selection of nodes for a reservation create request may be
sub-optimal in terms of locality (for optimized application
communication).</li>
<li>Reservations made within a partition having gang scheduling assume
the highest possible level of time-slicing rather than the actual level
when considering the initiation of jobs.
This will prevent the initiation of some jobs which would complete
execution before the reservation begins if given fewer jobs to
time-slice with.</li>
</ol></p>


<p style="text-align: center;">Last modified 25 August 2009</p>

<!--#include virtual="footer.txt"-->
