| <!--#include virtual="header.txt"--> |
| |
| <h1>Resource Reservation Guide</h1> |
| |
| <p>SLURM version 2.0 has the ability to reserve resources for jobs |
| being executed by select users and/or select bank accounts. |
| A resource reservation identifies the resources in that reservation |
| and a time period during which the reservation is available. |
| The resources which can be reserved include nodes and/or licenses. |
| Note that resource reservations are not compatible with SLURM's |
| gang scheduler plugin, since the termination time of running jobs |
| cannot be accurately predicted.</p> |
| |
| <p>Note that reserved licenses are treated somewhat differently than reserved |
| nodes. When nodes are reserved, then jobs using that reservation can use only |
| those nodes and no other jobs can use those nodes. Reserved licenses can only |
| be used by jobs associated with that reservation, but licenses not explicitly |
| reserved are available to any job. This eliminates the need to explicitly |
| put licenses into every advanced reservation created.</p> |
| |
| <p>Reservations can be created, updated, or destroyed only by user root |
| or the configured <i>SlurmUser</i> using the <i>scontrol</i> command. |
| The <i>scontrol</i>, <i>smap</i> and <i>sview</i> commands can be used |
| to view reservations. |
| The man pages for the various commands contain details.</p> |
| |
| <p>Note for users of Maui or Moab schedulers: <br> |
| Maui and Moab are not integrated with SLURM's resource reservation system; |
| sites running those schedulers should use their native advanced reservation |
| systems instead.</p> |
| |
| <h2>Reservation Creation</h2> |
| |
| <p>One common mode of operation for a reservation would be to reserve |
| an entire computer at a particular time for scheduled down time. |
| The example below shows the creation of a full-system reservation |
| at 16:00 hours on 6 February lasting for 120 minutes. |
| The "maint" flag identifies the reservation for accounting |
| purposes as system maintenance. |
| The "ignore_jobs" flag indicates that currently running jobs may be |
| ignored when creating this reservation. |
| By default, only nodes not expected to have a running job at the |
| reservation's start time (because the time limit of every running |
| job will have been reached by then) can be reserved. |
| In this case we can manually cancel the running jobs as needed |
| to perform the system maintenance. |
| As the reservation time approaches, |
| only jobs that can complete by the reservation time will be |
| initiated.</p> |
| <pre> |
| $ scontrol create reservation starttime=2009-02-06T16:00:00 \ |
| duration=120 user=root flags=maint,ignore_jobs nodes=ALL |
| Reservation created: root_3 |
| |
| $ scontrol show reservation |
| ReservationName=root_3 StartTime=2009-02-06T16:00:00 |
| EndTime=2009-02-06T18:00:00 Duration=120 |
| Nodes=ALL NodeCnt=20 |
| Features=(null) PartitionName=(null) |
| Flags=MAINT,SPEC_NODES,IGNORE_JOBS Licenses=(null) |
| Users=root Accounts=(null) |
| </pre> |
| |
| <p>A variation of this would be to configure licenses to represent system |
| resources, such as a global file system. |
| The system resource may or may not require an actual license for use, but |
| SLURM licenses can be used to prevent jobs needing the resource from being |
| started when that resource is unavailable. |
| One could create a reservation for all of those licenses in order to perform |
| maintenance on that resource. |
| In the example below, we create a reservation for 1000 licenses with the name |
| of "lustre". |
| If there are a total of 1000 lustre licenses configured in this cluster, |
| this reservation will prevent any job requiring a lustre license |
| from being scheduled on this cluster during this reservation.</p> |
| <pre> |
| $ scontrol create reservation starttime=2009-04-06T16:00:00 \ |
| duration=120 user=root flags=ignore_jobs \ |
| licenses=lustre*1000 |
| Reservation created: root_4 |
| |
| $ scontrol show reservation |
| ReservationName=root_4 StartTime=2009-04-06T16:00:00 |
| EndTime=2009-04-06T18:00:00 Duration=120 |
| Nodes= NodeCnt=0 |
| Features=(null) PartitionName=(null) |
| Flags=IGNORE_JOBS Licenses=lustre*1000 |
| Users=root Accounts=(null) |
| </pre> |
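| |
| <p>For reference, a minimal sketch of the matching license definition in |
| slurm.conf (the license name and count mirror the example above; slurmctld |
| must be restarted or reconfigured after editing):</p> |
| <pre> |
| # slurm.conf: define 1000 licenses named "lustre" |
| Licenses=lustre*1000 |
| </pre> |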
| |
| <p>Another mode of operation would be to reserve specific nodes |
| for an indefinite period in order to study problems on those |
| nodes. This could also be accomplished using a SLURM partition |
| specifically for this purpose, but that would fail to capture |
| the maintenance nature of their use.</p> |
| <pre> |
| $ scontrol create reservation user=root starttime=now \ |
| duration=infinite flags=maint nodes=sun000 |
| Reservation created: root_5 |
| |
| $ scontrol show res |
| ReservationName=root_5 StartTime=2009-02-04T16:22:57 |
| EndTime=2009-02-04T16:21:57 Duration=4294967295 |
| Nodes=sun000 NodeCnt=1 |
| Features=(null) PartitionName=(null) |
| Flags=MAINT,SPEC_NODES Licenses=(null) |
| Users=root Accounts=(null) |
| </pre> |
| |
| <p>Our next example reserves ten nodes in the default |
| SLURM partition starting at noon with a duration of 60 |
| minutes, recurring daily. The reservation will be available |
| only to users alan and brenda.</p> |
| <pre> |
| $ scontrol create reservation user=alan,brenda \ |
| starttime=noon duration=60 flags=daily nodecnt=10 |
| Reservation created: alan_6 |
| |
| $ scontrol show res |
| ReservationName=alan_6 StartTime=2009-02-05T12:00:00 |
| EndTime=2009-02-05T13:00:00 Duration=60 |
| Nodes=sun[000-003,007,010-013,017] NodeCnt=10 |
| Features=(null) PartitionName=pdebug |
| Flags=DAILY Licenses=(null) |
| Users=alan,brenda Accounts=(null) |
| </pre> |
| |
| <p>Reservations can be optimized with respect to system topology if the |
| reservation request includes information about the sizes of jobs to be created. |
| This is especially important for BlueGene systems due to restrictive rules |
| about the topology of created blocks (due to hardware constraints and/or |
| SLURM's configuration). To take advantage of this optimization, specify the |
| sizes of jobs to be executed concurrently. The example below creates a |
| reservation containing 4096 c-nodes on a BlueGene system so that two 2048 |
| c-node jobs can execute simultaneously.</p> |
| |
| <pre> |
| $ scontrol create reservation user=alan,brenda \ |
| starttime=noon duration=60 nodecnt=2k,2k |
| Reservation created: alan_8 |
| |
| $ scontrol show res |
| ReservationName=alan_8 StartTime=2011-12-05T12:00:00 |
| EndTime=2011-12-05T13:00:00 Duration=60 |
| Nodes=bgp[000x011,210x311] NodeCnt=4096 |
| Features=(null) PartitionName=pdebug |
| Flags= Licenses=(null) |
| Users=alan,brenda Accounts=(null) |
| </pre> |
| |
| <p>Note that the specific nodes associated with a reservation are |
| selected immediately upon its creation. This permits |
| users to stage files to the nodes in preparation for use during the |
| reservation. Note that the reservation creation request can also |
| identify the partition from which to select the nodes or <i>one</i> |
| feature that every selected node must contain, as shown below.</p> |
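| |
| <p>A minimal sketch of such a request (the partition name "pdebug" appears |
| in earlier examples here; the feature name "bigmem" is an assumption for |
| illustration):</p> |
| <pre> |
| $ scontrol create reservation user=alan starttime=noon \ |
|    duration=60 nodecnt=4 partitionname=pdebug features=bigmem |
| </pre> |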
| |
| <p>On a smaller system, one might want to reserve specific CPUs rather than |
| whole nodes. While the resolution of SLURM's resource reservation is that of |
| whole nodes, one might configure each CPU as a license to SLURM and reserve |
| those instead (we understand this is a kludge, but it does provide a way to |
| work around this shortcoming in SLURM's code). Proper enforcement then requires |
| that each job request one "cpu" license for each CPU to be allocated, which |
| can be accomplished by an appropriate job_submit plugin. In the example below, |
| we configure the system with one license named "cpu" for each CPU in the |
| system, 64 in this example, then create a reservation for 32 CPUs. The |
| user-developed job_submit plugin would then explicitly set the job's |
| licenses field to require one "cpu" license for each physical CPU needed |
| to satisfy the request.</p> |
| <pre> |
| $ scontrol show configuration | grep Licenses |
| Licenses = cpu*64 |
| |
| $ scontrol create reservation starttime=2009-04-06T16:00:00 \ |
| duration=120 user=bob flags=ignore_jobs \ |
| licenses=cpu*32 |
| Reservation created: bob_5 |
| </pre> |
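| |
| <p>Absent such a plugin, a user could request the licenses explicitly at |
| submission time in the same spirit (a sketch; the license count matches |
| the four CPUs requested):</p> |
| <pre> |
| $ sbatch -n 4 --licenses=cpu:4 my.script |
| </pre> |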
| |
| <h2>Reservation Use</h2> |
| |
| <p>The reservation create response includes the reservation's name. |
| This name is automatically generated by SLURM based upon the first |
| user or account name and a numeric suffix. In order to use the |
| reservation, the job submit request must explicitly specify that |
| reservation name. The job must be contained completely within the |
| named reservation. The job will be canceled after the reservation |
| reaches its EndTime. To let a job continue execution beyond the |
| reservation's EndTime, the configuration option <i>ResvOverRun</i> |
| can be set to control how long execution may continue.</p> |
| <pre> |
| $ sbatch --reservation=alan_6 -N4 my.script |
| sbatch: Submitted batch job 65540 |
| </pre> |
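| |
| <p>A sketch of the corresponding slurm.conf setting (the 30 minute value |
| is an arbitrary example; "UNLIMITED" is also supported):</p> |
| <pre> |
| # Let running jobs continue up to 30 minutes past their |
| # reservation's end time |
| ResvOverRun=30 |
| </pre> |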
| |
| <p>Note that use of a reservation does not alter a job's priority value, |
| but it does act as an enhancement to the job's scheduling priority: |
| any job with a reservation is considered for scheduling to resources |
| before any other job in the same SLURM partition (queue) not associated |
| with a reservation.</p> |
| |
| <h2>Reservation Modification</h2> |
| |
| <p>Reservations can be modified by user root as desired. |
| For example, their duration could be altered or the set of users |
| granted access changed, as shown below:</p> |
| <pre> |
| $ scontrol update ReservationName=root_3 \ |
| duration=150 users=admin |
| Reservation updated. |
| |
| $ scontrol show ReservationName=root_3 |
| ReservationName=root_3 StartTime=2009-02-06T16:00:00 |
| EndTime=2009-02-06T18:30:00 Duration=150 |
| Nodes=ALL NodeCnt=20 Features=(null) |
| PartitionName=(null) Flags=MAINT,SPEC_NODES Licenses=(null) |
| Users=admin Accounts=(null) |
| </pre> |
| |
| <h2>Reservation Deletion</h2> |
| |
| <p>Reservations are automatically purged after their end time. |
| They may also be manually deleted as shown below. |
| Note that a reservation cannot be deleted while there are |
| jobs running in it.</p> |
| <pre> |
| $ scontrol delete ReservationName=alan_6 |
| </pre> |
| |
| <h2>Overlapping Reservations</h2> |
| |
| <p>By default, reservations must not overlap. They must either include |
| different nodes or operate at different times. If specific nodes |
| are not specified when a reservation is created, SLURM will |
| automatically select nodes to avoid overlap and ensure that |
| the selected nodes are available when the reservation begins.</p> |
| |
| <p>There is very limited support for overlapping reservations |
| with two specific modes of operation available. |
| For ease of system maintenance, you can create a reservation |
| with the "maint" flag that overlaps existing reservations. |
| This permits an administrator to easily create a maintenance |
| reservation for an entire cluster without needing to remove |
| or reschedule pre-existing reservations. Users requesting access |
| to one of these pre-existing reservations will be prevented from |
| using resources that are also in this maintenance reservation. |
| For example, users alan and brenda might have a reservation for |
| some nodes daily from noon until 1PM. If there is a maintenance |
| reservation for all nodes starting at 12:30PM, any job they |
| start in their reservation must be able to complete by 12:30PM, |
| when the maintenance reservation begins.</p> |
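| |
| <p>A sketch of such a maintenance reservation for the scenario above (the |
| date is illustrative):</p> |
| <pre> |
| $ scontrol create reservation user=root starttime=2009-02-05T12:30:00 \ |
|    duration=240 flags=maint,ignore_jobs nodes=ALL |
| </pre> |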
| |
| <p>The second exception operates in the same manner as a maintenance |
| reservation except that it is not logged in the accounting system as nodes |
| reserved for maintenance. |
| It requires the use of the "overlap" flag when creating the second |
| reservation. |
| This might be used to ensure availability of resources for a specific |
| user within a group having a reservation. |
| Using the previous example of alan and brenda having a 10 node reservation |
| for 60 minutes, we might want to reserve 4 of those nodes for brenda |
| during the first 30 minutes of the time period. |
| In this case, the creation of one overlapping reservation (for a total of |
| two reservations, as sketched after the list below) may be simpler than |
| creating three separate reservations, partly since the use of any |
| reservation requires the job specification of the reservation name. |
| <ol> |
| <li>A six node reservation for both alan and brenda that lasts the full |
| 60 minutes</li> |
| <li>A four node reservation for brenda for the first 30 minutes</li> |
| <li>A four node reservation for both alan and brenda that lasts for the |
| final 30 minutes</li> |
| </ol></p> |
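| |
| <p>A sketch of the simpler approach, building on the existing alan_6 |
| reservation shown earlier (the generated reservation name here is |
| illustrative):</p> |
| <pre> |
| $ scontrol create reservation user=brenda \ |
|    starttime=noon duration=30 nodecnt=4 flags=overlap |
| Reservation created: brenda_7 |
| </pre> |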
| |
| <p>If the "maint" or "overlap" flag is used when creating reservations, |
| one could create a reservation within a reservation within a third |
| reservation. |
| Note that a reservation having a "maint" or "overlap" flag will not have |
| resources removed from it by a subsequent reservation also having a |
| "maint" or "overlap" flag, so nesting of reservations only works to a |
| depth of two.</p> |
| |
| <h2>Reservation Accounting</h2> |
| |
| <p>Jobs executed within a reservation are accounted for using the appropriate |
| user and bank account. If resources within a reservation are not used, those |
| resources will be accounted for as being used by all users or bank accounts |
| associated with the reservation on an equal basis (e.g. if two users are |
| eligible to use a reservation and neither does, each user will be reported |
| to have used half of the reserved resources).</p> |
| |
| <h2>Future Work</h2> |
| |
| <p>Several enhancements are anticipated at some point in the future. |
| <ol> |
| <li>Reservations made within a partition having gang scheduling assume |
| the highest possible level of time-slicing rather than the actual level |
| when considering the initiation of jobs. |
| This will prevent the initiation of some jobs that, given fewer jobs to |
| time-slice with, would actually complete execution before the reservation |
| begins.</li> |
| <li>Add support to reserve specific CPU counts rather than requiring whole |
| nodes to be reserved (workaround described above).</li> |
| </ol></p> |
| |
| <p style="text-align: center;">Last modified 31 January 2012</p> |
| |
| <!--#include virtual="footer.txt"--> |
| |