LLNL CHAOS-SPECIFIC RELEASE NOTES FOR SLURM VERSION 2.2
1 December 2010
This lists only the most significant changes from SLURM v2.1 to v2.2
with respect to Chaos systems. See the file RELEASE_NOTES for a more
complete description of changes.
Mostly for system administrators:
* SLURM version 2.2 is able to read version 2.1 state files and preserve all
running and pending state. SLURM version 2.1 is *not* able to use state save
files generated by version 2.2, so this is a non-reversible transition.
* Added new configuration parameter JobSubmitPlugins which provides a mechanism
to set default job parameters or perform other site-configurable actions at
job submit time. Site-specific job submission plugins may be written in
either C or Lua.
* We have given Operators, Administrators, and bank account Coordinators (as
defined in the SLURM database) the ability to invoke commands that view/modify
user jobs and reservations. Previously, one had to be root to invoke
"scontrol update JobId" for example. In addition, Administrators have the
ability to view/modify node and partition info without having to become root.
For more details, see the AUTHORIZATION section of the man pages for the
following commands: scontrol, scancel and sbcast.
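As a minimal sketch of the JobSubmitPlugins parameter described above, a
slurm.conf fragment enabling a Lua job submission plugin might look like the
following. The plugin name "lua" is an assumption here; check the slurm.conf
man page shipped with your installation for the exact names it accepts.

```
# slurm.conf fragment (illustrative only)
# Assumption: "lua" selects the Lua job submission plugin.
JobSubmitPlugins=lua
```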
Mostly for users:
* Job submission commands (salloc, sbatch and srun) have a new option,
--time-min, that sets a minimum acceptable time limit. The job's time limit
may be reduced, but never below this value, if doing so lets the job start
earlier through backfill scheduling.
* Support has been added for TotalView to attach to a subset of launched tasks
rather than requiring it to attach to all of them.
* scontrol now has the ability to shrink a job's size. Use a command of
"scontrol update JobId=# NumNodes=#" or
"scontrol update JobId=# NodeList=<names>". This command generates a script
to be executed in order to reset SLURM environment variables for proper
execution of subsequent job steps.
* Users can hold and release their own jobs. Submit a job in held state using
the --hold or -H option of srun or sbatch. Hold a job after submission using
the command "scontrol hold <jobid>". Release it with
"scontrol release <jobid>". Users cannot release jobs held by a system
administrator.
* Added support for a default account and wckey per cluster within accounting.
* SLURM commands (squeue, sinfo, sbatch, etc.) can now operate between
clusters. Jobs can also be submitted with sbatch to other cluster(s) with the
job routed to the one cluster expected to initiate the job first. This
functionality relies upon the SlurmDBD (SLURM DataBase Daemon) to provide
communication information (address and port) for a command to locate the
SLURM control daemon (slurmctld) on other clusters.
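The user-facing commands above can be sketched as a short shell session. This
is a dry run under stated assumptions, not a site recipe: the job ID 1234, the
script name job.sh, the time values and the node count are all hypothetical,
and the "run" helper is an illustration that only prints each command unless
RUN=1 is set on a system with SLURM 2.2 installed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print the command by default,
# execute it only when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

JOBID=1234      # hypothetical job ID
SCRIPT=job.sh   # hypothetical batch script

# Ask for a 4-hour limit, but accept as little as 1 hour if that
# lets backfill scheduling start the job earlier.
run sbatch --time=4:00:00 --time-min=1:00:00 "$SCRIPT"

# Submit a job in held state, then release it.
run sbatch --hold "$SCRIPT"
run scontrol release "$JOBID"

# Hold a job that has already been submitted.
run scontrol hold "$JOBID"

# Shrink a job to 2 nodes.
run scontrol update JobId="$JOBID" NumNodes=2
```

With RUN unset, the script only echoes the command lines, so they can be
reviewed before being run against a real job.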