| RELEASE NOTES FOR SLURM VERSION 1.2 |
| 13 March 2007 |
| |
| |
| IMPORTANT NOTE: |
| SLURM state files in version 1.2 are different from those of version 1.1. |
| After installing SLURM version 1.2, plan to restart without preserving |
| jobs or other state information. Restart daemons with the "-c" option or |
| use "/etc/init.d/slurm startclean". |
| |
| NEW COMMANDS |
| |
| * Several new commands have been added to perform individual srun functions. |
| The srun command will continue to exist, but these commands may offer |
| greater clarity and ease of use. The srun options --allocate, --attach, |
| and --batch will eventually cease being supported. The new commands are |
| salloc - Create a job allocation (functions like "srun --allocate") |
| sattach - Attach to an existing job step (functions like "srun --attach") |
| sbatch - Submit a batch job script (functions like "srun --batch") |
| See the individual man pages for more information. |
| |
| * A new command, strigger, was added to manage event triggers. This was |
| added in SLURM version 1.2.2. See strigger's man pages for details. |
| |
| * A new GUI is available for viewing and modifying state information, sview. |
| Note that sview will only be built on systems which have libglade-2.0 and |
| gtk+-2.0 installed. |
| |
| |
| CONFIGURATION FILE CHANGES |
| |
| * The slurm.conf configuration file now supports a "Include" directive to |
| include other files inline. |
| |
| * Added new configuration parameter MessageTimeout (replaces #define in the |
| code). |
| |
| * Added new configuration parameter MailProg (in case mail program is not |
| at "/bin/mail"). |
| |
| * Added new configuration parameter TaskPluginParam (for future use, to |
| control which task affinity functions are used, "cpusets" or "sched" for |
| schedsetaffinity). |
| |
| * Removed defunct configuration parameter, ShedulerAuth. It has been rendered |
| obsolete by the new wiki.conf file. |
| |
| * Several new configuration parameters can be used to specify architectural |
| details of the nodes: Sockets, CoresPerSocket, and ThreadsPerCore. |
| For multi-core systems, these parameters will typically need to be |
| specified in the configuration file. |
| |
| * For select/cons_res, an assortment of consumable resources can be supported |
| including: CPUs, cores, sockets and/or memory. See new configuration |
| parameter SelectTypeParameters. |
| |
| * For BlueGene systems only: Alternate BlrtsImage, LinuxImage, MloaderImage, |
| and RamDiskImage file can be specified along with specific groups that |
| can use them. |
| |
| * MPI plugin for use with MPICH-GM renamed from "mpich-gm" to "mpichgm". |
| (There was previously some inconsistent naming.) |
| |
| OTHER CHANGES |
| |
| * Several srun options are available to control layout of tasks across the |
| cores on a node: --extra-node-info, --ntasks-per-node, --sockets-per-node, |
| --cores-per-socket, --threads-per-core, --minsockets, --mincores, |
| --minthreads, --ntasks-per-socket, --ntasks-per-core, and --ntasks-per-node. |
| |
| * New scontrol, sinfo and squeue options can be used to view socket, core, |
| and task details by job and/or node. |
| |
| * The scontrol and squeue commands will provide a more detailed explanation |
| of why a job was put into FAILED state. |
| |
| * Permit batch jobs to be requeued ("scontrol requeue <jobid>" or |
| "srun --batch --no-requeue ..." to prevent). |
| |
| * View a job's exit code using "scontrol show job". |
| |
| * Added "account" field to job and step accounting information and sacct output. |
| |
| * Added new job field, "comment". Set by srun, salloc and sbatch. View |
| with "scontrol show job". Used by sched/wiki2. |
| |
| * Added support for OS X operating system. |
| |
| * A job step's actual run time is maintained with suspend/resume use. |
| |
| * Added support for Portable Linux Processor Affinity in the task/affinity |
| plugin (PLPA, see http://www.open-mpi.org/software/plpa). |
| |
| * There is a new version of the Wiki scheduler plugin to interface with |
| the Moab Scheduler. It can be accessed with the slurm.conf parameter |
| "SchedulerType=sched/wiki2". Continue using "SchedulerType=sched/wiki" |
| for the Maui Scheduler at this time and see the section below for details. |
| If you use sched/wiki2, you will at least need to add a wiki.conf file. |
| Key differences include: |
| - Node and job data returned is correct (several errors in old plugin) |
| - Node data includes partition information (CCLASS field) |
| - Improved error handling |
| - Support added for configuration file ("wiki.conf" in same directory |
| as "slurm.conf" file, see "man wiki.conf" for details) |
| - Support added for job modify, suspend/resume, and requeue |
| - Authentication of communications now supported |
| - Notification of scheduler on events (job submitted or termination) |
| - There is no longer a "sched-wiki" RPM. All files are in the main RPM. |
| |
| * The sched/wiki plugin has been re-written for use with the Maui Scheduler. |
| Some time in early 2007 the Maui Scheduler should be upgraded to use |
| the additional capabilities offered by sched/wiki2. Changes in sched/wiki |
| include: |
| - Node and job data returned is correct (several errors in old plugin) |
| - Support added for configuration file ("wiki.conf" in same directory |
| as "slurm.conf" file, see "man wiki.conf" for details). Currently |
| wiki.conf is only used to store an authentication key, which is not |
| used by most versions of the Maui Scheduler. |
| - There is no longer a "sched-wiki" RPM. All files are in the main RPM. |
| |
| * The features associated with a node can now be changed using the |
| "scontrol update" and "sview" commands. |
| |
| * For BlueGene system only: Added "--reboot" option to srun, salloc, and |
| sbatch commands to force booting of nodes before starting the job. |
| |
| * For BlueGene Systems only: Changed way of keeping track of smaller |
| partitions using ionode range instead of quarter nodecard notation. |
| (i.e. bgl000[0-3] instead of bgl000.0.0) |
| |
| * For BlueGene Systems only: New scontrol options to down specific ionodes |
| without having to create blocks there (i.e. |
| "scontrol update subbp=bgl000[0-3] state=error"). This will down the first |
| four ionodes or on a BGL system the first nodecard in the base partition. |
| |
| * For BlueGene Systems only: sinfo will now display correct node counts for |
| blocks in an error state or blocks and small blocks that are allocated. |
| This results in some overlap of nodes displayed if you are running with |
| Small blocks but the node counts will be correct for the states given. |
| |
| * For BlueGene Systems only: A state save/recover file has been added to save |
| the state of the system (primarily for blocks in an error state) to be loaded |
| when the system has been brought back up. |
| |
| * Added "scontrol listpids" command to identify processes associated with |
| specific jobs and/or steps. |
| |
| See the file NEWS for more details. |