blob: 5c38f43c15eb416ad188c3a531d73a51f2d8e1a8 [file] [log] [blame] [edit]
RELEASE NOTES FOR SLURM VERSION 14.03
26 February 2014
IMPORTANT NOTE:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 14.03 slurmdbd will work with Slurm daemons of version 2.5 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also at least the first time running the slurmdbd you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] reference in the my.cnf file and restarting the mysqld.
This is needed when converting large tables over to the new database schema.
Slurm can be upgraded from version 2.5 or 2.6 to version 14.03 without loss of
jobs or other state information. Upgrading directly from an earlier version of
Slurm will result in loss of state information.
HIGHLIGHTS
==========
-- Added support for native Slurm operation on Cray systems (without ALPS).
-- Added partition configuration parameters AllowAccounts, AllowQOS,
DenyAccounts and DenyQOS to provide greater control over use.
-- Added the ability to perform load based scheduling. Allocating resources to
jobs on the nodes with the largest number if idle CPUs.
-- Added support for reserving cores on a compute node for system services
(core specialization)
-- Add mechanism for job_submit plugin to generate error message for srun,
salloc or sbatch to stderr.
-- Support for Postgres database has long since been out of date and
problematic, so it has been removed entirely. If you would like to
use it the code still exists in <= 2.6, but will not be included in
this and future versions of the code.
-- Added new structures and support for both server and cluster resources.
-- Significant performance improvements, especially with respect to job
array support.
RPMBUILD CHANGES
================
-- The rpmbuild option for a Cray system with ALPS has changed from
%_with_cray to %_with_cray_alps.
-- CRAY - Do not package Slurm's libpmi or libpmi2 libraries. The Cray version
of those libraries must be used.
CONFIGURATION FILE CHANGES (see man appropriate man page for details)
=====================================================================
-- Added AuthInfo configuration parameter to specify port used by
authentication plugin.
-- Added CoreSpecPlugin configuration parameter and core specialization plugin
infrastructure.
-- Added JobAcctGatherParams configuration parameter.
-- Added JobContainerType configuration parameter and plugin infrastructure.
-- Added FairShareDampeningFactor configuration parameter to offer a greater
priority range based upon utilization.
-- Added LogTimeFormat configuration parameter to control format of the
timestamp in log files.
-- Added PrologFlags configuration parameter to control when the Prolog is run.
-- Added SlurmdPlugstack configuration parameter for generic slurmd daemon
plugin mechanism.
-- Added partition configuration parameters AllowAccounts, AllowQOS,
DenyAccounts and DenyQOS to provide greater control over use.
-- SelectType: select/cray has been renamed select/alps.
-- Added SchedulingParameters paramter of "CR_LLN" and partition parameter of
"LLN=yes|no" to enable loadbased scheduling.
-- In SchedulerParameters, change default max_job_bf value from 50 to 100.
-- Added SchedulerParameters option of "partition_job_depth" to limit
scheduling logic depth by partition.
-- Added SchedulerParameters option of "bf_max_job_start".
-- In SchedulerParameters, replace "max_job_bf" with "bf_max_job_test"
(both will work for now).
-- Add SchedulerParameters options of "preempt_reorder_count" and
"preempt_strict_order".
-- Change MaxArraySize from 16-bit to 32-bit field.
-- Added fields to "scontrol show job" output: boards_per_node,
sockets_per_board, ntasks_per_node, ntasks_per_board, ntasks_per_socket,
ntasks_per_core, and nice.
-- gres.conf - Add "NodeName" specification so that a single gres.conf file
can be used for a heterogeneous cluster.
-- Added DebugFlags value of "License".
DBD CONFIGURATION FILE CHANGES (see "man slurmdbd.conf" for details)
====================================================================
COMMAND CHANGES (see man pages for details)
===========================================
-- Added sbatch --signal option of "B:" to signal the batch shell rather than
only the spawned job steps.
-- Added sinfo and squeue format option of "%all" to print all fields available
for the data type with a vertical bar separating each field.
-- Add StdIn, StdOut, and StdErr paths to job information dumped with
"scontrol show job".
-- Permit Slurm administrator to submit a batch job as any user.
-- Added -I|--item-extract option to sh5util to extract data item from series.
-- Add squeue output format options for job command and working directory
(%o and %Z respectively).
-- Add stdin/out/err to sview job output.
-- Added a new option to the scontrol command to view licenses that are configured
in use and available. 'scontrol show licenses'.
-- Permit jobs in finished state to be requeued.
-- Added a new option to scontrol to put a requeued job on hold. A requeued
job can be put in a new special state called SPECIAL_EXIT indicating
the job has exited with a special value.
"scontrol requeuehold state=SpecialExit 123".
-- Pending job steps will have step_id of INFINITE rather than NO_VAL and
will be reported as "TBD" by scontrol and squeue commands.
-- Added sgather tool to gather files from a job's compute nodes into a
central location.
-- Added -S/--core-spec option to salloc, sbatch and srun commands to reserve
specialized cores for system use.
-- Added sacctmgr options to create, modify, list & delete Resources.
OTHER CHANGES
=============
-- Added job_info() and step_info() functions to the gres plugins to extract
plugin specific fields from the job's or step's GRES data structure.
API CHANGES
===========
-- Added "state" argument to slurm_requeue function.
-- Added sacctmgr options to create, modify, list & delete Resources.
Changed members of the following structs
========================================
-- Struct job_info / slurm_job_info_t: Changed exc_node_inx, node_inx, and
req_node_inx from type int to type int32_t; Changed array_task_id from type
uint16_t to type uint32_t
-- job_step_info_t: Changed node_inx from type int to type int32_t; Changed
array_task_id from type uint16_t to type uint32_t; Changed node_inx from
int to int32_t
-- Struct partition_info / partition_info_t: Changed node_inx from type int
to type int32_t
-- block_job_info_t: Changed cnode_inx from type int to type int32_t
-- block_info_t: Changed ionode_inx and mp_inx from type int to type int32_t
-- Struct reserve_info / reserve_info_t: Changed node_inx from type int to
type int32_t
-- Struct slurm_ctl_conf / slurm_ctl_conf_t: Changed max_array_sz from uint16_t
to uint32_t.
-- Struct reserve_info / reserve_info_t: Changed flags from uint16_t to
uint32_t.
-- Struct resv_desc_msg / resv_desc_msg_t: Changed flags from uint16_t to
uint32_t.
Added the following struct definitions
======================================
-- Struct job_info / slurm_job_info_t: Added std_err, std_in and std_out.
-- Struct job_info / slurm_job_info_t: Added core_spec
-- struct job_descriptor / job_desc_msg_t: Added core_spec, warn_flags
-- Struct partition_info / partition_info_t: Added allow_accounts, allow_qos,
deny_accounts, deny_groups and deny_qos.
-- struct slurm_step_launch_params_t: Added partition (for Prolog environment
variable)
-- Struct resource_allocation_response_msg: Added partition (for Prolog
environment variable)
-- Struct slurm_ctl_conf / slurm_ctl_conf_t: Added acct_gather_conf, authinfo.
core_spec_plugin, ext_sensors_conf, fs_dampening_factor,
job_acct_gather_params, job_container_plugin, log_fmt, prolog_flags and
slurmd_plugstack.
-- Added license data structures: slurm_license_info and license_info_msg.
Changed the following enums and #defines
========================================
-- Add new job_state of JOB_BOOT_FAIL for job terminations due to failure to
boot it's allocated nodes or BlueGene block.
-- Added set of job state change flags: JOB_REQUEUE, JOB_REQUEUE_HOLD,
JOB_SPECIAL_EXIT.
-- Added new job_state_reason values: WAIT_BLOCK_MAX_ERR, WAIT_BLOCK_D_ACTION,
WAIT_CLEANING and WAIT_PROLOG.
-- Added new select_jobdata_type values: SELECT_JOBDATA_CLEANING and
SELECT_JOBDATA_NETWORK.
-- Added new node state values: NODE_STATE_NET, NODE_STATE_RES and
NODE_STATE_UNDRAIN.
-- Added new flags for SelectTypeParameters: CR_OTHER_CONS_RES, CR_NHC_STEP_NO,
CR_LLN, and CR_NHC_NO.
-- Added new PriorityFlags value of PRIORITY_FLAGS_DEPTH_OBLIVIOUS
-- Added new partition flags: PART_FLAG_LLN and PART_FLAG_LLN_CLR
-- Added DebugFlags: DEBUG_FLAG_JOB_CONT and DEBUG_FLAG_TASK
-- Removed DebugFlags: DEBUG_FLAG_THREADID
-- Added PrologFlag: PROLOG_FLAG_ALLOC
-- Added LogTimeFormat options: LOG_FMT_ISO8601_MS, LOG_FMT_ISO8601,
LOG_FMT_RFC5424_MS, LOG_FMT_RFC5424, LOG_FMT_CLOCK, LOG_FMT_SHORT and
LOG_FMT_THREAD_ID
Added the following API's
=========================
-- Added functions for retrieving license information: slurm_load_licenses and
slurm_free_license_info_msg.
-- Added functions to get specific job information: slurm_get_job_stderr,
slurm_get_job_stdin and slurm_get_job_stdout.
Change the following API's
===========================
-- Add task pointer to the task_post_term() function in task plugins. The
terminating task's PID is available in task->pid.
DBD API Changes
===============
Changed members of the following structs
========================================
Added the following struct definitions
======================================
-- struct slurmdb_clus_res_rec_t: new cluster resource record.
-- struct slurmdb_res_cond_t : new resource condition.
-- struct slurmdb_res_rec_t: new resource record.
Added the following enums and #defines
========================================
-- enum slurmdb_resource_type_t: server resource record "type" definition.
-- Added new slurmdb_update_type_t SLURMDB_ADD_RES,
SLURMDB_REMOVE_RES, SLURMDB_MODIFY_RES, SLURMDB_REMOVE_QOS_USAGE
-- Added new flags for Resources SLURMDB_RES_FLAG_BASE,
SLURMDB_RES_FLAG_NOTSET, SLURMDB_RES_FLAG_ADD, SLURMDB_RES_FLAG_REMOVE
-- Added new flags for QOS QOS_FLAG_BASE
Added the following API's
=========================
-- Added functions for resource management in the database.