| RELEASE NOTES FOR SLURM VERSION 2.3 |
| 28 July 2011 |
| |
| |
| IMPORTANT NOTE: |
| If using the slurmdbd (SLURM DataBase Daemon) you must update this first. |
| The 2.3 slurmdbd will work with SLURM daemons of version 2.1.3 and above. |
| You will not need to update all clusters at the same time, but it is very |
| important to update slurmdbd first and having it running before updating |
| any other clusters making use of it. No real harm will come from updating |
| your systems before the slurmdbd, but they will not talk to each other |
| until you do. Also at least the first time running the slurmdbd you need to |
| make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M. |
| You can accomplish this by adding the line |
| |
| innodb_buffer_pool_size=64M |
| |
| under the [mysqld] reference in the my.cnf file and restarting the mysqld. |
| This is needed when converting large tables over to the new database schema. |
| |
| SLURM can be upgraded from version 2.2 to version 2.3 without loss of jobs or |
| other state information. |
| |
| |
| HIGHLIGHTS |
| ========== |
| * Support has been added for Cray XT and XE computers |
| * Support has been added for BlueGene/Q computers. |
| * For architectures where the slurmd daemon executes on front end nodes (Cray |
| and BlueGene systems) more than one slurmd daemon may be executed using more |
| than one front end node for improved fault-tolerance and performance. |
| NOTE: The slurmctld daemon will report the lack of a front_end_state file |
| as an error when first started in this configuration. |
| * The ability to expand running jobs was added |
| * The ability to control how many leaf switches a job is allocated and the |
| maximum delay to get that leaf switch count can be controlled. |
| |
| CONFIGURATION FILE CHANGES (see "man slurm.conf" for details) |
| ============================================================= |
| * In order to support more than one front end node, new parameters have been |
| added to support a new data structure: FrontendName, FrontendAddr, Port, |
| State and Reason. |
| * Added DebugFlags option of Frontend |
| * Added new configuration parameter MaxJobId. Use with FirstJobId to limit |
| range of job ID values. |
| * Added new configuration parameter MaxStepCount to limit the effect of |
| bad batch scripts. The default value is 40,000 steps per job. |
| * Changed node configuration parameter from "Procs" to "CPUs". Both parameters |
| will be supported for now. |
| * Added GraceTime to Partition and QOS data structures. Preempted jobs will be |
| given this time interval before termination. |
| * Added AccountingStoreJobComment to control storing job's comment field in |
| the accounting database. |
| * More than one TaskPlugin can be configured in a comma separated list. |
| * DefMemPerCPU, DefMemPerNode, MaxMemPerCPU and MaxMemPerNode configuration |
| options added on a per-partition basis. |
| * SchedulerParameters can now control the maximum delay that a job can set in |
| order to be allocated some desired leaf switch count by specifying a value |
| for max_switch_wait. |
| |
| COMMAND CHANGES (see man pages for details) |
| =========================================== |
| * Added scontrol ability to get and set front end node state. |
| * Added scontrol ability to set slurmctld's DebugFlags. |
| * Added scontrol ability to increment or decrement a job or step time limit. |
| * Added new scontrol option of "show aliases" to report every NodeName that is |
| associated with a given NodeHostName when running multiple slurmd daemons |
| per compute node (typically used for testing purposes). |
| * Added new squeue optioni of -R/--reservation option as a job filter. |
| * A reservation flag of "License_Only" has been added for use by the sview and |
| scontrol commands. If set, then jobs using the reservation may use the |
| licenses associated with it plus any compute nodes. Otherwise the job is |
| limited to the compute nodes associated with the reservation. |
| * The dependency option of "expand" has been added. This option identifies a |
| job whose resource allocation is intended to be used to expand the allocation |
| of another job. See http://www.schedmd.com/slurmdocs//faq.html#job_size |
| for a description of it's use. |
| * Added --switches option to salloc, sbatch and srun commands to control the |
| desired number of switches allocated to a job and the maximum delay before |
| starting the job with more leaf switches. |
| * Added scontrol ability to modify a job's desired switch count or delay. |
| |
| BLUEGENE SPECIFIC CHANGES |
| ========================= |
| * Bluegene/Q support added. |
| * The select/bluegene plugin has been substantially re-written. |
| |
| OTHER CHANGES |
| ============= |
| * Improved accuracy of estimated job start time for pending jobs. This should |
| substantially improve scheduling of jobs elibable to execute on more than one |
| cluster. |
| * Job dependency information will only show the currently active dependencies |
| rather than the original dependencies. |
| * Added a reservation flag of "License_Only". If set, then jobs using the |
| reservation may use the licenses associated with it plus any compute nodes. |
| * Added proctrack/cgroup and task/cgroup plugins to support Linux cgroups. |
| |
| API CHANGES |
| =========== |
| |
| Changed members of the following structs |
| ======================================== |
| block_info_t |
| Added job_list |
| Added used_mp_inx |
| Added used_mp_str |
| bp_inx -> mp_inx |
| conn_type -> conn_type(DIMENSIONS] |
| ionodes -> ionode_str |
| nodes -> mp_str |
| node_cnt -> cnode_cnt |
| |
| job_desc_msg_t |
| conn_type -> conn_type(DIMENSIONS] |
| |
| job_step_info_t |
| Added select_jobinfo |
| |
| partition_info_t |
| Added def_mem_per_cpu and max_mem_per_cpu |
| |
| Added the following struct definitions |
| ====================================== |
| block_job_info_t entirely new structure |
| |
| front_end_info_msg_t entirely new structure |
| |
| front_end_info_t entirely new structure |
| |
| job_info_t |
| batch_host name of the host running the batch script |
| batch_script contents of batch script |
| preempt_time time that a job become preempted |
| req_switches maximum number of leaf switches |
| wait4switches maximum delay to get desired leaf switch count |
| |
| job_step_create_response_msg_t |
| select_jobinfo data needed from the select plugin for a step |
| |
| job_step_info_t |
| select_jobinfo data needed from the select plugin for a step |
| |
| node_info_t |
| node_addr communication name (optional) |
| node_hostname node's hostname (optional) |
| |
| partition_info_t |
| grace_time preempted job's grace time in seconds |
| |
| slurm_ctl_conf |
| acctng_store_job_comment if set, store job's comment field in |
| accounting database |
| max_job_id maximum supported job id before starting over |
| with first_job_id |
| max_step_count maximum number of job steps permitted per job |
| |
| slurm_step_layout |
| front_end name of front end host running the step |
| |
| slurmdb_qos_rec_t |
| grace_time preempted job's grace time in seconds |
| |
| update_front_end_msg_t entirely new structure |
| |
| |
| Changed the following enums and #defines |
| ======================================== |
| job_state_reason |
| FAIL_BANK_ACCOUNT -> FAIL_ACCOUNT |
| FAIL_QOS /* invalid QOS */ |
| WAIT_QOS_THRES /* required QOS threshold has been breached */ |
| |
| select_jobdata_type (Size of many data structures increased) |
| SELECT_JOBDATA_BLOCK_NODE_CNT /* data-> uint32_t block_cnode_cnt */ |
| SELECT_JOBDATA_BLOCK_PTR /* data-> bg_record_t *bg_record */ |
| SELECT_JOBDATA_DIM_CNT /* data-> uint16_t dim_cnt */ |
| SELECT_JOBDATA_NODE_CNT /* data-> uint32_t cnode_cnt */ |
| SELECT_JOBDATA_PAGG_ID /* data-> uint64_t job container ID */ |
| SELECT_JOBDATA_PTR /* data-> select_jobinfo_t *jobinfo */ |
| SELECT_JOBDATA_START_LOC /* data-> uint16_t |
| * start_loc[SYSTEM_DIMENSIONS] */ |
| select_jobdata_type (Added) |
| SELECT_PRINT_START_LOC /* Print just the start location */ |
| select_jobdata_type (Names changed) |
| SELECT_GET_BP_CPU_CNT --> SELECT_GET_MP_CPU_CNT |
| SELECT_SET_BP_CNT ------>SELECT_SET_MP_CNT |
| |
| select_nodedata_type |
| SELECT_NODEDATA_PTR /* data-> select_nodeinfo_t *nodeinfo */ |
| |
| select_print_mode |
| SELECT_PRINT_START_LOC /* Print just the start location */ |
| |
| select_type_plugin_info no longer exists. It's contents are now mostly #defines |
| |
| DEBUG_FLAG_FRONT_END added DebugFlags of Frontend |
| |
| JOB_PREEMPTED added new job termination state to indicated |
| job termination was due to preemption |
| |
| RESERVE_FLAG_LIC_ONLY reserve licenses only, use any nodes |
| |
| TRIGGER_RES_TYPE_FRONT_END added trigger for frontend state changes |
| |
| |
| Added the following API's |
| ========================= |
| slurm_free_front_end_info_msg free front end state information |
| slurm_init_update_front_end_msg initialize data structure for front end update |
| slurm_load_front_end load front end state information |
| slurm_print_front_end_info_msg print all front end state information |
| slurm_print_front_end_table print state information for one front end node |
| slurm_set_debugflags set new DebugFlags in slurmctld daemon |
| slurm_sprint_front_end_table output state information for one front end node |
| slurm_update_front_end update state of front end node |
| |
| |
| Changed the following API's |
| =========================== |