| commit | 83c3653e50920e8bae854c09faec211df4595db4 | [log] [tgz] |
|---|---|---|
| author | Ben Glines <ben.glines@schedmd.com> | Thu Dec 11 15:27:51 2025 -0700 |
| committer | Ben Glines <ben.glines@schedmd.com> | Thu Dec 11 16:14:24 2025 -0700 |
| tree | d85606d5c86f748abcd670664e296cdb6fb1cb0a | |
| parent | 25292a434dfef4a5881b0fe0a27ff6f4384d3f0f [diff] |
Reset garbage SLUID from older versioned slurmd registration messages When slurmd is preparing a slurm_node_registration_status_msg_t, it creates a list of all slurm_step_id_t's that it detects on the node. For slurmd's that are 25.05 or older, they will send uninitialized garbage data in the slurm_step_id_t as part of slurm_node_registration_status_msg_t, and slurmctld will interpret this as an invalid sluid and abort the job. This commit ensures that the garbage sluid is set to 0, which other logic in slurmctld is designed to recognize as an older protocol version job. Ticket: 24312 Changelog: Fix issue with jobs running on slurmd's with version 25.05.x or older getting aborted when slurmd re-registers with slurmctld.
This is the Slurm Workload Manager. Slurm is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, and interconnect agnostic. Slurm currently has been tested only under Linux.
As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
The official issue tracker for Slurm is at
: https://support.schedmd.com/
We welcome code contributions and patches, but we do not accept Pull Requests through Github at this time. Please submit patches as attachments to new issues under the "C - Contributions" severity level.
The top-level distribution directory contains this README as well as other high-level documentation files, and the scripts used to configure and build Slurm (see INSTALL). Subdirectories contain the source-code for Slurm as well as a test suite and further documentation. A quick description of the subdirectories of the Slurm distribution follows:
src/ [ Slurm source ]
: Slurm source code is further organized into self explanatory subdirectories such as src/api, src/slurmctld, etc.
doc/ [ Slurm documentation ]
: The documentation directory contains some latex, html, and ascii text papers, READMEs, and guides. Manual pages for the Slurm commands and configuration files are also under the doc/ directory.
etc/ [ Slurm configuration ]
: The etc/ directory contains a sample config file, as well as some scripts useful for running Slurm.
slurm/ [ Slurm include files ]
: This directory contains installed include files, such as slurm.h and slurm_errno.h, needed for compiling against the Slurm API.
testsuite/ [ Slurm test suite ]
: The testsuite directory contains an extensive collection of tests written for Check, Expect and Pytest.
auxdir/ [ autotools directory ]
: Directory for autotools scripts and files used to configure and build Slurm
contribs/ [ helpful tools outside of Slurm proper ]
: Directory for anything that is outside of slurm proper such as a different api or such. To have this build you need to do a make contrib/install-contrib.
Please see the instructions at
: https://slurm.schedmd.com/quickstart_admin.html
Extensive documentation is available from our home page at
: https://slurm.schedmd.com/slurm.html
Slurm is provided "as is" and with no warranty. This software is distributed under the GNU General Public License, please see the files COPYING, DISCLAIMER, and LICENSE.OpenSSL for details.