licenses: fix ABBA deadlock on job_ptr->license_list / license_mutex

Several license paths hold license_mutex and then iterate job_ptr->license_list with a write lock (license_job_test, license_job_return, hres_filter). Meanwhile, slurm_bf_licenses_avail iterates job_ptr->license_list first, and its callback chain (slurm_bf_hres_filter) later acquires license_mutex.

With these iterations done through list_for_each() (write lock), two threads can hit the classic ABBA pattern: A holds license_mutex and blocks on the list wrlock; B holds the list wrlock and blocks on license_mutex.

Convert all list_for_each() calls on job_ptr->license_list whose callbacks only read the iterated list to list_for_each_ro(): hres_filter_with_list, slurm_bf_hres_filter, license_job_test_with_list, license_job_return_to_list, and slurm_bf_licenses_avail. Multiple readers can hold the list lock at the same time, so the cycle no longer forms.

The callbacks (_foreach_hres_filter, _foreach_bf_hres_filter, _foreach_license_job_test, _foreach_license_job_return, _foreach_bf_licenses_avail) only modify entry fields or write to other lists; they do not change the structure of the iterated job_ptr->license_list, so the read lock is sufficient.

Issue: 50273
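The lock-ordering argument above can be illustrated with a minimal pthread sketch. It is not Slurm code: demo_license_mutex, demo_list_lock, and mutex_holder_can_enter_list() are hypothetical stand-ins, and the internals of list_for_each() and list_for_each_ro() are not reproduced. Thread A takes the mutex first and thread B takes the list lock first. A then probes the list lock with a try-lock, so the write-lock case reports the conflict instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for license_mutex and the list's own lock. */
static pthread_mutex_t demo_license_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t demo_list_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t demo_barrier;
static bool demo_use_write_lock;
static bool demo_entered_list;

/* Thread A: holds the mutex, then wants to iterate the list. */
static void *mutex_then_list(void *arg)
{
	int rc;

	(void) arg;
	pthread_mutex_lock(&demo_license_mutex);
	pthread_barrier_wait(&demo_barrier);	/* B now holds the list lock */

	/* Probe instead of blocking, so the demo always terminates. */
	rc = demo_use_write_lock ?
		pthread_rwlock_trywrlock(&demo_list_lock) :
		pthread_rwlock_tryrdlock(&demo_list_lock);
	demo_entered_list = (rc == 0);
	if (rc == 0)
		pthread_rwlock_unlock(&demo_list_lock);

	pthread_barrier_wait(&demo_barrier);	/* probe done, B may continue */
	pthread_mutex_unlock(&demo_license_mutex);
	return NULL;
}

/* Thread B: iterates the list, and its callback then wants the mutex. */
static void *list_then_mutex(void *arg)
{
	(void) arg;
	if (demo_use_write_lock)
		pthread_rwlock_wrlock(&demo_list_lock);
	else
		pthread_rwlock_rdlock(&demo_list_lock);
	pthread_barrier_wait(&demo_barrier);	/* A now holds the mutex */

	/*
	 * The real ABBA pattern would block on the mutex right here while
	 * still holding the list lock. The demo waits for A's probe instead.
	 */
	pthread_barrier_wait(&demo_barrier);
	pthread_rwlock_unlock(&demo_list_lock);

	pthread_mutex_lock(&demo_license_mutex);
	pthread_mutex_unlock(&demo_license_mutex);
	return NULL;
}

/* Returns true if the mutex holder could also take the list lock. */
bool mutex_holder_can_enter_list(bool use_write_lock)
{
	pthread_t a, b;

	demo_use_write_lock = use_write_lock;
	demo_entered_list = false;
	pthread_barrier_init(&demo_barrier, NULL, 2);
	pthread_create(&a, NULL, mutex_then_list, NULL);
	pthread_create(&b, NULL, list_then_mutex, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	pthread_barrier_destroy(&demo_barrier);
	return demo_entered_list;
}
```

Built with -pthread, mutex_holder_can_enter_list(true) models the write-lock iteration and reports that thread A cannot proceed, which is the point where both threads would wait on each other. mutex_holder_can_enter_list(false) models the read-only iteration, where both threads share the list lock and each eventually finishes.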
This is the Slurm Workload Manager. Slurm is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, and interconnect agnostic. Slurm currently has been tested only under Linux.
As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
The official issue tracker for Slurm is at
: https://support.schedmd.com/
We welcome code contributions and patches. Please see the contributing guidelines for further details.
The top-level distribution directory contains this README as well as other high-level documentation files, and the scripts used to configure and build Slurm (see INSTALL). Subdirectories contain the source code for Slurm as well as a test suite and further documentation. A quick description of the subdirectories of the Slurm distribution follows:
src/ [ Slurm source ]
: Slurm source code is further organized into self-explanatory subdirectories such as src/api, src/slurmctld, etc.
doc/ [ Slurm documentation ]
: The documentation directory contains some LaTeX, HTML, and ASCII text papers, READMEs, and guides. Manual pages for the Slurm commands and configuration files are also under the doc/ directory.
etc/ [ Slurm configuration ]
: The etc/ directory contains a sample config file, as well as some scripts useful for running Slurm.
slurm/ [ Slurm include files ]
: This directory contains installed include files, such as slurm.h and slurm_errno.h, needed for compiling against the Slurm API.
testsuite/ [ Slurm test suite ]
: The testsuite directory contains an extensive collection of tests written for Check, Expect and Pytest.
auxdir/ [ autotools directory ]
: Directory for autotools scripts and files used to configure and build Slurm.
contribs/ [ helpful tools outside of Slurm proper ]
: Directory for anything that is outside of Slurm proper, such as a different API. To build and install these tools, run make contrib and then make install-contrib.
To compile and install the distribution, please see the instructions at
: https://slurm.schedmd.com/quickstart_admin.html
Extensive documentation is available from our home page at
: https://slurm.schedmd.com/slurm.html
Slurm is provided "as is" and with no warranty. This software is distributed under the GNU General Public License; please see the files COPYING, DISCLAIMER, and LICENSE.OpenSSL for details.