licenses: fix ABBA deadlock on job_ptr->license_list / license_mutex

Several license paths hold license_mutex and then iterate
job_ptr->license_list with a write lock (license_job_test,
license_job_return, hres_filter). Meanwhile slurm_bf_licenses_avail
iterates job_ptr->license_list first and its callback chain
(slurm_bf_hres_filter) later acquires license_mutex. With these
iterations as list_for_each() (write lock), two threads can hit the
classic ABBA pattern: A holds license_mutex and blocks on the list
wrlock; B holds the list wrlock and blocks on license_mutex.

Convert all list_for_each() calls on job_ptr->license_list whose
callbacks only read the iterated list to list_for_each_ro():
hres_filter_with_list, slurm_bf_hres_filter, license_job_test_with_list,
license_job_return_to_list, and slurm_bf_licenses_avail. Multiple
readers on the list can coexist, so the cycle no longer forms.

The callbacks (_foreach_hres_filter, _foreach_bf_hres_filter,
_foreach_license_job_test, _foreach_license_job_return,
_foreach_bf_licenses_avail) only modify entry fields or write to
other lists, not the iterated job_ptr->license_list structure, so
the read lock is sufficient.

Issue: 50273
1 file changed
tree: b88151c260bafbccca0741799c1f01321a5981aa
  1. auxdir/
  2. CHANGELOG/
  3. contribs/
  4. debian/
  5. doc/
  6. etc/
  7. slurm/
  8. src/
  9. testsuite/
  10. tools/
  11. .clang-format
  12. .git-blame-ignore-revs
  13. .gitignore
  14. .pre-commit-config.yaml
  15. aclocal.m4
  16. AUTHORS
  17. CHANGELOG.md
  18. CODE_OF_CONDUCT.md
  19. CODEOWNERS
  20. config.h.in
  21. configure
  22. configure.ac
  23. CONTRIBUTING.md
  24. COPYING
  25. DISCLAIMER
  26. INSTALL
  27. LICENSE.OpenSSL
  28. make_ref.include
  29. Makefile.am
  30. Makefile.in
  31. META
  32. README.md
  33. RELEASE_NOTES.md
  34. SECURITY.md
  35. slurm.spec
README.md

Slurm Workload Manager

This is the Slurm Workload Manager. Slurm is an open-source cluster resource management and job scheduling system that strives to be simple, scalable, portable, fault-tolerant, and interconnect agnostic. Slurm has so far been tested only under Linux.

As a cluster resource manager, Slurm provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

NOTES FOR GITHUB DEVELOPERS

The official issue tracker for Slurm is at

: https://support.schedmd.com/

We welcome code contributions and patches. Please see the contributing guidelines for further details.

SOURCE DISTRIBUTION HIERARCHY

The top-level distribution directory contains this README as well as other high-level documentation files, and the scripts used to configure and build Slurm (see INSTALL). Subdirectories contain the source-code for Slurm as well as a test suite and further documentation. A quick description of the subdirectories of the Slurm distribution follows:

src/ [ Slurm source ]

: Slurm source code is further organized into self-explanatory subdirectories such as src/api, src/slurmctld, etc.

doc/ [ Slurm documentation ]

: The documentation directory contains some LaTeX, HTML, and ASCII text papers, READMEs, and guides. Manual pages for the Slurm commands and configuration files are also under the doc/ directory.

etc/ [ Slurm configuration ]

: The etc/ directory contains a sample config file, as well as some scripts useful for running Slurm.

slurm/ [ Slurm include files ]

: This directory contains installed include files, such as slurm.h and slurm_errno.h, needed for compiling against the Slurm API.

testsuite/ [ Slurm test suite ]

: The testsuite directory contains an extensive collection of tests written for Check, Expect and Pytest.

auxdir/ [ autotools directory ]

: Directory for autotools scripts and files used to configure and build Slurm.

contribs/ [ helpful tools outside of Slurm proper ]

: Directory for anything that is outside of Slurm proper, such as a different API. To build these, you need to run make contrib/install-contrib.

COMPILING AND INSTALLING THE DISTRIBUTION

Please see the instructions at

: https://slurm.schedmd.com/quickstart_admin.html

Extensive documentation is available from our home page at

: https://slurm.schedmd.com/slurm.html

LEGAL

Slurm is provided "as is" and with no warranty. This software is distributed under the GNU General Public License; please see the files COPYING, DISCLAIMER, and LICENSE.OpenSSL for details.