<!--#include virtual="header.txt"-->
<h1>REST API Details</h1>
<p>Slurm provides a <a href="https://restfulapi.net/">REST API</a> through the
slurmrestd daemon, using <a href="jwt.html">JSON Web Tokens</a> for
authentication. This daemon is designed to allow clients to communicate with
Slurm via a REST API (in addition to the command line interface (CLI) or C API).
</p>
<p>See also:
<ul>
<li><a href="rest_quickstart.html">REST API Quick Start Guide</a>
<ul>
<li><a href="rest_quickstart.html#common_issues">Common Issues</a></li>
</ul></li>
<li><a href="rest_api.html">REST API Methods and Models</a></li>
<li><a href="slurmrestd.html">slurmrestd man page</a></li>
<li><a href="openapi_release_notes.html">OpenAPI Plugin Release Notes</a></li>
<li><a href="rest_clients.html">REST API Client Guide</a></li>
</ul>
</p>
<h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2>
<ul>
<li><a href="#stateless">Stateless</a></li>
<li><a href="#run_modes">Run modes</a>
<ul>
<li><a href="#inet">Inet Service Mode</a></li>
<li><a href="#listen">Listening Mode</a></li>
</ul>
</li>
<li><a href="#config">Configuration</a></li>
<li><a href="#plugins">Plugins</a></li>
<li><a href="#high_availability">High Availability</a></li>
<li><a href="#security">Security</a>
<ul>
<li><a href="#jwt">JSON Web Token (JWT) Authentication</a></li>
<li><a href="#local_auth">Local Authentication</a></li>
<li><a href="#auth_proxy">Authenticating Proxy</a></li>
</ul>
</li>
<li><a href="#python-guide">Python Guide</a>
<ul>
<li><a href="#python-setup">Setup</a></li>
<li><a href="#python-usage-overview">Usage Overview</a></li>
<li><a href="#python-job-submission">Job Submission</a></li>
<li><a href="#python-entity-control">Job, Node, and Reservation Control</a></li>
<li><a href="#python-system-management">System Management</a></li>
</ul>
</li>
</ul>
<h2 id="stateless">Stateless<a class="slurm_link" href="#stateless"></a></h2>
<p>Slurmrestd is stateless as it does not cache or save any state between
requests. Each request is handled in a thread and then all of that state is
discarded. Any request to slurmrestd is completely synchronous with the
Slurm controller (slurmctld or slurmdbd) and is only considered complete once
the HTTP response code has been sent to the client. Slurmrestd will hold a
client connection open while processing a request. Slurm database commands are
committed at the end of every request, on the success of all API calls in the
request.</p>
<p>Sites are strongly encouraged to set up a caching proxy between slurmrestd
and clients so that clients do not repeatedly issue the same queries, which
drives usage on the controller higher than needed and causes lock contention.</p>
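<p>A dedicated caching proxy is the recommended approach. Where that is not yet
in place, a client can at least avoid repeating identical read-only queries.
The following is a minimal sketch of a client-side time-based cache; the
<code>TTLCache</code> class and its parameters are hypothetical names used for
illustration and are not part of Slurm.</p>

```python
import time


class TTLCache:
    """Cache results of read-only queries for a short time (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key: (expiry time, cached value)

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() only when it is stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value
```

<p>With the client from the <a href="#python-guide">Python Guide</a>, a call
such as <code>cache.get("partitions", slurm.slurm_v0043_get_partitions)</code>
would then reach the controller at most once per time-to-live interval. Only
GET-style queries should be cached this way.</p>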
<h2 id="run_modes">Run modes<a class="slurm_link" href="#run_modes"></a></h2>
<p>Slurmrestd currently supports two run modes: inet service mode and listening
mode.</p>
<h3 id="inet">Inet Service Mode<a class="slurm_link" href="#inet"></a></h3>
<p>The Slurmrestd daemon acts as an
<a href="https://en.wikipedia.org/wiki/Inetd">
Inet Service
</a> treating STDIN and STDOUT as the client. This mode allows clients to use
inetd, xinetd, or systemd socket activated services and avoid the need to run a
daemon on the host at all times. This mode creates an instance for each client
and does not support reusing the same instance for different clients.</p>
<h3 id="listen">Listening Mode<a class="slurm_link" href="#listen"></a></h3>
<p>The Slurmrestd daemon acts as a full UNIX service and continuously listens
for new TCP connections. Each connection and request are independently
authenticated.</p>
<h2 id="config">Configuration<a class="slurm_link" href="#config"></a></h2>
<p>slurmrestd can be configured either by environment variables or command line
arguments. Please see the <b>doc/man/man8/slurmrestd.8</b> man page and
<a href="rest_quickstart.html#customization">REST API Quick Start Guide</a>
for details.</p>
<h2 id="plugins">Plugins<a class="slurm_link" href="#plugins"></a></h2>
<p>As of Slurm 20.11, the REST API uses plugins for authentication and
generating content. As of Slurm 21.08, the OpenAPI plugins are available
outside of the slurmrestd daemon, and other Slurm commands may provide or accept
the latest version of the OpenAPI formatted output. This functionality is
provided on a per-command basis. Please refer to the
<a href="rest_clients.html#data_parser_lifecycle">Data Parser Lifecycle</a>
documentation for the planned life cycles of versioned endpoints.
These plugins can optionally be listed or selected via command line arguments
as described in the <a href="slurmrestd.html">slurmrestd</a> documentation.</p>
<h2 id="high_availability">High Availability
<a class="slurm_link" href="#high_availability"></a></h2>
<p>Slurmrestd is agnostic to its deployment in a highly available cluster.
The daemon may be run on multiple nodes but does not provide any coordination
with other instances for load balancing or failover.
If such functionality is desired, a separate load balancer may be deployed.
The load balancer should be able to forward any required authentication
information on to the slurmrestd machines (see <a href="#security">Security</a>
section).</p>
<p>The number of connections allowed by the slurmrestd system(s) should also be
limited so that the slurmctld is not overwhelmed with requests. Pay attention to
the <code>-t &lt;THREAD COUNT&gt;</code> and
<code>--max-connections &lt;count&gt;</code> options to <b>slurmrestd</b>, the
number of nodes deployed, and the specs of the machine running <b>slurmctld</b>.
</p>
<h2 id="security">Security<a class="slurm_link" href="#security"></a></h2>
<p>The Slurm REST API is written to provide the necessary functionality for
clients to control Slurm using REST commands. It is <b>not</b> designed to be
directly internet facing. Only unencrypted and uncompressed HTTP communications
are supported. Slurmrestd also has no protection against man in the middle or
replay attacks. Slurmrestd should only be placed in a trusted network that will
communicate with a trusted client.</p>
<p>Any site wishing to expose the Slurm REST API to the internet or outside of
the cluster should at the very least use a proxy to wrap all communications
with TLS v1.3 (or later). You should also add monitoring to reject any client
that repeatedly attempts invalid logins, at either the network perimeter
firewall or the TLS proxy. Any client filtering that can be done via a proxy is
suggested, to keep common internet crawlers from talking to slurmrestd and
wasting system resources or even causing higher latency for valid clients.
Sites are recommended to use shorter-lived JWT tokens for clients and renew
them often, possibly via a non-Slurm JWT generator, to avoid having to enforce
JWT lifespan limits. It is also suggested that sites use an authenticating proxy
to handle all client authentication against the site's preferred Single Sign
On (SSO) provider instead of Slurm <b>scontrol</b> generated tokens. This will
prevent any unauthenticated client from connecting to slurmrestd.</p>
<p>The Slurm REST API is an HTTP server, and all the general security
precautions appropriate for any web server should be applied. As these
precautions are site specific, it is highly recommended that you work with your
site's security group to ensure all policies are enforced at the proxy before
connecting to slurmrestd.</p>
<p>Slurm tries not to give potential attackers any hints when there are
authentication failures. This results in the client getting this rather terse
message: <code>Authentication failure</code>. When this happens, take a look at
the logs for the relevant Slurm daemon (i.e. <b>slurmdbd</b>, <b>slurmctld</b>,
or <b>slurmd</b>) for information about the actual issue.</p>
<h3 id="jwt">JSON Web Token (JWT) Authentication
<a class="slurm_link" href="#jwt"></a>
</h3>
<p>slurmrestd supports using <a href="jwt.html">JWT to authenticate users</a>
over the REST protocol. The following HTTP headers are used:
<ul>
<li>User Name Header: X-SLURM-USER-NAME</li>
<li>JWT Header: X-SLURM-USER-TOKEN</li>
</ul>
SlurmUser or root can provide alternative user names to act as a proxy for the
given user. While using JWT authentication, slurmrestd should be run as a
unique, <b>unprivileged</b> user and group. Slurmrestd should be provided an
invalid SLURM_JWT environment variable at startup to activate JWT authentication.
This allows users to provide their own JWT tokens while authenticating to
the proxy and guards against any possible accidental authorizations.</p>
<p>When using JWT, it is important that <u>AuthAltTypes=auth/jwt</u> be
configured in both your slurm.conf and slurmdbd.conf for slurmrestd.</p>
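<p>As an illustration of the headers listed above, the following sketch
prepares an HTTP request using only the Python standard library. The helper
names, the user name, the token value, and the
<code>/slurm/v0.0.43/ping</code> URL are assumptions chosen for the example;
nothing is sent over the network.</p>

```python
import urllib.request


def slurm_auth_headers(user_name, token):
    """Return the HTTP headers slurmrestd uses for JWT authentication."""
    if not token:
        raise ValueError("a JWT token is required")
    return {
        "X-SLURM-USER-NAME": user_name,
        "X-SLURM-USER-TOKEN": token,
    }


def build_request(url, user_name, token):
    """Prepare, but do not send, a GET request carrying the JWT headers."""
    headers = slurm_auth_headers(user_name, token)
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical values; in practice the token would come from SLURM_JWT
request = build_request(
    "http://localhost:8080/slurm/v0.0.43/ping", "alice", "example-token"
)
```

<p>Passing the prepared request to <code>urllib.request.urlopen()</code> would
send it, which requires a running slurmrestd instance and a valid token.</p>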
<h3 id="local_auth">Local Authentication
<a class="slurm_link" href="#local_auth"></a>
</h3>
<p>slurmrestd supports using UNIX domain sockets to have the kernel
authenticate local users. By default, slurmrestd will not start as root or
SlurmUser or if the user's primary group belongs to root or SlurmUser.
Slurmrestd must be located in the Munge security domain in order to function
and communicate with Slurm in local authentication mode.
</p>
<h3 id="auth_proxy">Authenticating Proxy
<a class="slurm_link" href="#auth_proxy"></a>
</h3>
<p>There is a wide array of authentication systems that a site could choose
from if using <a href="#jwt">JWT authentication</a> alone doesn't meet your
requirements. An authenticating proxy is set up with a JWT token assigned to
the SlurmUser that can then be used to proxy for any user on the cluster.
This ability is only allowed for SlurmUser and root. All other
tokens will only work with their locally assigned users.</p>
<p>If using a third-party authenticating proxy, it is expected that it will
provide the correct HTTP headers (<b>X-SLURM-USER-NAME</b> and
<b>X-SLURM-USER-TOKEN</b>) to slurmrestd along with the user's request.</p>
<p>Slurm places no requirements on the authenticating proxy beyond that it be
HTTP/1.1 compliant and that it provide the correct HTTP headers to allow
client authentication. Slurm will explicitly trust the HTTP headers provided
and has no way to verify them (beyond the proxy's trusted token
<b>X-SLURM-USER-TOKEN</b>). Any authenticating proxy will need to follow
your site's security policies and ensure that the proxied requests come from
the correct user. These requirements are standard to any authenticating
proxy and are not Slurm specific.</p>
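<p>Because slurmrestd trusts these headers, a proxy would normally discard any
<b>X-SLURM-*</b> headers supplied by the client and set them itself after
authenticating the user. The following is a minimal sketch of that header
handling only; the function and argument names are hypothetical, and the
authentication step is assumed to have already happened elsewhere in the
proxy.</p>

```python
def proxied_headers(client_headers, authenticated_user, proxy_token):
    """Return the headers to forward to slurmrestd for authenticated_user.

    Client-supplied X-SLURM-* headers are dropped (header names are
    case-insensitive) so that a client cannot claim another identity.
    """
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if not name.upper().startswith("X-SLURM-")
    }
    forwarded["X-SLURM-USER-NAME"] = authenticated_user
    forwarded["X-SLURM-USER-TOKEN"] = proxy_token
    return forwarded
```

<p>Here <code>proxy_token</code> stands for the proxy's own trusted token
described above, and <code>authenticated_user</code> for the user name
established by the site's authentication system.</p>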
<p>A working trivial example can be found in an <a
href="https://gitlab.com/SchedMD/training/docker-scale-out/-/tree/master/proxy">
internal tool</a> used for testing and training. It uses
<a href="https://www.php.net/">PHP</a> and
<a href="https://www.nginx.com/">NGINX</a> to provide the authentication logic.
This example should only be used as a basic starting place as it is not suitable
for deployment in a production environment.</p>
<h2 id="python-guide">Python Guide
<a class="slurm_link" href="#python-guide"></a>
</h2>
<p>
OpenAPI tools can be used to generate a Python client to interact with the REST
API. The examples below are for version 0.0.43 of the API, so there will be some
differences with other versions.
</p>
<h3 id="python-setup">Setup
<a class="slurm_link" href="#python-setup"></a>
</h3>
<ol>
<li>Install <a href="https://openapi-generator.tech/docs/installation/">
openapi-generator-cli</a></li>
<li>Compile the client library:
<pre>
slurmrestd --generate-openapi-spec &gt; openapi.json
openapi-generator-cli generate -i openapi.json -g python -o py_api_client
</pre>
</li>
<li>(Optional, though recommended) Initialize and activate a Python virtual
environment.</li>
<li>Install the required packages:
<pre>
cd py_api_client/
pip install -r requirements.txt
</pre>
</li>
<li>Set up the Python script. These initial lines should be used for all
subsequent examples, and assume you have the 'SLURM_JWT' environment
variable set to a valid token:
<pre>
import os
import time

from openapi_client import SlurmApi
from openapi_client import SlurmdbApi
from openapi_client import ApiClient as Client
from openapi_client import Configuration as Config

# Point the client at slurmrestd and authenticate with the JWT from the environment
c = Config()
c.host = "http://localhost:8080/"
c.access_token = os.getenv("SLURM_JWT")
if not c.access_token:
    raise KeyError("No SLURM_JWT set")

slurm = SlurmApi(Client(c))
slurmdb = SlurmdbApi(Client(c))

# Location of 'srun' binary + other relevant binaries in your slurm scripts
environment = ['PATH=/bin/:/sbin/:/home/slurm/bin/:/home/slurm/sbin/']
curr_dir = '/tmp'
</pre>
</li>
</ol>
<h3 id="python-usage-overview">Usage Overview
<a class="slurm_link" href="#python-usage-overview"></a>
</h3>
<p>
Once set up, you can use the <code>openapi_client</code> module to access
classes and functions corresponding to the models and methods in the REST API.
See below for examples and note the following naming conventions for converting
between the REST API and the Python client:
</p>
<ul>
<li>API model: <code>v0.0.43_job_desc_msg</code>
<br>Corresponding Python class: <code>V0043JobDescMsg</code>
</li>
<li>API method: <code>POST /slurm/v0.0.43/job/submit</code>
<br>Corresponding Python function: <code>slurm_v0043_post_job_submit()</code>
</li>
</ul>
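<p>The naming conventions above can be sketched as two small conversion
functions. These are hypothetical helpers written for illustration, not part of
the generated client, and they are only checked against the names that appear
in this guide; the generator may name other models or methods differently.</p>

```python
import re


def model_to_class(model_name):
    """Convert an API model name such as 'v0.0.43_job_desc_msg' to a class name."""
    version, _, rest = model_name.partition("_")
    prefix = "V" + re.sub(r"\D", "", version)  # 'v0.0.43' becomes 'V0043'
    return prefix + "".join(part.capitalize() for part in rest.split("_"))


def method_to_function(http_method, path):
    """Convert an API method such as POST /slurm/v0.0.43/job/submit to a function name."""
    service, version, *resource = path.strip("/").split("/")
    # Path parameters such as {job_id} do not appear in the function name
    resource = [segment for segment in resource if not segment.startswith("{")]
    return "_".join([service, version.replace(".", ""), http_method.lower(), *resource])
```

<p>For example, <code>model_to_class("v0.0.43_job_desc_msg")</code> yields
<code>V0043JobDescMsg</code>, and
<code>method_to_function("POST", "/slurm/v0.0.43/job/submit")</code> yields
<code>slurm_v0043_post_job_submit</code>.</p>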
<p>
If you encounter any errors, check the common issues on the
<a href="rest_quickstart.html#common_issues">REST Quickstart</a> page.
</p>
<h3 id="python-job-submission">Job Submission
<a class="slurm_link" href="#python-job-submission"></a>
</h3>
<p>
This example shows how to populate a job submit request and
job description message with desired submission parameters. It also
illustrates how to send a POST request to submit the job.
</p>
<pre>
from openapi_client import V0043JobSubmitReq
from openapi_client import V0043JobDescMsg

# Populate a job submit request and job description message with desired parameters
my_job = V0043JobSubmitReq(
    script='#!/bin/bash\nsrun sleep 300',
    job=V0043JobDescMsg(
        name='rest_test',
        partition='gpu',
        tres_per_job='gres:gpu:amd:4',
        time_limit={"set": True, "number": 5},
        required_nodes=["n2", "n4"],
        tasks=5,
        environment=environment,
        current_working_directory=curr_dir,
    ),
)

# Send POST request to submit the job
submit_response = slurm.slurm_v0043_post_job_submit(my_job)
</pre>
<h3 id="python-entity-control">Job, Node, and Reservation Control
<a class="slurm_link" href="#python-entity-control"></a>
</h3>
<p>
Jobs, nodes, and reservations can be managed through the Python client in
similar ways. Each entity requires its own imports, and each has similar
functions for viewing, modifying, and deleting. The GET functions and some of
the POST/DELETE functions can also be used in the <b>plural</b> form, for
example <code>slurm_v0043_get_jobs()</code>, to affect more than one entity.
The relevant imports and functions are listed below.
</p>
<ul>
<li><b>Job Control</b>
<ul>
<li>Imports: <code>V0043JobSubmitReq</code>,
<code>V0043JobDescMsg</code></li>
<li>View: <code>slurm_v0043_get_job()</code></li>
<li>Add (submit): <code>slurm_v0043_post_job_submit()</code></li>
<li>Modify: <code>slurm_v0043_post_job()</code></li>
<li>Delete (cancel): <code>slurm_v0043_delete_job()</code></li>
</ul>
</li>
<li><b>Node Control</b>
<ul>
<li>Imports: <code>V0043UpdateNodeMsg</code></li>
<li>View: <code>slurm_v0043_get_node()</code></li>
<li>Add (create): <b>N/A</b></li>
<li>Modify: <code>slurm_v0043_post_node()</code></li>
<li>Delete: <code>slurm_v0043_delete_node()</code></li>
</ul>
</li>
<li><b>Reservation Control</b>
<ul>
<li>Imports: <code>V0043ReservationDescMsg</code></li>
<li>View: <code>slurm_v0043_get_reservation()</code></li>
<li>Add (create): <code>slurm_v0043_post_reservation()</code></li>
<li>Modify: <code>slurm_v0043_post_reservation()</code></li>
<li>Delete: <code>slurm_v0043_delete_reservation()</code></li>
</ul>
</li>
</ul>
<p>
Here is an example for viewing, deleting, adding, and modifying reservations:
</p>
<pre>
from openapi_client import V0043ReservationDescMsg

# GET request to query reservations
resp = slurm.slurm_v0043_get_reservations()

# Examine output of GET request and delete the reservation if it already exists
if "important_jobs" in [resv.name for resv in resp.reservations]:
    resp = slurm.slurm_v0043_delete_reservation("important_jobs")

# POST request to create a reservation with the desired parameters and flags
slurm.slurm_v0043_post_reservation(
    V0043ReservationDescMsg(
        name="important_jobs",
        duration={"set": True, "number": 15},
        node_list=["n4", "n5"],
        start_time={"set": True, "number": int(time.time())},
        users=["slurm"],
        flags=["IGNORE_JOBS", "MAGNETIC", "DAILY"],
    )
)

# POST request to modify the reservation
slurm.slurm_v0043_post_reservation(
    V0043ReservationDescMsg(
        name="important_jobs",
        duration={"set": True, "number": 20},
    )
)
</pre>
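<p>The job and reservation examples pass several numeric fields
(<code>time_limit</code>, <code>duration</code>, <code>start_time</code>) as a
dictionary with <code>set</code> and <code>number</code> keys. A small helper
can cut down on the repetition. The name <code>set_number</code> is hypothetical
and the helper covers only the form used in the examples above.</p>

```python
import time


def set_number(value):
    """Wrap a number in the {"set": ..., "number": ...} form used in the examples."""
    return {"set": True, "number": int(value)}


# Same values as in the reservation example: a duration of 15, starting now
duration = set_number(15)
start_time = set_number(time.time())
```

<p>The resulting dictionaries can be passed wherever the examples write the
literal form, for instance <code>duration=set_number(20)</code> when modifying
the reservation.</p>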
<h3 id="python-system-management">System Management
<a class="slurm_link" href="#python-system-management"></a>
</h3>
<p>
A system reconfigure can be initiated with the function
<code>slurm.slurm_v0043_get_reconfigure()</code>. System information can also be
viewed with the following API functions:
</p>
<ul>
<li><code>slurm.slurm_v0043_get_partitions()</code></li>
<li><code>slurm.slurm_v0043_get_diag()</code></li>
<li><code>slurm.slurm_v0043_get_licenses()</code></li>
</ul>
<p>Here is an example for viewing partition info:</p>
<pre>
# GET request to query partitions
resp = slurm.slurm_v0043_get_partitions()

# Examine request output to filter on a specific partition QOS
qos_parts = [part for part in resp.partitions if part.qos.assigned == 'sample']

# GET request to query partitions with a specific name
defq = slurm.slurm_v0043_get_partition("defq")

# Examine request output to grab the nodes on a partition
configured_nodes = defq.partitions[0].nodes.configured
</pre>
<hr size=4 width="100%">
<!--#include virtual="footer.txt"-->