| <!--#include virtual="header.txt"--> |
| |
| <h1><a id="top">TLS Certificate Manager</a></h1> |
| |
| <h2 id="Overview">Overview<a class="slurm_link" href="#Overview"></a></h2> |
| |
| <p> |
| The <code>certmgr</code> plugin interface can be used alongside the |
| <code>tls</code> plugin interface to dynamically create and renew signed |
| certificates for slurmd/sackd nodes. |
| </p> |
| |
| <p> |
| Signed certificates and accompanying private keys generated with certmgr are |
| saved in slurmd's spool directory when they are retrieved from slurmctld, and |
| loaded when slurmd starts up. |
| </p> |
| |
| <h2 id="script">certmgr/script<a class="slurm_link" href="#script"></a></h2> |
| |
| <p> |
| The <code>certmgr/script</code> plugin allows scripts to be used to perform the |
| necessary operations needed to validate node identity and generate signed |
| certificates. |
| </p> |
| |
| <h3 id="script_examples">OpenSSL Example<a class="slurm_link" href="#script_examples"></a></h3> |
| |
| <p> |
| This is an example using the openssl cli to generate certificate signing |
| requests and to sign such requests to create signed certificates. This example |
| is not meant to be used in production, and is only mean to show the intended |
| responsibilities of each script. |
| </p> |
| |
| <p> |
| In this example, there are a list of things that need to be preloaded on each |
| machine before Slurm can do its certificate management. Note that any |
| instructions for slurmd also apply to sackd nodes. |
| </p> |
| |
| <p> |
| slurmctld will need access to the CA certificate, and the CA certificate/key |
| pair must be owned by <code>SlurmUser</code> (this is NOT recommended in a |
| production setting). See the <a href="tls.html#s2n_openssl_example">TLS</a> |
| page for more info on how to generate this certificate/key pair. |
| </p> |
| |
| <p> |
| The following scripts need to be created and configured. |
| See <a href="slurm.conf.html#OPT_CertmgrParameters">CertmgrParameters</a> for |
| more details on each script. |
| </p> |
| |
| <ul> |
| <li><code>get_node_token_script</code></li> |
| <li><code>generate_csr_script</code></li> |
| <li><code>validate_node_script</code></li> |
| <li><code>sign_csr_script</code></li> |
| </ul> |
| |
| <p> |
| slurmctld needs to be able to validate slurmd's certificate signing request. |
| This is done via unique tokens that are retrieved on slurmd nodes using |
| <code>get_node_token_script</code>, and validated on the slurmctld host using |
| <code>validate_node_script</code>. |
| </p> |
| |
| <p> |
| A unique token will to be generated for each slurmd. Each token will be stored |
| on its respective slurmd host, as well as in a comprehensive list that contains |
| all node tokens on the slurmctld host. This token will be sent from slurmd to |
| slurmctld along with the certificate signing request that slurmd will generate |
| at runtime, and be validated by slurmctld before slurmctld creates a signed |
| certificate. Note that slurmd will not begin to process any RPCs until a signed |
| certificate is loaded. |
| </p> |
| |
| This is a simple example of how these tokens can be generated and stored: |
| <pre> |
| # generate base64 32 character random token |
| base64 /dev/urandom | head -c 32 > ${NODENAME}_token.txt |
| |
| # add token to token list |
| echo "`cat ${NODENAME}_token.txt`" >> node_token_list.txt |
| </pre> |
| |
| <p> |
| Node <b>n1</b> needs to boot up with <code>n1_token.txt</code> and/or have it |
| securely transferred to it. <b>slurmctld</b> needs to have secure access to |
| <code>node_token_list.txt</code> in order to validate node tokens with the |
| <code>validate_node_script</code>. |
| </p> |
| |
| <p> |
| The <code>get_node_token_script</code>, <code>generate_csr_script</code>, and |
| <code>get_node_cert_key_script</code> paths need to point to scripts that exist |
| and are executable on slurmd nodes. |
| </p> |
| |
| <h4 id="get_node_token_script_example"> |
| get_node_token_script example:<a class="slurm_link" href="#get_node_token_script_example"></a> |
| </h4> |
| |
| <p> |
| Print token to stdout. Return zero exit code for success, and non-zero exit |
| code for error. |
| </p> |
| |
| <pre> |
| #!/bin/bash |
| |
| # Slurm node name is passed in as arg $1 |
| TOKEN_PATH=/etc/slurm/certmgr/$1_token.txt |
| |
| # Check if token file exists |
| if [ ! -f $TOKEN_PATH ] |
| then |
| echo "$BASH_SOURCE: Failed to resolve token path '$TOKEN_PATH'" |
| exit 1 |
| fi |
| |
| # Print token to stdout |
| cat $TOKEN_PATH |
| |
| # Exit with exit code 0 to indicate success |
| exit 0 |
| </pre> |
| |
| <h4 id="generate_csr_script_example"> |
| generate_csr_script example:<a class="slurm_link" href="#generate_csr_script_example"></a> |
| </h4> |
| |
| <p> |
| Print certificate signing request to stdout. Return zero exit code for success, |
| and non-zero exit code for error. |
| </p> |
| |
| <pre> |
| #!/bin/bash |
| |
| # Slurm node name is passed in as arg $1 |
| NODE_PRIVATE_KEY=/etc/slurm/certmgr/$1_private_key.pem |
| |
| # Check if node private key file exists |
| if [ ! -f $NODE_PRIVATE_KEY ] |
| then |
| echo "$BASH_SOURCE: Failed to resolve node private key path '$NODE_PRIVATE_KEY'" |
| exit 1 |
| fi |
| |
| # Generate CSR using node private key and print CSR to stdout |
| openssl req -new -key $NODE_PRIVATE_KEY \ |
| -subj "/C=XX/ST=StateName/L=CityName/O=CompanyName/OU=CompanySectionName/CN=$1" |
| |
| # Check exit code from openssl |
| if [ $? -ne 0 ] |
| then |
| echo "$BASH_SOURCE: Failed to generate CSR" |
| exit 1 |
| fi |
| |
| # Exit with exit code 0 to indicate success |
| exit 0 |
| </pre> |
| |
| <h4 id="get_node_cert_key_script"> |
| get_node_cert_key_script example:<a class="slurm_link" href="#get_node_cert_key_script"></a> |
| </h4> |
| |
| <p> |
| Print private key used to generate CSR to stdout. Return zero exit code for |
| success, and non-zero exit code for error. |
| </p> |
| |
| <pre> |
| #!/bin/bash |
| |
| # Slurm node name is passed in as arg $1 |
| NODE_PRIVATE_KEY=/etc/slurm/certmgr/$1_cert_key.pem |
| |
| # Check if node private key file exists |
| if [ ! -f $NODE_PRIVATE_KEY ] |
| then |
| echo "$BASH_SOURCE: Failed to resolve node private key path '$NODE_PRIVATE_KEY'" |
| exit 1 |
| fi |
| |
| cat $NODE_PRIVATE_KEY |
| |
| # Exit with exit code 0 to indicate success |
| exit 0 |
| </pre> |
| |
| <p> |
| The <code>validate_node_script</code> and <code>sign_csr_script</code> paths |
| need to point to scripts that exist and are executable on <b>slurmctld</b>. |
| </p> |
| |
| <h4 id="validate_node_script_example"> |
| validate_node_script example:<a class="slurm_link" href="#validate_node_script_example"></a> |
| </h4> |
| |
| <p> |
| Return zero exit code for valid node tokens, and non-zero exit code for |
| invalid node tokens or other errors. |
| </p> |
| |
| <pre> |
| #!/bin/bash |
| |
| # Node's unique token is passed in as arg $1 |
| NODE_TOKEN=$1 |
| NODE_TOKEN_LIST_FILE=/etc/slurm/certmgr/node_token_list.txt |
| |
| # Check if node token list file exists |
| if [ ! -f $NODE_TOKEN_LIST ] |
| then |
| echo "$BASH_SOURCE: Failed to resolve node token list path '$NODE_TOKEN_LIST'" |
| exit 1 |
| fi |
| |
| # Check if unique node token is in token list file |
| grep $1 $NODE_TOKEN_LIST_FILE |
| |
| # Check exit code from grep to see if token was found |
| if [ $? -ne 0 ] |
| then |
| echo "$BASH_SOURCE: Failed to validate token '$NODE_TOKEN'" |
| exit 1 |
| fi |
| |
| # Exit with exit code 0 to indicate success (node token is valid) |
| exit 0 |
| </pre> |
| |
| <h4 id="sign_csr_script_example"> |
| sign_csr_script example:<a class="slurm_link" href="#sign_csr_script_example"></a> |
| </h4> |
| |
| <p> |
| Print signed certificate to stdout. Return zero exit code for success, and |
| non-zero exit code for error. |
| </p> |
| |
| <pre> |
| #!/bin/bash |
| |
| # Certificate signing request is passed in as arg $1 |
| CSR=$1 |
| CA_CERT=/etc/slurm/certmgr/root_cert.pem |
| CA_KEY=/etc/slurm/certmgr/root_key.pem |
| |
| # Check if CA certificate file exists |
| if [ ! -f $CA_CERT ] |
| then |
| echo "$BASH_SOURCE: Failed to resolve CA certificate path '$CA_CERT'" |
| exit 1 |
| fi |
| |
| # Check CA private key permissions |
| if [ `stat -c "%a" $CA_KEY` -ne $KEY_PERMISSIONS ] |
| then |
| echo "$BASH_SOURCE: Bad permissions for CA private key at '$CA_KEY'. Permissions should be $KEY_PERMISSIONS" |
| exit 1 |
| fi |
| |
| # Sign CSR using CA certificate and CA private key and print signed cert to stdout |
| openssl x509 -req -CA $CA_CERT -CAkey $CA_KEY 2>/dev/null <<< $CSR |
| |
| # Check exit code from openssl |
| if [ $? -ne 0 ] |
| then |
| echo "$BASH_SOURCE: Failed to generate signed certificate" |
| exit 1 |
| fi |
| |
| # Exit with exit code 0 to indicate success |
| exit 0 |
| </pre> |
| |
| <p> |
| If everything is configured correctly, the following lines should appear in the |
| slurmd and slurmctld logs with the |
| <a href="slurm.conf.html#OPT_TLS">DebugFlags=TLS</a> setting. |
| </p> |
| |
| <p>slurmd:</p> |
| <pre> |
| slurmd: certmgr/script: certmgr_p_get_node_token: TLS: Successfully retrieved unique node token |
| slurmd: certmgr/script: certmgr_p_generate_csr: TLS: Successfully generated csr: |
| -----BEGIN CERTIFICATE REQUEST----- |
| . . . |
| -----END CERTIFICATE REQUEST----- |
| </pre> |
| |
| <p>slurmctld:</p> |
| <pre> |
| slurmctld: certmgr/script: certmgr_p_sign_csr: TLS: Successfully validated node token |
| slurmctld: certmgr/script: certmgr_p_sign_csr: TLS: Successfully generated signed certificate: |
| -----BEGIN CERTIFICATE----- |
| . . . |
| -----END CERTIFICATE----- |
| </pre> |
| |
| <p>slurmd:</p> |
| <pre> |
| slurmd: TLS: Successfully got signed certificate from slurmctld: |
| -----BEGIN CERTIFICATE----- |
| . . . |
| -----END CERTIFICATE----- |
| </pre> |
| |
| <p> |
| <a href="slurm.conf.html#OPT_AuditTLS">DebugFlags=AuditTLS</a> can also be used |
| to show less verbose logs of certificate renewal. |
| </p> |
| |
| <p style="text-align:center;">Last modified 06 July 2025</p> |
| |
| <!--#include virtual="footer.txt"--> |