| <!--#include virtual="header.txt"--> |
| |
| <h1>Multi-Cluster Operation</h1> |
| |
| <p>A cluster comprises all the nodes managed by a single slurmctld |
| daemon. Slurm offers the ability to target commands to other |
| clusters instead of, or in addition to, the local cluster on which the |
| command is invoked. When this behavior is enabled, users can submit |
| jobs to one or many clusters and receive status from those remote |
| clusters.</p> |
| |
| <p>For example:</p> |
| |
| <PRE> |
| juser@dawn> squeue -M dawn,dusk |
| CLUSTER: dawn |
| JOBID PARTITION   NAME   USER  ST   TIME NODES BP_LIST(REASON) |
| 76897    pdebug  myJob  juser   R   4:10   128 dawn001[8-15] |
| 76898    pdebug  myJob  juser   R   4:10   128 dawn001[16-23] |
| 16899    pdebug  myJob  juser   R   4:10   128 dawn001[24-31] |
| |
| CLUSTER: dusk |
| JOBID PARTITION   NAME   USER  ST   TIME NODES BP_LIST(REASON) |
| 11950    pdebug   aJob  juser   R   4:20   128 dusk000[0-15] |
| 11949    pdebug   aJob  juser   R   5:01   128 dusk000[48-63] |
| 11946    pdebug   aJob  juser   R   6:35   128 dusk000[32-47] |
| 11945    pdebug   aJob  juser   R   6:36   128 dusk000[16-31] |
| </PRE> |
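| |
| <p>Scripts that consume this output need to track which cluster each row |
| belongs to. The following is a minimal Python sketch, assuming the default |
| column layout shown above and job names without embedded spaces; the function |
| name <b>parse_multi_cluster</b> and the sample text are hypothetical and not |
| part of Slurm.</p> |
| |

```python
from collections import defaultdict


def parse_multi_cluster(text):
    """Group 'squeue -M' rows by the cluster named in each 'CLUSTER:' line."""
    jobs = defaultdict(list)
    cluster = None
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("CLUSTER:"):
            cluster = line.split(":", 1)[1].strip()
            header = None  # the next non-blank line is the column header
        elif cluster is not None and header is None:
            header = line.split()
        elif cluster is not None:
            # Limit the split so the last column stays in one piece.
            fields = line.split(None, len(header) - 1)
            jobs[cluster].append(dict(zip(header, fields)))
    return dict(jobs)


# Abbreviated sample in the same shape as the output above (illustrative only).
SAMPLE = """\
CLUSTER: dawn
JOBID PARTITION NAME USER ST TIME NODES BP_LIST(REASON)
76897 pdebug myJob juser R 4:10 128 dawn001[8-15]

CLUSTER: dusk
JOBID PARTITION NAME USER ST TIME NODES BP_LIST(REASON)
11950 pdebug aJob juser R 4:20 128 dusk000[0-15]
"""

if __name__ == "__main__":
    for name, rows in parse_multi_cluster(SAMPLE).items():
        print(name, [row["JOBID"] for row in rows])
```

| <p>Keying each row by its header names keeps the sketch independent of the |
| exact column order, which can differ when a custom output format is in use.</p> |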
| |
| <p>Most of the Slurm client commands offer the "-M, --clusters=" |
| option, which lets the command communicate with each cluster in a |
| comma-separated list.</p> |
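| |
| <p>When a wrapper script builds such a command line, the cluster names are |
| joined with commas and passed as a single option value. A minimal Python |
| sketch, in which the helper name <b>with_clusters</b> is hypothetical:</p> |
| |

```python
def with_clusters(command, clusters):
    """Return an argv list that targets the given clusters via --clusters."""
    if not clusters:
        raise ValueError("at least one cluster name is required")
    # The option takes one comma-separated value, with no spaces between names.
    return [command, "--clusters=" + ",".join(clusters)]


argv = with_clusters("squeue", ["dawn", "dusk"])
print(argv)

# On a host with Slurm installed, the list could be run with:
#   import subprocess
#   subprocess.run(argv, check=True)
```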
| |
| <p>When <b>sbatch</b>, <b>salloc</b> or <b>srun</b> is invoked with a cluster |
| list, Slurm will immediately submit the job to the cluster that offers the |
| earliest start time subject to its queue of pending and running jobs. Slurm will |
| make no subsequent effort to migrate the job to a different cluster (from the |
| list), even if that cluster's resources become available earlier than expected |
| because running jobs finish before their scheduled end times.</p> |
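| |
| <p>The selection rule can be pictured as taking the minimum over the estimated |
| start times of the listed clusters, once, at submission time. The Python sketch |
| below only illustrates that rule and is not Slurm's implementation; the function |
| name <b>pick_cluster</b> and the estimated times are made up for the example.</p> |
| |

```python
from datetime import datetime


def pick_cluster(estimated_starts):
    """Return the cluster name with the earliest estimated start time.

    estimated_starts maps cluster name -> datetime. The values here are
    hypothetical stand-ins for the estimates each cluster would report.
    """
    if not estimated_starts:
        raise ValueError("no clusters given")
    return min(estimated_starts, key=estimated_starts.get)


# Illustrative estimates only, not real scheduler data.
estimates = {
    "dawn": datetime(2021, 6, 9, 12, 30),
    "dusk": datetime(2021, 6, 9, 12, 5),
}

chosen = pick_cluster(estimates)
print(chosen)  # dusk

# The choice is made once. If "dawn" later frees up sooner than estimated,
# the job still stays on the cluster chosen at submission.
```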
| |
| <p><b>NOTE</b>: In order for <b>salloc</b> or <b>srun</b> to work with the "-M, |
| --clusters" option in a multi-cluster environment, the compute nodes must be |
| accessible to and from the submission host.</p> |
| |
| <h2 id="multi_cluster">Multi-Cluster Configuration |
| <a class="slurm_link" href="#multi_cluster"></a> |
| </h2> |
| <p>The multi-cluster functionality requires the use of the SlurmDBD. |
| The AccountingStorageType in the slurm.conf file must be set to the |
| accounting_storage/slurmdbd plugin and the MUNGE or other authentication |
| keys must be installed to allow each cluster to communicate with the |
| SlurmDBD. Note that MUNGE can be configured to use different keys for |
| communications within a cluster and across clusters if desired. |
| See <a href="accounting.html">accounting</a> for details.</p> |
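| |
| <p>As a minimal sketch, the relevant slurm.conf lines on one cluster might look |
| like the following. The cluster name is taken from the example above, the host |
| name <b>dbd.example.com</b> is a placeholder, and a working configuration needs |
| further settings that are covered in the accounting guide.</p> |
| |

```conf
# slurm.conf on the cluster named "dawn" (sketch only)
ClusterName=dawn

# Send accounting data to the SlurmDBD.
AccountingStorageType=accounting_storage/slurmdbd

# Placeholder host name for the machine running slurmdbd.
AccountingStorageHost=dbd.example.com
```

| <p>Each participating cluster would carry its own ClusterName while pointing |
| at the same SlurmDBD host.</p> |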
| |
| <p>Once configured, Slurm commands that specify the "-M, --clusters=" |
| option can target any of the clusters listed by the |
| <b>"sacctmgr show clusters"</b> command.</p> |
| |
| <p> |
| See also the <a href="federation.html">Slurm Federated Scheduling Guide</a>. |
| </p> |
| |
| <p style="text-align:center;">Last modified 9 June 2021</p> |
| |
| <!--#include virtual="footer.txt"--> |