| <!--#include virtual="header.txt"--> |
| |
| <h1>SLURM Administrator Guide for Sun Constellation systems</h1> |
| |
| <h2>Overview</h2> |
| |
| <p>This document describes the unique features of SLURM on |
| Sun Constellation computers. |
| You should be familiar with SLURM's mode of operation on Linux clusters |
| before studying the relatively few differences in Sun Constellation system |
| operation described in this document.</p> |
| |
| <p>SLURM's primary mode of operation is designed for use on clusters with |
| nodes configured in a one-dimensional space. |
| Minor changes were required for the <i>smap</i> and <i>sview</i> tools |
| to map nodes in a three-dimensional space. |
| Some changes are also desirable to optimize job placement in three-dimensional |
| space.</p> |
| |
| <h2>Configuration</h2> |
| |
| <p>Two variables must be defined in the <i>config.h</i> file: |
| <i>HAVE_SUN_CONST</i> and <i>HAVE_3D</i>. |
| This can be accomplished in any one of the following ways, depending upon how |
| SLURM is being built:</p> |
| <ol> |
| <li>Execute the <i>configure</i> command with the option |
| <i>--enable-sun-const</i> <b>OR</b></li> |
| <li>Execute the <i>rpmbuild</i> command with the option |
| <i>--with sun_const</i> <b>OR</b></li> |
| <li>Add <i>%with_sun_const 1</i> to your <i>~/.rpmmacros</i> file.</li> |
| </ol> |
| |
| <p>Node names must have a three-digit suffix describing their |
| zero-origin position in the X, Y and Z dimensions, respectively (e.g. |
| "tux000" for X=0, Y=0, Z=0; "tux123" for X=1, Y=2, Z=3). |
| Rectangular prisms of nodes can be specified in SLURM commands and |
| configuration files using the system name prefix with the end-points |
| enclosed in square brackets and separated by an "x". |
| For example "tux[620x731]" is used to represent the eight nodes in a |
| block with endpoints at "tux620" and "tux731" (tux620, tux621, tux630, |
| tux631, tux720, tux721, tux730, tux731). |
| While node names of this form are required for SLURM's internal use, |
| they need not be the names returned by the <i>hostname -s</i> command. |
| See <i>man slurm.conf</i> for details on how to use the <i>NodeName</i>, |
| <i>NodeAddr</i> and <i>NodeHostName</i> configuration parameters |
| for flexibility in this matter.</p> |
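<p>The bracket notation described above can be illustrated with a short
Python sketch. This is not SLURM's own parser: the function name
<i>expand_box</i> is hypothetical, and the sketch assumes decimal digits and
that the first endpoint is the lower corner of the block in every dimension.</p>

```python
import itertools
import re


def expand_box(expr):
    """Expand a box expression such as 'tux[620x731]' into node names.

    Illustrative only: assumes a three-digit decimal suffix and that the
    first endpoint is the lower corner of the block in every dimension.
    """
    match = re.fullmatch(r"([^\[\]]+)\[(\d{3})x(\d{3})\]", expr)
    if match is None:
        raise ValueError(f"not a three-digit box expression: {expr!r}")
    prefix, start, end = match.groups()
    # One inclusive range per dimension (X, Y, Z).
    ranges = [range(int(lo), int(hi) + 1) for lo, hi in zip(start, end)]
    return [prefix + "".join(str(d) for d in xyz)
            for xyz in itertools.product(*ranges)]


print(expand_box("tux[620x731]"))
```

<p>For "tux[620x731]" this yields the same eight node names listed above,
and "tux[000x333]" expands to the 64 nodes of a 4x4x4 system.</p>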
| |
| <p>Next, select one of two options for the resource selection |
| plugin (the <i>SelectType</i> option in SLURM's <i>slurm.conf</i> configuration |
| file):</p> |
| <ol> |
| <li><b>select/cons_res</b> - Performs a best-fit algorithm based upon a |
| one-dimensional space to allocate whole nodes, sockets, or cores to jobs |
| based upon other configuration parameters.</li> |
| <li><b>select/linear</b> - Performs a best-fit algorithm based upon a |
| one-dimensional space to allocate whole nodes to jobs.</li> |
| </ol> |
| |
| <p>In order for <i>select/cons_res</i> or <i>select/linear</i> to |
| allocate resources physically nearby in three-dimensional space, the |
| nodes must be specified in SLURM's <i>slurm.conf</i> configuration file in |
| such a fashion that those nearby in <i>slurm.conf</i> (managed |
| internally by SLURM as a one-dimensional space) are also nearby in |
| the physical three-dimensional space. |
| If the nodes are defined on a single line in SLURM's <i>slurm.conf</i> |
| configuration file (e.g. <i>NodeName=tux[000x333]</i>), |
| SLURM will automatically perform that conversion using a |
| <a href="http://en.wikipedia.org/wiki/Hilbert_curve">Hilbert curve</a>. |
| Otherwise you may construct your own node ordering sequence and |
| list the nodes one per line in <i>slurm.conf</i>. |
| Note that each node must be listed exactly once and consecutive |
| nodes should be nearby in three-dimensional space. |
| Also note that each node must be defined individually rather than using |
| a hostlist expression in order to preserve the ordering (there is no |
| problem using a hostlist expression in the partition specification after |
| the nodes have already been defined). |
| The open source code used by SLURM to generate the Hilbert curve is |
| included in the distribution at <i>contribs/skilling.c</i> in the event |
| that you wish to experiment with it to generate your own node ordering. |
| Two examples of SLURM configuration files are shown below:</p> |
| |
| <pre> |
| # slurm.conf for Sun Constellation system of size 4x4x4 |
| |
| # Configuration parameters removed here |
| |
| # Automatically orders nodes following a Hilbert curve |
| NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown |
| NodeName=tux[000x333] |
| PartitionName=debug Nodes=tux[000x333] Default=Yes State=UP |
| </pre> |
| |
| <pre> |
| # slurm.conf for Sun Constellation system of size 2x2x2 |
| |
| # Configuration parameters removed here |
| |
| # Manual ordering of nodes following a space-filling curve |
| NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown |
| NodeName=tux000 |
| NodeName=tux100 |
| NodeName=tux110 |
| NodeName=tux010 |
| NodeName=tux011 |
| NodeName=tux111 |
| NodeName=tux101 |
| NodeName=tux001 |
| PartitionName=debug Nodes=tux[000x111] Default=Yes State=UP |
| </pre> |
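<p>A manually constructed ordering can be checked before it is placed in
<i>slurm.conf</i>. The following Python sketch is only an illustration: the
helper names <i>coords</i> and <i>check_ordering</i> are hypothetical, and
treating "nearby" as "differing by one step in exactly one dimension" is a
stricter assumption than SLURM itself imposes.</p>

```python
def coords(name):
    """Return the (X, Y, Z) position encoded in the last three digits."""
    return tuple(int(ch) for ch in name[-3:])


def check_ordering(names):
    """Return a list of problems found in a proposed ordering (empty if none)."""
    problems = []
    seen = set()
    for name in names:
        if name in seen:
            problems.append(f"{name} is listed more than once")
        seen.add(name)
    # Assumption: consecutive nodes should be direct neighbours in 3-D space.
    for prev, cur in zip(names, names[1:]):
        dist = sum(abs(a - b) for a, b in zip(coords(prev), coords(cur)))
        if dist > 1:
            problems.append(f"{prev} -> {cur}: Manhattan distance {dist}")
    return problems


# The manual 2x2x2 ordering from the example above.
order = ["tux000", "tux100", "tux110", "tux010",
         "tux011", "tux111", "tux101", "tux001"]
for problem in check_ordering(order):
    print("WARNING:", problem)
for name in order:
    print(f"NodeName={name}")
```

<p>The ordering from the 2x2x2 example produces no warnings, whereas a plain
numeric ordering would be flagged because "tux001" and "tux010" are two steps
apart.</p>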
| |
| <p>In both of the examples above, the node names output by the |
| <i>scontrol show nodes</i> command will be ordered as defined (sequentially |
| along the Hilbert curve or per the ordering in the <i>slurm.conf</i> file) |
| rather than in numeric order (e.g. "tux001" follows "tux101" rather |
| than "tux000"). |
| The output of other SLURM commands (e.g. <i>sinfo</i> and <i>squeue</i>) |
| will use a SLURM hostlist expression with the node names numerically ordered. |
| SLURM partitions should contain nodes which are defined sequentially |
| by that ordering for optimal performance.</p> |
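<p>Whether a proposed partition follows that recommendation can be tested
with a small Python sketch. The function name <i>is_contiguous</i> is
hypothetical, and the check only inspects positions in the defined node
order; it does not query SLURM.</p>

```python
def is_contiguous(partition_nodes, defined_order):
    """True if the partition's nodes form one unbroken run of the defined order."""
    index = {name: i for i, name in enumerate(defined_order)}
    positions = sorted(index[name] for name in set(partition_nodes))
    if not positions:
        return True
    # An unbroken run spans exactly as many slots as it has nodes.
    return positions[-1] - positions[0] + 1 == len(positions)


# Node order as defined in the manual 2x2x2 example.
defined_order = ["tux000", "tux100", "tux110", "tux010",
                 "tux011", "tux111", "tux101", "tux001"]

print(is_contiguous(["tux000", "tux100", "tux110", "tux010"], defined_order))
print(is_contiguous(["tux000", "tux001", "tux010", "tux011"], defined_order))
```

<p>The first partition takes the first four nodes in the defined order and
is contiguous. The second, equivalent to "tux[000x011]", is not, because
"tux001" is defined last while "tux000" is defined first.</p>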
| |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <p style="text-align:center;">Last modified 8 January 2009</p></td> |
| |
| <!--#include virtual="footer.txt"--> |