<!--#include virtual="header.txt"-->
<h1>SLURM: A Highly Scalable Resource Manager</h1>
<p>SLURM is an open-source resource manager designed for Linux clusters of
all sizes.
It provides three key functions.
First, it allocates exclusive and/or non-exclusive access to resources
(compute nodes) to users for some duration of time so they can perform work.
Second, it provides a framework for starting, executing, and monitoring work
(typically a parallel job) on a set of allocated nodes.
Finally, it arbitrates contention for resources by managing a queue of
pending work. </p>
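<p>As a rough illustration of that last function, the sketch below uses SLURM's C API
(declared in slurm/slurm.h) to load the current job table and print it, much as the
squeue command does. It is a minimal example only, assuming the SLURM development
headers are installed and the program is linked against -lslurm.</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;slurm/slurm.h&gt;
#include &lt;slurm/slurm_errno.h&gt;

int main(void)
{
	job_info_msg_t *job_info = NULL;

	/* Load the current job table from the controller (SHOW_ALL = all jobs). */
	if (slurm_load_jobs((time_t) NULL, &job_info, SHOW_ALL) != SLURM_SUCCESS) {
		slurm_perror("slurm_load_jobs");
		exit(1);
	}

	/* Report the queue size, then print one record per job. */
	printf("%u jobs currently in the queue\n", job_info->record_count);
	slurm_print_job_info_msg(stdout, job_info, 0);

	slurm_free_job_info_msg(job_info);
	return 0;
}
</pre>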
<p>SLURM's design is very modular with dozens of optional plugins.
In its simplest configuration, it can be installed and configured in a
couple of minutes (see <a href="http://www.linux-mag.com/id/7239/1/">
Caos NSA and Perceus: All-in-one Cluster Software Stack</a>
by Jeffrey B. Layton) and is used by
<a href="http://www.intel.com/">Intel</a> on their 48-core
<a href="http://www.hpcwire.com/features/Intel-Unveils-48-Core-Research-Chip-78378487.html">
"cluster on a chip"</a>.
More complex configurations can satisfy the job scheduling needs of
world-class computer centers and rely upon a
<a href="http://www.mysql.com/">MySQL</a> database for archiving
<a href="accounting.html">accounting</a> records, managing
<a href="resource_limits.html">resource limits</a> by user or bank account,
or supporting sophisticated
<a href="priority_multifactor.html">job prioritization</a> algorithms.</p>
<p>While other resource managers do exist, SLURM is unique in several
respects:
<ul>
<li>It is designed to operate in a heterogeneous cluster with up to 65,536 nodes
and hundreds of thousands of processors.</li>
<li>It can sustain a throughput rate of over 120,000 jobs per hour with
bursts of job submissions at several times that rate.</li>
<li>Its source code is freely available under the
<a href="http://www.gnu.org/licenses/gpl.html">GNU General Public License</a>.</li>
<li>It is portable; written in C with a GNU autoconf configuration engine.
While initially written for Linux, other UNIX-like operating systems should
be easy porting targets.</li>
<li>It is highly tolerant of system failures, including failure of the node
executing its control functions.</li>
<li>A plugin mechanism exists to support various interconnects, authentication
mechanisms, schedulers, etc. These plugins are documented and simple enough
for the motivated end user to understand the source and add functionality
(a minimal plugin skeleton is sketched just after this list).</li>
</ul></p>
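<p>As a very rough sketch of that plugin interface, the fragment below shows the
handful of symbols every SLURM plugin is expected to export; the callbacks specific
to a particular plugin type (scheduler, authentication, interconnect, and so on) are
omitted, and the type string and version value shown are illustrative placeholders
rather than values taken from a real plugin.</p>
<pre>
/* Skeleton of a SLURM plugin: every plugin exports a name, a
 * "major/minor" type string and a version symbol, plus optional
 * init()/fini() hooks; the type-specific entry points are omitted. */
#include &lt;stdint.h&gt;

const char plugin_name[]      = "Example scheduler plugin";  /* placeholder name */
const char plugin_type[]      = "sched/example";             /* placeholder type */
const uint32_t plugin_version = 100;                         /* assumed version value */

/* Invoked when the plugin is loaded; return 0 (SLURM_SUCCESS). */
extern int init(void)
{
	return 0;
}

/* Invoked just before the plugin is unloaded. */
extern int fini(void)
{
	return 0;
}
</pre>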
<p>SLURM provides resource management on many of the most powerful computers in
the world, including:
<ul>
<li><a href="http://www.nytimes.com/2010/10/28/technology/28compute.html?_r=1&partner=rss&emc=rss">
Tianhe-1A</a>, designed by the
<a href="http://english.nudt.edu.cn">National University of Defense Technology (NUDT)</a>
in China, with 14,336 Intel CPUs and 7,168 NVIDIA Tesla M2050 GPUs, delivering a peak performance of 2.507 Petaflops.</li>
<li><a href="http://www.wcm.bull.com/internet/pr/rend.jsp?DocId=567851&lang=en">
Tera 100</a> at <a href="http://www.cea.fr">CEA</a>
with 140,000 Intel Xeon 7500 processing cores, 300 TB of
central memory, and a theoretical computing power of 1.25 Petaflops, making it
Europe's most powerful supercomputer.</li>
<li><a href="https://asc.llnl.gov/computing_resources/sequoia/">Dawn</a>,
a BlueGene/P system at <a href=https://www.llnl.gov">LLNL</a>
with 147,456 PowerPC 450 cores with a peak
performance of 0.5 Petaflops.</li>
<li><a href="http://www.cscs.ch/compute_resources">Rosa</a>,
a CRAY XT5 at the <a href="http://www.cscs.ch">Swiss National Supercomputer Centre</a>
named after Monte Rosa in the Swiss-Italian Alps, elevation 4,634m.
It has 3,688 hexa-core AMD Opteron processors @ 2.4 GHz, 28.8 TB of DDR2 RAM,
290 TB of disk, and 9.6 GB/s interconnect bandwidth (SeaStar).</li>
<li><a href="http://c-r-labs.com/">EKA</a> at Computational Research Laboratories,
India, with 14,240 Xeon processors and an InfiniBand interconnect.</li>
<li><a href="http://www.bsc.es/plantillaA.php?cat_id=5">MareNostrum</a>
a Linux cluster at the <a href="http://www.bsc.es">Barcelona Supercomputer Center</a>
with 10,240 PowerPC processors and a Myrinet switch</li>
<li><a href="http://en.wikipedia.org/wiki/Anton_(computer)">Anton</a>
a massively parallel supercomputer designed and built by
<a href="http://www.deshawresearch.com/">D. E. Shaw Research</a>
for molecular dynamics simulation using 512 custom-designed ASICs
and a three-dimensional torus interconnect.</li>
</ul></p>
<p style="text-align:center;">Last modified 5 May 2011</p>
<!--#include virtual="footer.txt"-->