<!--#include virtual="header.txt"-->

<h1>SLURM: A Highly Scalable Resource Manager</h1>

<p>SLURM is an open-source resource manager designed for Linux clusters of
all sizes.
It provides three key functions.
First, it allocates exclusive and/or non-exclusive access to resources
(compute nodes) to users for some duration of time so they can perform work.
Second, it provides a framework for starting, executing, and monitoring work
(typically a parallel job) on a set of allocated nodes.
Finally, it arbitrates contention for resources by managing a queue of
pending work.</p>
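
<p>As a rough illustration of that third function, the following minimal
sketch lists the queue of pending and running work through SLURM's C API,
assuming the job-query calls declared in &lt;slurm/slurm.h&gt;
(slurm_load_jobs and related functions) and linking with -lslurm.</p>

<pre>
/* Sketch: ask slurmctld for all job records and print one line per job. */
#include &lt;stdio.h&gt;
#include &lt;time.h&gt;
#include &lt;slurm/slurm.h&gt;
#include &lt;slurm/slurm_errno.h&gt;

int main(void)
{
    job_info_msg_t *jobs = NULL;

    /* Load every job record, regardless of when it last changed. */
    if (slurm_load_jobs((time_t) 0, &amp;jobs, SHOW_ALL) != SLURM_SUCCESS) {
        slurm_perror("slurm_load_jobs");
        return 1;
    }

    printf("%u jobs in the queue\n", jobs->record_count);
    /* One line per job: ID, name, state (PENDING, RUNNING, ...), etc. */
    slurm_print_job_info_msg(stdout, jobs, 1);

    slurm_free_job_info_msg(jobs);
    return 0;
}
</pre>

<p>The squeue command reports the same information from the command line.</p>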

<p>SLURM's design is very modular with dozens of optional plugins.
In its simplest configuration, it can be installed and configured in a
couple of minutes (see <a href="http://www.linux-mag.com/id/7239/1/">
Caos NSA and Perceus: All-in-one Cluster Software Stack</a>
by Jeffrey B. Layton) and is used by
<a href="http://www.intel.com/">Intel</a> on their 48-core
<a href="http://www.hpcwire.com/features/Intel-Unveils-48-Core-Research-Chip-78378487.html">
"cluster on a chip"</a>.
More complex configurations can satisfy the job scheduling needs of
world-class computer centers and rely upon a
<a href="http://www.mysql.com/">MySQL</a> database for archiving
<a href="accounting.html">accounting</a> records, managing
<a href="resource_limits.html">resource limits</a> by user or bank account,
or supporting sophisticated
<a href="priority_multifactor.html">job prioritization</a> algorithms.</p>
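
<p>To give a flavor of how these plugins are selected, the fragment below is
an illustrative piece of a slurm.conf file: each *Type parameter names a
plugin, and the cluster, host, and node names are placeholders rather than a
recommended setup.</p>

<pre>
# Illustrative slurm.conf fragment (placeholder host and node names)
ClusterName=linux
ControlMachine=control-host
# Authentication, scheduling, resource selection, and priority plugins
AuthType=auth/munge
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
PriorityType=priority/multifactor
# Archive accounting records through slurmdbd into a MySQL database
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd-host
# Node and partition definitions
NodeName=linux[1-32] Procs=8 State=UNKNOWN
PartitionName=debug Nodes=linux[1-32] Default=YES MaxTime=30 State=UP
</pre>

<p>Changing any of these values swaps in a different plugin when the daemons
are restarted, without rebuilding SLURM.</p>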

<p>While other resource managers do exist, SLURM is unique in several
respects:</p>
<ul>
<li>It is designed to operate in a heterogeneous cluster with up to 65,536 nodes
and hundreds of thousands of processors.</li>
<li>It can sustain a throughput rate of over 120,000 jobs per hour with
bursts of job submissions at several times that rate.</li>
<li>Its source code is freely available under the
<a href="http://www.gnu.org/licenses/gpl.html">GNU General Public License</a>.</li>
<li>It is portable: written in C with a GNU autoconf configuration engine.
While initially developed for Linux, other UNIX-like operating systems should
be easy porting targets.</li>
<li>It is highly tolerant of system failures, including failure of the node
executing its control functions.</li>
<li>A plugin mechanism exists to support various interconnects, authentication
mechanisms, schedulers, etc. These plugins are documented and simple enough
for the motivated end user to understand the source and add functionality;
a minimal plugin skeleton is sketched after this list.</li>
</ul>
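
<p>Every SLURM plugin is a shared object that exports a small set of symbols
the plugin loader looks for. The sketch below shows only that common
boilerplate, with the plugin type name chosen purely as an example; it omits
the entry points specific to each plugin interface (scheduler, authentication,
interconnect, and so on).</p>

<pre>
/* Sketch of the boilerplate common to SLURM plugins. */
#include &lt;stdint.h&gt;

/* Symbols the SLURM plugin loader resolves in the shared object. */
const char     plugin_name[]  = "Example scheduler plugin";
const char     plugin_type[]  = "sched/example";  /* "major/minor" type string */
const uint32_t plugin_version = 100;

/* Optional hooks invoked when the plugin is loaded and unloaded. */
int init(void)
{
    return 0;    /* SLURM_SUCCESS */
}

void fini(void)
{
}
</pre>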

<p>SLURM provides resource management on many of the most powerful computers in
the world, including:</p>
<ul>
<li><a href="http://www.nytimes.com/2010/10/28/technology/28compute.html?_r=1&amp;partner=rss&amp;emc=rss">
Tianhe-1A</a> designed by
<a href="http://english.nudt.edu.cn">The National University of Defense Technology (NUDT)</a>
in China with 14,336 Intel CPUs and 7,168 NVIDIA Tesla M2050 GPUs,
and a peak performance of 2.507 Petaflops.</li>

<li><a href="http://www.wcm.bull.com/internet/pr/rend.jsp?DocId=567851&amp;lang=en">
Tera 100</a> at <a href="http://www.cea.fr">CEA</a>,
Europe's most powerful supercomputer, with 140,000 Intel Xeon 7500 processing
cores, 300 TB of central memory and a theoretical computing power of
1.25 Petaflops.</li>

<li><a href="https://asc.llnl.gov/computing_resources/sequoia/">Dawn</a>,
a BlueGene/P system at <a href="https://www.llnl.gov">LLNL</a>
with 147,456 PowerPC 450 cores and a peak
performance of 0.5 Petaflops.</li>

<li><a href="http://www.cscs.ch/compute_resources">Rosa</a>,
a Cray XT5 at the <a href="http://www.cscs.ch">Swiss National Supercomputing Centre</a>,
named after Monte Rosa in the Swiss-Italian Alps, elevation 4,634m, with
3,688 hexa-core AMD Opteron processors @ 2.4 GHz, 28.8 TB DDR2 RAM, 290 TB of disk,
and 9.6 GB/s interconnect bandwidth (Cray SeaStar).</li>

<li><a href="http://c-r-labs.com/">EKA</a> at Computational Research Laboratories,
India, with 14,240 Xeon processors and an InfiniBand interconnect.</li>

<li><a href="http://www.bsc.es/plantillaA.php?cat_id=5">MareNostrum</a>,
a Linux cluster at the <a href="http://www.bsc.es">Barcelona Supercomputing Center</a>
with 10,240 PowerPC processors and a Myrinet switch.</li>

<li><a href="http://en.wikipedia.org/wiki/Anton_(computer)">Anton</a>,
a massively parallel supercomputer designed and built by
<a href="http://www.deshawresearch.com/">D. E. Shaw Research</a>
for molecular dynamics simulation using 512 custom-designed ASICs
and a three-dimensional torus interconnect.</li>
</ul>

<p style="text-align:center;">Last modified 5 May 2011</p>

<!--#include virtual="footer.txt"-->