| \section{Introduction} |
Linux clusters, often constructed from commodity off-the-shelf (COTS) components,
have become increasingly popular as a computing platform for parallel computation
in recent years, mainly due to their high performance-to-cost ratio.
| Researchers have built and used small to medium size clusters for various |
| applications~\cite{BeowulfWeb,LokiWeb}. |
The continuing decrease in the price of COTS parts, in conjunction with
the good scalability of the cluster architecture, has now made it feasible to economically
build large-scale clusters with thousands of processors~\cite{MCRWeb,PCRWeb}.
| |
An essential component for harnessing such a computer is a
resource management system.
A resource management system (or resource manager) performs such crucial tasks as
scheduling user jobs, monitoring machine and job status, launching user applications, and
managing machine configuration.
| An ideal resource manager should be simple, efficient, scalable, fault-tolerant, |
| and portable. |
| |
Unfortunately, no open-source resource management system currently available
satisfies these requirements.
A survey~\cite{Jette02} has revealed that many existing resource managers have
poor scalability and fault-tolerance, rendering them unsuitable for large clusters
with thousands of processors~\cite{LoadLevelerWeb,LoadLevelerManual}.
| While some proprietary cluster managers are suitable for large clusters, |
| they are typically designed for particular computer systems and/or |
| interconnects~\cite{RMS,LoadLevelerWeb,LoadLevelerManual}. |
| Proprietary systems can also be expensive and unavailable in source-code form. |
| Furthermore, proprietary cluster management functionality is usually provided as a |
| part of a specific job scheduling system package. |
This mandates the use of the given scheduler merely to manage a cluster,
even when that scheduler does not meet the needs of the organization hosting the cluster.
A clear separation of cluster management functionality from scheduling policy is therefore desirable.
| |
| This observation led us to set out to design a simple, highly scalable, and |
| portable resource management system. |
The result of this effort is the Simple Linux Utility for Resource Management
(SLURM\footnote{A tip of the hat to Matt Groening and the creators of {\em Futurama},
where Slurm is the most popular carbonated beverage in the universe.}).
| SLURM was developed with the following design goals: |
| |
| \begin{itemize} |
| \item {\em Simplicity}: SLURM is simple enough to allow motivated end-users |
| to understand its source code and add functionality. The authors will |
| avoid the temptation to add features unless they are of general appeal. |
| |
| \item {\em Open Source}: SLURM is available to everyone and will remain free. |
| Its source code is distributed under the GNU General Public |
| License~\cite{GPLWeb}. |
| |
| \item {\em Portability}: SLURM is written in the C language, with a GNU |
| {\em autoconf} configuration engine. |
| While initially written for Linux, other UNIX-like operating systems |
| should be easy porting targets. |
SLURM also supports a general-purpose {\em plugin} mechanism, which
permits a variety of different infrastructures to be easily supported.
The SLURM configuration file specifies which set of plugin modules
should be used (see the configuration excerpt following this list).
| |
| \item {\em Interconnect independence}: SLURM supports UDP/IP based |
| communication as well as the Quadrics Elan3 and Myrinet interconnects. |
| Adding support for other interconnects is straightforward and utilizes |
| the plugin mechanism described above. |
| |
\item {\em Scalability}: SLURM is designed for scalability to clusters of
thousands of nodes.
Jobs may specify their resource requirements in a variety of ways,
including options and ranges (e.g., a minimum and maximum node count),
potentially permitting faster initiation than would otherwise be
possible; an example follows this list.
| |
| \item {\em Robustness}: SLURM can handle a variety of failure modes |
| without terminating workloads, including crashes of the node running |
| the SLURM controller. |
| User jobs may be configured to continue execution despite the failure |
| of one or more nodes on which they are executing. |
Nodes allocated to a job are available for reuse as soon as the job(s)
allocated to them terminate.
If some nodes fail to complete job termination
in a timely fashion due to hardware or software problems, only the
scheduling of those tardy nodes will be affected.
| |
\item {\em Security}: SLURM employs cryptographic techniques to authenticate
users to services and services to each other, with a variety of options
available through the plugin mechanism.
| SLURM does not assume that its networks are physically secure, |
| but does assume that the entire cluster is within a single |
| administrative domain with a common user base across the |
| entire cluster. |
| |
\item {\em System administrator friendly}: SLURM is configured via a
simple configuration file and minimizes distributed state.
| Its configuration may be changed at any time without impacting running jobs. |
| Heterogeneous nodes within a cluster may be easily managed. |
| SLURM interfaces are usable by scripts and its behavior is highly |
| deterministic. |
| |
| \end{itemize} |
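To illustrate the plugin selection mentioned under {\em Portability},
the excerpt below sketches what a SLURM configuration file might contain.
The parameter names and values are illustrative only and do not
constitute a complete or authoritative configuration.

\begin{verbatim}
# Illustrative slurm.conf excerpt (parameter names are examples)
ControlMachine=mcr0          # node running the SLURM controller
BackupController=mcr1        # fail-over controller node
AuthType=auth/munge          # authentication plugin
SwitchType=switch/elan       # Quadrics Elan3 interconnect plugin
SchedulerType=sched/builtin  # simple FIFO scheduler plugin
\end{verbatim}

Because each subsystem is selected by name in this one file, an
administrator can substitute a different implementation (for example,
a different authentication or interconnect plugin) without modifying
SLURM itself.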
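As an example of the resource ranges mentioned under {\em Scalability},
a user might request between 16 and 32 nodes for a 64-task job using
SLURM's {\tt srun} utility ({\tt my\_app} is a placeholder application):

\begin{verbatim}
# Request 64 tasks on any 16 to 32 nodes; the range gives the
# controller freedom to start the job on whatever suitable set
# of nodes becomes available first.
srun -N 16-32 -n 64 ./my_app
\end{verbatim}

Accepting a range rather than an exact node count gives SLURM more
placement freedom, which is the source of the potentially faster job
initiation noted above.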
| |
The main contribution of our work is a readily available
tool that anybody can use to efficiently manage clusters of various sizes and architectures.
SLURM is highly scalable\footnote{SLURM was observed to launch a 1900-task job across 950 nodes in less than five seconds on a recently installed cluster at Lawrence Livermore National Laboratory.}.
Thanks to its plugin capability, SLURM can be ported to other cluster systems with
minimal effort, and its well-defined interfaces allow it to be used with any
meta-batch scheduler or Grid resource broker~\cite{Gridbook}.
| |
| The rest of the paper is organized as follows. |
Section 2 describes the architecture of SLURM in detail.
Section 3 discusses the services provided by SLURM, followed by a performance study of
SLURM in Section 4. A brief survey of existing cluster management systems is presented in Section 5.
| %Section 6 describes how the SLURM can be used with more sphisticated external schedulers. |
Concluding remarks and future development plans for SLURM are given in Section 6.