\section{Introduction}
Linux clusters, often constructed from commodity off-the-shelf (COTS) components,
have become increasingly popular as a computing platform for parallel computation
in recent years, mainly due to their ability to deliver a high performance-to-cost ratio.
Researchers have built and used small to medium size clusters for various
applications~\cite{BeowulfWeb,LokiWeb}.
The continuous decrease in the price of the COTS parts in conjunction with
the good scalability of the cluster architecture has now made it feasible to economically
build large-scale clusters with thousands of processors~\cite{MCRWeb,PCRWeb}.
An essential component that is needed to harness such a computer is a
resource management system.
A resource management system (or resource manager) performs such crucial tasks as
scheduling user jobs, monitoring machine and job status, launching user applications, and
managing machine configuration.
An ideal resource manager should be simple, efficient, scalable, fault-tolerant,
and portable.
Unfortunately, no open-source resource management system currently available
satisfies these requirements.
A survey~\cite{Jette02} revealed that many existing resource managers suffer from poor scalability and fault tolerance, rendering them unsuitable for large clusters with
thousands of processors~\cite{LoadLevelerWeb,LoadLevelerManual}.
While some proprietary cluster managers are suitable for large clusters,
they are typically designed for particular computer systems and/or
interconnects~\cite{RMS,LoadLevelerWeb,LoadLevelerManual}.
Proprietary systems can also be expensive and unavailable in source-code form.
Furthermore, proprietary cluster management functionality is usually provided as a
part of a specific job scheduling system package.
This mandates the use of the given scheduler just to manage a cluster,
even though the scheduler does not necessarily meet the needs of the organization that hosts the cluster.
A clear separation of cluster management functionality from scheduling policy is therefore desirable.
This observation led us to set out to design a simple, highly scalable, and
portable resource management system.
The result of this effort is the Simple Linux Utility for Resource Management
(SLURM\footnote{A tip of the hat to Matt Groening and the creators of {\em Futurama},
where Slurm is the most popular carbonated beverage in the universe.}).
SLURM was developed with the following design goals:
\begin{itemize}
\item {\em Simplicity}: SLURM is simple enough to allow motivated end-users
to understand its source code and add functionality. The authors will
avoid the temptation to add features unless they are of general appeal.
\item {\em Open Source}: SLURM is available to everyone and will remain free.
Its source code is distributed under the GNU General Public
License~\cite{GPLWeb}.
\item {\em Portability}: SLURM is written in the C language, with a GNU
{\em autoconf} configuration engine.
While initially written for Linux, other UNIX-like operating systems
should be easy porting targets.
SLURM also supports a general-purpose {\em plugin} mechanism, which
permits a variety of different infrastructures to be easily supported.
The SLURM configuration file specifies which set of plugin modules
should be used (an illustrative excerpt follows this list).
\item {\em Interconnect independence}: SLURM supports UDP/IP-based
communication as well as the Quadrics Elan3 and Myrinet interconnects.
Adding support for other interconnects is straightforward and utilizes
the plugin mechanism described above.
\item {\em Scalability}: SLURM is designed for scalability to clusters of
thousands of nodes.
Jobs may specify their resource requirements in a variety of ways,
including requirement options and ranges, potentially permitting
faster initiation than would otherwise be possible.
\item {\em Robustness}: SLURM can handle a variety of failure modes
without terminating workloads, including crashes of the node running
the SLURM controller.
User jobs may be configured to continue execution despite the failure
of one or more nodes on which they are executing.
Nodes allocated to a job become available for reuse as soon as the job(s)
allocated to them terminate.
If some nodes fail to complete job termination
in a timely fashion due to hardware or software problems, only the
scheduling of those tardy nodes is affected.
\item {\em Security}: SLURM employs cryptographic techniques to authenticate
users to services and services to each other with a variety of options
available through the plugin mechanism.
SLURM does not assume that its networks are physically secure,
but does assume that the entire cluster is within a single
administrative domain with a common user base across the
entire cluster.
\item {\em System administrator friendly}: SLURM is configured via a
simple configuration file and minimizes distributed state.
Its configuration may be changed at any time without impacting running jobs.
Heterogeneous nodes within a cluster may be easily managed.
SLURM interfaces are usable by scripts and its behavior is highly
deterministic.
\end{itemize}
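To make the plugin selection mechanism concrete, the sketch below shows how a
configuration file might name the plugin module to load for each subsystem.
The parameter names and plugin identifiers shown here are illustrative
assumptions rather than a recommended setup; the exact names are defined by
the SLURM release in use.
\begin{verbatim}
# Illustrative SLURM configuration excerpt (hypothetical values)
# Each *Type parameter selects the plugin module for one subsystem
ControlMachine=ctl0           # node running the SLURM controller
AuthType=auth/munge           # authentication plugin
SwitchType=switch/elan        # interconnect plugin (Quadrics Elan)
SchedulerType=sched/builtin   # scheduling policy plugin (simple FIFO)
\end{verbatim}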
The main contribution of our work is a readily available
tool that anybody can use to efficiently manage clusters of varying size and architecture.
SLURM is highly scalable\footnote{It took less than five seconds for SLURM to launch a 1900-task job across 950 nodes on a recently installed cluster at Lawrence Livermore National Laboratory.}.
SLURM can be ported to other cluster systems with minimal effort through its plugin
capability, and its well-defined interfaces allow it to be used with any meta-batch scheduler
or Grid resource broker~\cite{Gridbook}.
The rest of the paper is organized as follows.
Section 2 describes the architecture of SLURM in detail. Section 3 discusses the services provided by SLURM, followed by a performance study of
SLURM in Section 4. A brief survey of existing cluster management systems is presented in Section 5.
%Section 6 describes how SLURM can be used with more sophisticated external schedulers.
Concluding remarks and future development plans for SLURM are given in Section 6.