<!--#include virtual="header.txt"-->
<p><a href="http://www.bull.com" target="_blank"><img src="bull.jpg" style="float: right;" border="0"></a></p>
<h1>Slurm User Group Meeting 2011</h1>
<p>Hosted by <a href="http://www.bull.com">Bull</a></p>
<h1>Agenda</h1>
<p>
The 2011 SLURM User Group Meeting will be held on September 22 and 23
in Phoenix, Arizona and will be hosted by Bull.
On September 22 there will be two parallel tracks of tutorials meeting in separate rooms.
One set of tutorials will be for users and the other will be for system administrators.
There will be a series of technical presentations on September 23.
The <a href="#schedule">Schedule</a> and <a href="#abstracts">Abstracts</a>
are shown below.
</p>
<h2>Hotel Information</h2>
<p>The meeting will be held at
<a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/index.do">Embassy Suites Phoenix - North</a>
2577 West Greenway Road, Phoenix, Arizona, USA (Phone: 1-602-375-1777 Fax: 1-602-375-4012).
You may book your reservations online at
<a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/index.do">Embassy Suites Phoenix - North</a>.</p>
<p>Please reference Bull when making your reservations to receive a $79/room rate.</p>
<h2>Directions and Transportation</h2>
<p>From Phoenix Sky Harbor Airport, take I-10 west to I-17 North.
Follow I-17 for approximately 15 miles to Greenway Road (exit 211).
Exit and turn right; the hotel entrance is 1/8 of a mile ahead on the right.</p>
<p><a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/directions.do;jsessionid=DDD31DD6EFFAF2D32299955C321976F3.etc83">
View all directions, map, and airport information</a></p>
<h2>Contact</h2>
<p>If you need further information about the event or the
registration procedure, contact the
<a href="mailto:Nancy.Kritkausky@bull.com?subject=Informations">
<b>Slurm User Group 2011</b></a> organizers.</p>
<h2>Registration</h2>
<p>Please <a href="slurm_ug_registration.html">register</a> online no later
than August 22.</p>
<a name="schedule"><h1>Schedule</h1></a>
<h2>September 22: User Tutorials</h2>
<table width="100%" border=1 cellspacing=0 cellpadding=0>
<tr>
<th width="15%">Time</th>
<th width="15%">Theme</th>
<th width="25%">Speaker</th>
<th width="45%">Title</th>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Registration</td>
</tr>
<tr>
<td width="15%">09:00 - 10:30</td>
<td width="15%">&nbsp;User Tutorial #1</td>
<td width="25%">&nbsp;Don Albert and Rod Schultz (Bull)</td>
<td width="45%">&nbsp;SLURM: Beginners Usage</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">10:30 - 11:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%">11:00 - 12:30</td>
<td width="15%">&nbsp;User Tutorial #2</td>
<td width="25%">&nbsp;Bill Brophy, Rod Schultz, Yiannis Georgiou (Bull)</td>
<td width="45%">&nbsp;SLURM: Advanced Usage</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Lunch at conference center</td>
</tr>
<tr>
<td width="15%">14:00 - 15:30</td>
<td width="15%">&nbsp;User Tutorial #3</td>
<td width="25%">&nbsp;Martin Perry and Yiannis Georgiou (Bull)</td>
<td width="45%">&nbsp;Resource Management for multicore/multi-threaded usage</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%">16:00 - 17:00</td>
<td width="15%">&nbsp;Question and Answer</td>
<td width="25%">&nbsp;Danny Auble and Morris Jette (SchedMD)</td>
<td width="45%">&nbsp;Get your questions answered by the developers</td>
</tr>
</table>
<h2>September 22: System Administrator Tutorials</h2>
<table width="100%" border=1 cellspacing=0 cellpadding=0>
<tr>
<th width="15%">Time</th>
<th width="15%">Theme</th>
<th width="25%">Speaker</th>
<th width="45%">Title</th>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Registration</td>
</tr>
<tr>
<td width="15%">09:00 - 10:30</td>
<td width="15%">&nbsp;Admin Tutorial #1</td>
<td width="25%">&nbsp;David Egolf and Bill Brophy (Bull)</td>
<td width="45%">&nbsp;SLURM High Availability</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">10:30 - 11:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%">11:00 - 12:30</td>
<td width="15%">&nbsp;Admin Tutorial #2</td>
<td width="25%">&nbsp;Dan Rusak (Bull)</td>
<td width="45%">&nbsp;Power Management / sview</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Lunch at conference center</td>
</tr>
<tr>
<td width="15%">14:00 - 15:30</td>
<td width="15%">&nbsp;Admin Tutorial #3</td>
<td width="25%">&nbsp;Don Albert and Rod Schultz (Bull)</td>
<td width="45%">&nbsp;Accounting, limits and Priorities configurations</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%">16:00 - 17:30</td>
<td width="15%">&nbsp;Admin Tutorial #4</td>
<td width="25%">&nbsp;Matthieu Hautreux (CEA), Yiannis Georgiou and Martin Perry (Bull)</td>
<td width="45%">&nbsp;Scalability, Scheduling and Task placement</td>
</tr>
</table>
<h2>September 23: Technical Session</h2>
<table width="100%" border=1 cellspacing=0 cellpadding=0>
<tr>
<th width="15%">Time</th>
<th width="15%">Theme</th>
<th width="25%">Speaker</th>
<th width="45%">Title</th>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Registration</td>
</tr>
<tr>
<td width="15%" rowspan="4">09:00 - 10:40</td>
<td width="85%" colspan="3">&nbsp;Welcome</td>
</tr>
<tr>
<td width="15%">&nbsp;Keynote</td>
<td width="25%">&nbsp;William Kramer (NCSA)</td>
<td width="45%">&nbsp;Challenges and Opportunities for Exascale Resource Management and how Today's Petascale Systems are Guiding the Way</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #1</td>
<td width="25%">&nbsp;Matthieu Hautreux (CEA)</td>
<td width="45%">&nbsp;SLURM at CEA</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #2</td>
<td width="25%">&nbsp;Don Lipari (LLNL)</td>
<td width="45%">&nbsp;LLNL site report</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">10:40 - 11:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%" rowspan="3">11:00 - 12:30</td>
<td width="15%">&nbsp;Session #3</td>
<td width="25%">&nbsp;Alejandro Lucero Palau (BSC)</td>
<td width="45%">&nbsp;SLURM Simulator</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #4</td>
<td width="25%">&nbsp;Danny Auble (SchedMD)</td>
<td width="45%">&nbsp;SLURM operation on IBM BlueGene/Q</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #5</td>
<td width="25%">&nbsp;Morris Jette (SchedMD)</td>
<td width="45%">&nbsp;SLURM operation on Cray XT and XE</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Lunch at conference center</td>
</tr>
<tr>
<td width="15%" rowspan="3">14:00 - 15:30</td>
<td width="15%">&nbsp;Session #6</td>
<td width="25%">&nbsp;Mariusz Mamo&#324;ski (Pozna&#324; University)</td>
<td width="45%">&nbsp;Introduction to SLURM DRMAA</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #7</td>
<td width="25%">&nbsp;Robert Stober, Sr. (Bright Computing)</td>
<td width="45%">&nbsp;Bright Cluster Manager &amp; SLURM: Benefits of Seamless Integration</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #8</td>
<td width="25%">&nbsp;Morris Jette (SchedMD)</td>
<td width="45%">&nbsp;Proposed Design for Job Step Management in User Space</td>
</tr>
<tr>
<td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td>
<td width="85%" colspan="3" bgcolor="#F0F1C9">&nbsp;Coffee break</td>
</tr>
<tr>
<td width="15%" rowspan="3">16:00 - 17:30</td>
<td width="15%">&nbsp;Session #9</td>
<td width="25%">&nbsp;Don Lipari (LLNL)</td>
<td width="45%">&nbsp;Proposed Design for Enhanced Enterprise-wide Scheduling</td>
</tr>
<tr>
<td width="15%">&nbsp;Session #10</td>
<td width="25%">&nbsp;Danny Auble and Morris Jette (SchedMD)</td>
<td width="45%">&nbsp;SLURM Version 2.3 and plans for future releases</td>
</tr>
<tr>
<td width="85%" colspan="3">&nbsp;Open discussion, feature requests, etc.</td>
</tr>
</table>
<br><br>
<a name="abstracts"><h1>Abstracts</h1></a>
<h2>User Tutorial #1</h2>
SLURM Beginners Usage<br>
Don Albert and Rod Schultz (Bull)
<ul>
<li>Simple use of commands (submission/monitoring/result collection)</li>
<li>Reservations</li>
<li>Use of accounting and reporting</li>
<li>Scheduling techniques for smaller response time (setting of walltime for backfill, etc.)</li>
</ul>
<h2>User Tutorial #2</h2>
SLURM Advanced Usage<br>
Bill Brophy, Rod Schultz, Yiannis Georgiou (Bull)
<ul>
<li>MPI jobs</li>
<li>Checkpoint/Restart (BLCR or application level)</li>
<li>Preemption / Gang Scheduling Usage</li>
<li>Dynamic allocations (growing/shrinking)</li>
<li>Grace Time Delay with Preemption</li>
</ul>
<h2>User Tutorial #3</h2>
Resource Management for multicore/multi-threaded usage<br>
Martin Perry and Yiannis Georgiou (Bull)
<ul>
<li>CPU allocation</li>
<li>CPU/tasks distribution</li>
<li>Task binding</li>
<li>Internals of the allocation procedures</li>
</ul>
<h2>Administrator Tutorial #1</h2>
SLURM High Availability<br>
David Egolf and Bill Brophy (Bull)
<ul>
<li>How to set up the High Availability SLURM</li>
<li>Event logging with striggers</li>
</ul>
<h2>Administrator Tutorial #2</h2>
Power Management / sview<br>
Dan Rusak (Bull)
<ul>
<li>Power Management configuration</li>
<li>sview presentation</li>
</ul>
<h2>Administrator Tutorial #3</h2>
Accounting, limits and Priorities configurations<br>
Don Albert and Rod Schultz (Bull)
<ul>
<li>Accounting with slurmdbd configuration</li>
<li>Multifactor job priorities with examples considering all different factors</li>
<li>QOS configuration</li>
<li>Fairsharing setting</li>
</ul>
<h2>Administrator Tutorial #4</h2>
Scalability, Scheduling and Task placement<br>
Matthieu Hautreux (CEA), Yiannis Georgiou and Martin Perry (Bull)
<ul>
<li>High Throughput Computing</li>
<li>Topology constraints config</li>
<li>Generic Resources and GPUs config</li>
<li>Task Placement with Cgroups</li>
</ul>
<h2>Keynote Speaker</h2>
Challenges and Opportunities for Exascale Resource Management and how
Today's Petascale Systems are Guiding the Way<br>
William Kramer (NCSA)<br><br>
Resource management challenges currently experienced on the Blue Waters
computer will be described. These experiences will be extended to describe
the additional challenges faced in exascale and trans-petascale systems.
<h2>Session #1</h2>
CEA Site report<br>
Matthieu Hautreux (CEA)<br><br>
Evolutions and feedback from Tera100, and SLURM on Curie, the second PRACE Tier-0
system, which is planned to be installed by the end of the year in a new facility
hosted at CEA. Curie will be a 1.6 Petaflop system from Bull.
<h2>Session #2</h2>
LLNL site report<br>
Don Lipari (LLNL)<br><br>
Don Lipari will provide an overview of the batch scheduling systems in use
at LLNL and an overview of how they are managed.
<h2>Session #3</h2>
SLURM Simulator<br>
Alejandro Lucero Palau (BSC)<br><br>
Batch scheduling for high performance cluster installations has two main goals:
1) to keep the whole machine working at full capacity at all times, and
2) to respect priorities, preventing lower priority jobs from jeopardizing higher
priority ones. Usually, batch schedulers allow different policies, with
several variables to be tuned by policy. Other features like special job
requests, reservations or job preemption increase the complexity of achieving
a fine-tuned algorithm. A local decision for a specific job can change
the full scheduling for a high number of jobs, and what can be thought
of as logical in the short term could make no sense for a long trace measured
in weeks or months. Although it is possible to extract algorithms
from batch scheduling software to make simulations of large job traces,
this is not the ideal approach since scheduling is not an isolated part of
this type of tool, and replicating the same environment requires significant
effort plus a high maintenance cost. We present a method for obtaining a
special mode of operation for real production-ready scheduling software,
SLURM, in which we can simulate the execution of real job traces to evaluate
the impact of scheduling policies and policy tuning.
<h2>Session #4</h2>
SLURM Operation on IBM BlueGene/Q<br>
Danny Auble (SchedMD)<br><br>
SLURM version 2.3 supports IBM BlueGene/Q. This presentation will report on the
design and operation of SLURM with respect to BlueGene/Q systems.
<h2>Session #5</h2>
SLURM Operation on Cray XT and XE systems<br>
Morris Jette (SchedMD)<br><br>
SLURM version 2.3 supports Cray XT and XE systems running over Cray's ALPS
(Application Level Placement Scheduler) resource manager. This presentation
will discuss the design and operation of SLURM with respect to Cray systems.
<h2>Session #6</h2>
Introduction to SLURM DRMAA<br>
Mariusz Mamo&#324;ski (Pozna&#324; University)<br><br>
DRMAA or Distributed Resource Management Application API is a high-level
Open Grid Forum API specification for the submission and control of jobs
in a Grid architecture.
<h2>Session #7</h2>
Bright Cluster Manager &amp; SLURM: Benefits of Seamless Integration<br>
Robert Stober, Sr. (Bright Computing)<br><br>
Bright Cluster Manager, tightly integrated with SLURM, simplifies HPC
cluster installation and management while boosting system throughput. Bright
automatically installs, configures and deploys SLURM so that clusters are
ready to use in minutes rather than days. Bright provides extensive and
extensible monitoring and management through its intuitive Bright Cluster
Manager GUI, powerful cluster management shell, and customizable web-based
user portal.
Additional integration benefits include sampling, analysis and visualization
of all key SLURM metrics from within the Bright GUI, automatic head node
failover, and extensive pre-job health checking capability. Regarding the
latter, say good-bye to the black hole node syndrome: Bright plus SLURM
effectively prevent this productivity-killing problem by identifying and
sidelining problematic nodes before the job is run.
<h2>Session #8</h2>
Proposed Design for Job Step Management in User Space<br>
Morris Jette (SchedMD)<br><br>
SLURM currently creates and manages job steps using SLURM's control daemon,
slurmctld. Since some user jobs create thousands of job steps, the management
of those job steps accounts for most of slurmctld's work. It is possible to
move job step management from slurmctld into user space to improve SLURM
scalability and performance. A possible implementation of this will be
presented.
<h2>Session #9</h2>
Proposed Design for Enhanced Enterprise-wide Scheduling<br>
Don Lipari (LLNL)<br><br>
SLURM currently supports submitting jobs and querying their status across
computers at a site; however, the current design has some limitations. When a job
is submitted with several possible computers usable for its execution, the
job is routed to the computer on which it is expected to start earliest.
Changes in the workload or system failures could make moving the job to another
computer result in faster initiation, but that is currently impossible. SLURM
is also unable to support dependencies between jobs executing on different
computers. The design of a SLURM meta-scheduler with enhanced enterprise-wide
scheduling capabilities will be presented.
<h2>Session #10</h2>
Contents of SLURM Version 2.3 and plans for future releases<br>
Danny Auble and Morris Jette (SchedMD)<br><br>
An overview of the changes in SLURM Version 2.3 will be presented along with
current plans for future releases.
<h2>Open Discussion</h2>
All meeting attendees will be invited to provide input with respect to
SLURM's design and development work.
We also invite proposals for hosting the SLURM User Group Meeting in 2012.
<!--#include virtual="footer.txt"-->