| <!--#include virtual="header.txt"--> |
| |
| <a href="http://www.bull.com" target="_blank"><img src="bull.jpg" style="float: right;" border="0"></a></p> |
| |
| <h1>Slurm User Group Meeting 2011</h1> |
| |
| <p>Hosted by <a href="http://www.bull.com">Bull</a></p> |
| |
| <h1>Agenda</h1> |
| |
| <p> |
| The 2011 SLURM User Group Meeting will be held on September 22 and 23 |
| in Phoenix, Arizona and will be hosted by Bull. |
| On September 22 there will be two parallel tracks of tutorials meeting in separate rooms. |
| One set of tutorials will be for users and the other will be for system administrators. |
| There will be a series of technical presentations on September 23. |
| The <a href="#schedule">Schedule</a> and <a href="#abstracts">Abstracts</a> |
| are shown below. |
| </p> |
| |
| <h2>Hotel Information</h2> |
| <p>The meeting will be held at |
| <a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/index.do">Embassy Suites Phoenix - North</a> |
| 2577 West Greenway Road, Phoenix, Arizona, USA (Phone: 1-602-375-1777 Fax: 1-602-375-4012). |
| You may book your reservations online at |
| <a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/index.do">Embassy Suites Phoenix - North</a>.</p> |
| |
| <p>Please reference Bull when making your reservations to receive a $79/room rate.</p> |
| |
| <h2>Directions and Transportation</h2> |
| <p>From Phoenix Sky Harbor Airport, take I-10 west to I-17 North. |
| Follow I-17 for approximately 15 miles to Greenway Road (exit 211). |
| Exit and turn right; the hotel entrance is 1/8 of a mile ahead on the right.</p> |
| <p><a href="http://embassysuites1.hilton.com/en_US/es/hotel/PHXNOES-Embassy-Suites-Phoenix-North-Arizona/directions.do;jsessionid=DDD31DD6EFFAF2D32299955C321976F3.etc83"> |
| View all directions, map, and airport information</a></p> |
| |
| <h2>Contact</h2> |
| <p>If you need further information about the event or the |
| registration process, contact the |
| <a href="mailto:Nancy.Kritkausky@bull.com?subject=Informations"> |
| <b>Slurm User Group 2011</b></a> organizers.</p> |
| |
| |
| <h2>Registration</h2> |
| <p>Please <a href="slurm_ug_registration.html">register</a> online no later |
| than August 22.</p> |
| |
| <a name="schedule"><h1>Schedule</h1></a> |
| |
| <h2>September 22: User Tutorials.</h2> |
| |
| <table width="100%" border=1 cellspacing=0 cellpadding=0> |
| |
| <tr> |
| <th width="15%">Time</th> |
| <th width="15%">Theme</th> |
| <th width="25%">Speaker</th> |
| <th width="45%">Title</th> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Registration</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">09:00 - 10:30</td> |
| <td width="15%"> User Tutorial #1</td> |
| <td width="25%"> Don Albert and Rod Schultz (Bull)</td> |
| <td width="45%"> SLURM: Beginners Usage</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">10:30 - 11:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">11:00 - 12:30</td> |
| <td width="15%"> User Tutorial #2</td> |
| <td width="25%"> Bill Brophy, Rod Schultz, Yiannis Georgiou (Bull)</td> |
| <td width="45%"> SLURM: Advanced Usage</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Lunch at conference center</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">14:00 - 15:30</td> |
| <td width="15%"> User Tutorial #3</td> |
| <td width="25%"> Martin Perry and Yiannis Georgiou (Bull)</td> |
| <td width="45%"> Resource Management for multicore/multi-threaded usage</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">16:00 - 17:00</td> |
| <td width="15%"> Question and Answer</td> |
| <td width="25%"> Danny Auble and Morris Jette (SchedMD)</td> |
| <td width="45%"> Get your questions answered by the developers</td> |
| </tr> |
| |
| </table> |
| |
| <h2>September 22: System Administrator Tutorials.</h2> |
| |
| <table width="100%" border=1 cellspacing=0 cellpadding=0> |
| |
| <tr> |
| <th width="15%">Time</th> |
| <th width="15%">Theme</th> |
| <th width="25%">Speaker</th> |
| <th width="45%">Title</th> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Registration</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">09:00 - 10:30</td> |
| <td width="15%"> Admin Tutorial #1</td> |
| <td width="25%"> David Egolf and Bill Brophy (Bull)</td> |
| <td width="45%"> SLURM High Availability</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">10:30 - 11:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">11:00 - 12:30</td> |
| <td width="15%"> Admin Tutorial #2</td> |
| <td width="25%"> Dan Rusak (Bull)</td> |
| <td width="45%"> Power Management / sview</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Lunch at conference center</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">14:00 - 15:30</td> |
| <td width="15%"> Admin Tutorial #3</td> |
| <td width="25%"> Don Albert and Rod Schultz (Bull)</td> |
| <td width="45%"> Accounting, limits and Priorities configurations</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%">16:00 - 17:30</td> |
| <td width="15%"> Admin Tutorial #4</td> |
| <td width="25%"> Matthieu Hautreux (CEA), Yiannis Georgiou and Martin Perry (Bull)</td> |
| <td width="45%"> Scalability, Scheduling and Task placement</td> |
| </tr> |
| |
| </table> |
| |
| <h2>September 23: Technical Session</h2> |
| |
| <table width="100%" border=1 cellspacing=0 cellpadding=0> |
| |
| <tr> |
| <th width="15%">Time</th> |
| <th width="15%">Theme</th> |
| <th width="25%">Speaker</th> |
| <th width="45%">Title</th> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">08:30 - 09:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Registration</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" rowspan="4">09:00 - 10:40</td> |
| <td width="85%" colspan="3"> Welcome</td> |
| </tr> |
| |
| <tr> |
| <td width="15%"> Keynote</td> |
| <td width="25%"> William Kramer (NCSA)</td> |
| <td width="45%"> Challenges and Opportunities for Exascale Resource Management and how Today's Petascale Systems are Guiding the Way</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #1</td> |
| <td width="25%"> Matthieu Hautreux (CEA)</td> |
| <td width="45%"> SLURM at CEA</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #2</td> |
| <td width="25%"> Don Lipari (LLNL)</td> |
| <td width="45%"> LLNL site report</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">10:40 - 11:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" rowspan="3">11:00 - 12:30</td> |
| <td width="15%"> Session #3</td> |
| <td width="25%"> Alejandro Lucero Palau (BSC)</td> |
| <td width="45%"> SLURM Simulator</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #4</td> |
| <td width="25%"> Danny Auble (SchedMD)</td> |
| <td width="45%"> SLURM operation on IBM BlueGene/Q</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #5</td> |
| <td width="25%"> Morris Jette (SchedMD)</td> |
| <td width="45%"> SLURM operation on Cray XT and XE</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">12:30 - 14:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Lunch at conference center</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" rowspan="3">14:00 - 15:30</td> |
| <td width="15%"> Session #6</td> |
| <td width="25%"> Mariusz Mamoński (Poznań University)</td> |
| <td width="45%"> Introduction to SLURM DRMAA</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #7</td> |
| <td width="25%"> Robert Stober, Sr. (Bright Computing)</td> |
| <td width="45%"> Bright Cluster Manager &amp; SLURM: Benefits of Seamless Integration</td> |
| </tr> |
| <tr> |
| <td width="15%"> Session #8</td> |
| <td width="25%"> Morris Jette (SchedMD)</td> |
| <td width="45%"> Proposed Design for Job Step Management in User Space</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" bgcolor="#F0F1C9">15:30 - 16:00</td> |
| <td width="85%" colspan="3" bgcolor="#F0F1C9"> Coffee break</td> |
| </tr> |
| |
| <tr> |
| <td width="15%" rowspan="3">16:00 - 17:30</td> |
| <td width="15%"> Session #9</td> |
| <td width="25%"> Don Lipari (LLNL)</td> |
| <td width="45%"> Proposed Design for Enhanced Enterprise-wide Scheduling</td> |
| </tr> |
| |
| <tr> |
| <td width="15%"> Session #10</td> |
| <td width="25%"> Danny Auble and Morris Jette (SchedMD)</td> |
| <td width="45%"> SLURM Version 2.3 and plans for future releases</td> |
| </tr> |
| |
| <tr> |
| <td width="85%" colspan="3"> Open discussion, feature requests, etc.</td> |
| </tr> |
| |
| </table> |
| |
| <br><br> |
| <a name="abstracts"><h1>Abstracts</h1></a> |
| |
| <h2>User Tutorial #1</h2> |
| SLURM Beginners Usage<br> |
| Don Albert and Rod Schultz (Bull) |
| <ul> |
| <li>Simple use of commands (submission/monitoring/result collection)</li> |
| <li>Reservations</li> |
| <li>Use of accounting and reporting</li> |
| <li>Scheduling techniques for shorter response time (setting of walltime for backfill, etc.)</li> |
| </ul> |
| |
| <h2>User Tutorial #2</h2> |
| SLURM Advanced Usage<br> |
| Bill Brophy, Rod Schultz, Yiannis Georgiou (Bull) |
| <ul> |
| <li>MPI jobs</li> |
| <li>Checkpoint/Restart (BLCR or application level)</li> |
| <li>Preemption / Gang Scheduling Usage</li> |
| <li>Dynamic allocations (growing/shrinking)</li> |
| <li>Grace Time Delay with Preemption</li> |
| </ul> |
| |
| <h2>User Tutorial #3</h2> |
| Resource Management for multicore/multi-threaded usage<br> |
| Martin Perry and Yiannis Georgiou (Bull) |
| <ul> |
| <li>CPU allocation</li> |
| <li>CPU/tasks distribution</li> |
| <li>Task binding</li> |
| <li>Internals of the allocation procedures</li> |
| </ul> |
| |
| |
| <h2>Administrator Tutorial #1</h2> |
| SLURM High Availability<br> |
| David Egolf and Bill Brophy (Bull) |
| <ul> |
| <li>How to set up the High Availability SLURM</li> |
| <li>Event logging with striggers</li> |
| </ul> |
| |
| <h2>Administrator Tutorial #2</h2> |
| Power Management / sview<br> |
| Dan Rusak (Bull) |
| <ul> |
| <li>Power Management configuration</li> |
| <li>sview presentation</li> |
| </ul> |
| |
| <h2>Administrator Tutorial #3</h2> |
| Accounting, limits and Priorities configurations<br> |
| Don Albert and Rod Schultz (Bull) |
| <ul> |
| <li>Accounting with slurmdbd configuration</li> |
| <li>Multifactor job priorities with examples considering all different factors</li> |
| <li>QOS configuration</li> |
| <li>Fairsharing setting</li> |
| </ul> |
| |
| <h2>Administrator Tutorial #4</h2> |
| Scalability, Scheduling and Task placement<br> |
| Matthieu Hautreux (CEA), Yiannis Georgiou and Martin Perry (Bull) |
| <ul> |
| <li>High Throughput Computing</li> |
| <li>Topology constraints config</li> |
| <li>Generic Resources and GPUs config</li> |
| <li>Task Placement with Cgroups</li> |
| </ul> |
| |
| <h2> Keynote Speaker</h2> |
| Challenges and Opportunities for Exascale Resource Management and how |
| Today's Petascale Systems are Guiding the Way<br> |
| William Kramer (NCSA)<br><br> |
| Resource management challenges currently experienced on the Blue Waters |
| computer will be described. These experiences will be extended to describe |
| the additional challenges faced in exascale and trans-petascale systems. |
| |
| <h2>Session #1</h2> |
| CEA Site report<br> |
| Matthieu Hautreux (CEA)<br><br> |
| Evolutions and feedback from Tera100. SLURM on Curie, the PRACE second Tier-0 |
| system that is planned to be installed by the end of the year in a new facility |
| hosted at CEA. Curie will be a 1.6 Petaflop system from Bull. |
| |
| <h2>Session #2</h2> |
| LLNL site report<br> |
| Don Lipari (LLNL)<br><br> |
| Don Lipari will provide an overview of the batch scheduling systems in use |
| at LLNL and an overview on how they are managed. |
| |
| <h2>Session #3</h2> |
| SLURM Simulator<br> |
| Alejandro Lucero Palau (BSC)<br><br> |
| Batch scheduling for high performance cluster installations has two main goals: |
| 1) to keep the whole machine working at full capacity at all times, and |
| 2) to respect priorities, preventing lower priority jobs from jeopardizing higher |
| priority ones. Usually, batch schedulers allow different policies with |
| several variables to be tuned by policy. Other features like special job |
| requests, reservations or job preemption increase the complexity of achieving |
| a fine-tuned algorithm. A local decision for a specific job can change |
| the full scheduling for a high number of jobs, and what can be thought of |
| as logical in the short term could make no sense for a long trace measured |
| in weeks or months. Although it is possible to extract algorithms |
| from batch scheduling software to make simulations of large job traces, |
| this is not the ideal approach since scheduling is not an isolated part of |
| this type of tool, and replicating the same environment requires significant |
| effort plus a high maintenance cost. We present a method for obtaining a |
| special mode of operation for real production-ready scheduling software, |
| SLURM, where we can simulate execution of real job traces to evaluate |
| the impact of scheduling policies and policy tuning. |
| |
| <h2>Session #4</h2> |
| SLURM Operation on IBM BlueGene/Q<br> |
| Danny Auble (SchedMD)<br><br> |
| SLURM version 2.3 supports IBM BlueGene/Q. This presentation will report the |
| design and operation of SLURM with respect to BlueGene/Q systems. |
| |
| <h2>Session #5</h2> |
| SLURM Operation on Cray XT and XE systems<br> |
| Morris Jette (SchedMD)<br><br> |
| SLURM version 2.3 supports Cray XT and XE systems running over Cray's ALPS |
| (Application Level Placement Scheduler) resource manager. This presentation |
| will discuss the design and operation of SLURM with respect to Cray systems. |
| |
| <h2>Session #6</h2> |
| Introduction to SLURM DRMAA<br> |
| Mariusz Mamoński (Poznań University)<br><br> |
| DRMAA or Distributed Resource Management Application API is a high-level |
| Open Grid Forum API specification for the submission and control of jobs |
| in a Grid architecture. |
| |
| <h2>Session #7</h2> |
| Bright Cluster Manager &amp; SLURM: Benefits of Seamless Integration<br> |
| Robert Stober, Sr. (Bright Computing)<br><br> |
| Bright Cluster Manager, tightly integrated with SLURM, simplifies HPC |
| cluster installation and management while boosting system throughput. Bright |
| automatically installs, configures and deploys SLURM so that clusters are |
| ready to use in minutes rather than days. Bright provides extensive and |
| extensible monitoring and management through its intuitive Bright Cluster |
| Manager GUI, powerful cluster management shell, and customizable web-based |
| user portal. |
| Additional integration benefits include sampling, analysis and visualization |
| of all key SLURM metrics from within the Bright GUI, automatic head node |
| failover, and extensive pre-job health checking capability. Regarding the |
| latter, say good-bye to the black hole node syndrome: Bright plus SLURM |
| effectively prevent this productivity-killing problem by identifying and |
| sidelining problematic nodes before the job is run. |
| |
| <h2>Session #8</h2> |
| Proposed Design for Job Step Management in User Space<br> |
| Morris Jette (SchedMD)<br><br> |
| SLURM currently creates and manages job steps using SLURM's control daemon, |
| slurmctld. Since some user jobs create thousands of job steps, the management |
| of those job steps accounts for most of slurmctld's work. It is possible to |
| move job step management from slurmctld into user space to improve SLURM |
| scalability and performance. A possible implementation of this will be |
| presented. |
| |
| <h2>Session #9</h2> |
| Proposed Design for Enhanced Enterprise-wide Scheduling<br> |
| Don Lipari (LLNL)<br><br> |
| SLURM currently supports the ability to submit jobs and query their status between |
| computers at a site; however, the current design has some limitations. When a job |
| is submitted with several possible computers usable for its execution, the |
| job is routed to the computer on which it is expected to start earliest. |
| Changes in the workload or system failures could make moving the job to another |
| computer result in faster initiation, but that is currently impossible. SLURM |
| is also unable to support dependencies between jobs executing on different |
| computers. The design of a SLURM meta-scheduler with enhanced enterprise-wide |
| scheduling capabilities will be presented. |
| |
| <h2>Session #10</h2> |
| Contents of SLURM Version 2.3 and plans for future releases<br> |
| Danny Auble and Morris Jette (SchedMD)<br><br> |
| An overview of the changes in SLURM Version 2.3 will be presented along with |
| current plans for future releases. |
| |
| <h2>Open Discussion</h2> |
| All meeting attendees will be invited to provide input with respect to |
| SLURM's design and development work. |
| We also invite proposals for hosting the SLURM User Group Meeting in 2012. |
| |
| <!--#include virtual="footer.txt"--> |
| |