| <!--#include virtual="header.txt"--> |
| |
| <h1>Power Saving Guide</h1> |
| <p>SLURM provides an integrated power saving mechanism beginning with |
| version 1.2.7. |
Nodes that remain idle for a configurable period of time can be placed
| in a power saving mode. |
| The nodes will be restored to normal operation once work is assigned to them. |
| Power saving is accomplished using a <i>cpufreq</i> governor that can change |
| CPU frequency and voltage. |
| Note that the <i>cpufreq</i> driver must be enabled in the Linux kernel |
| configuration. |
| While the "ondemand" governor can be configured to operate at all |
| times to automatically alter the CPU performance based upon workload, |
| SLURM provides somewhat greater flexibility for power management on a |
| cluster. |
| Of particular note, SLURM can alter the governors across the cluster |
| at a configurable rate to prevent rapid changes in power demands. |
| For example, starting a 1000 node job on an idle cluster could result |
| in an instantaneous surge in power demand of multiple megawatts without |
| SLURM's support to increase power demands in a gradual fashion.</p> |
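<p>Before enabling power saving, it is worth confirming that the <i>cpufreq</i>
interface is actually available on the compute nodes. A minimal check, assuming
the standard Linux sysfs layout (the exact path may differ with your kernel
configuration):</p>
<pre>
#!/bin/bash
# Verify that the cpufreq driver is loaded and list the governors it offers.
# If this file is missing, the kernel was built without cpufreq support.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
</pre>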
| |
| |
| <h2>Configuration</h2> |
| <p>Rather than changing SLURM's configuration file (and data |
| structures) after SLURM version 1.2 was released, we decided to |
| temporarily put the configuration parameters directly in the |
| <i>src/slurmctld/power_save.c</i> file. |
These parameters will all be moved into the <i>slurm.conf</i>
configuration file when SLURM version 1.3 is released.
Until that time, please edit the code directly to use this feature.
The following configuration parameters are available:
| <ul> |
| <li><b>IdleTime</b>: |
Nodes become eligible for power saving mode after being idle
for this number of seconds.
| A negative number disables power saving mode. |
| The default value is -1 (disabled).</li> |
| <li><b>SuspendRate</b>: |
| Maximum number of nodes to be placed into power saving mode |
| per minute. |
| A value of zero results in no limits being imposed. |
| The default value is 60. |
| Use this to prevent rapid drops in power requirements.</li> |
<li><b>ResumeRate</b>:
Maximum number of nodes to be removed from power saving mode
per minute.
A value of zero results in no limits being imposed.
The default value is 60.
Use this to prevent rapid increases in power requirements.</li>
<li><b>SuspendProgram</b>:
Program to be executed to place nodes into power saving mode.
The program executes as <i>SlurmUser</i> (as configured in
<i>slurm.conf</i>).
The argument to the program will be the names of nodes to
be placed into power saving mode (using SLURM's hostlist
expression format; see the sketch after this list).</li>
<li><b>ResumeProgram</b>:
Program to be executed to remove nodes from power saving mode.
The program executes as <i>SlurmUser</i> (as configured in
<i>slurm.conf</i>).
The argument to the program will be the names of nodes to
be removed from power saving mode (using SLURM's hostlist
expression format).</li>
| <li><b>ExcludeSuspendNodes</b>: |
| List of nodes to never place in power saving mode. |
| Use SLURM's hostlist expression format. |
| By default, no nodes are excluded.</li> |
| <li><b>ExcludeSuspendPartitions</b>: |
| List of partitions with nodes to never place in power saving mode. |
| Multiple partitions may be specified using a comma separator. |
| By default, no nodes are excluded.</li> |
| </ul></p> |
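<p>The node list handed to <i>SuspendProgram</i> and <i>ResumeProgram</i> is a
single hostlist expression such as <i>tux[0-31]</i>, not one argument per node.
A minimal sketch of expanding that expression inside a script, assuming your
SLURM installation provides <i>scontrol show hostnames</i> and using
hypothetical node names:</p>
<pre>
#!/bin/bash
# Expand a hostlist expression (e.g. "tux[0-31]") into individual node names.
# "scontrol show hostnames" is assumed to be available in your SLURM version.
for node in $(scontrol show hostnames "$1"); do
    echo "would suspend $node"    # replace with the real per-node action
done
</pre>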
| |
<p>Both <i>SuspendProgram</i> and <i>ResumeProgram</i> execute as
<i>SlurmUser</i> (as configured in <i>slurm.conf</i>).
The programs can take advantage of this to execute commands
directly on the nodes as user <i>root</i> through the
SLURM infrastructure.
Example scripts are shown below:</p>
| <pre> |
| #!/bin/bash |
# Example SuspendProgram for a cluster where every node has two CPUs.
# The command runs through "bash -c" so the redirection happens on the compute node.
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo powersave >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo powersave >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor"
| |
| #!/bin/bash |
# Example ResumeProgram for a cluster where every node has two CPUs.
# The command runs through "bash -c" so the redirection happens on the compute node.
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor"
| </pre> |
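<p>Hard-coding one line per CPU becomes awkward on larger nodes. A sketch of a
<i>SuspendProgram</i> that loops over every CPU on each node, assuming the sysfs
paths above and that the <i>powersave</i> governor is available (adjust the
governor name as needed):</p>
<pre>
#!/bin/bash
# Example SuspendProgram that works regardless of the CPU count per node.
# The loop runs on the compute node itself, so the glob and redirections
# see that node's /sys tree.
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c '
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo powersave > "$gov"
done'
</pre>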
| |
<p>The srun --no-allocate option permits only SlurmUser and user root to spawn
tasks directly on the compute nodes without actually creating a SLURM job.
No other users have this permission (their requests will generate an invalid
credential error message and the event will be logged).
The srun --uid option permits only SlurmUser and user root to execute a job
as some other user.
When SlurmUser uses the srun --uid option, the srun command will try to set
its user ID to that value in order to fully operate as the specified user.
This will fail and srun will report an error to that effect, but the failure
does not prevent the spawned programs from running as user root.
No other users have this permission (their requests will generate an invalid
user id error message and the event will be logged).</p>
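<p>As a quick sanity check of this mechanism, SlurmUser can run a harmless
command by hand before wiring up the real suspend and resume scripts; the node
names below are hypothetical:</p>
<pre>
# Run "id" as root on four nodes without creating a SLURM job.
# Expected output on each node is "uid=0(root) ...".
srun --uid=0 --no-allocate --nodelist="tux[0-3]" id
</pre>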
| |
<p>The slurmctld daemon will periodically (every 10 minutes) log how many
nodes are in power save mode using messages of this sort:</p>
| <pre> |
| [May 02 15:31:25] Power save mode 0 nodes |
| ... |
| [May 02 15:41:26] Power save mode 10 nodes |
| ... |
| [May 02 15:51:28] Power save mode 22 nodes |
| </pre> |
| <p>Using these logs you can easily see the effect of SLURM's power saving support. |
| You can also configure SLURM without SuspendProgram or ResumeProgram values |
| to assess the potential impact of power saving mode before enabling it.</p> |
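<p>To see the trend over time without reading the log by eye, the periodic
messages can be pulled out with a one-liner. A minimal sketch, assuming the
controller log lives at /var/log/slurmctld.log (the actual path depends on
your SlurmctldLogFile setting):</p>
<pre>
#!/bin/bash
# Print the timestamp and node count from each "Power save mode" message.
grep "Power save mode" /var/log/slurmctld.log | \
    awk '{print $1, $2, $3, $(NF-1), "nodes"}'
</pre>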
| |
| <p style="text-align:center;">Last modified 9 May 2007</p> |
| |
| <!--#include virtual="footer.txt"--> |