.TH "bluegene.conf" "5" "August 2011" "bluegene.conf 2.3" "SLURM configuration file"
.SH "NAME"
bluegene.conf \- SLURM configuration file for BlueGene systems
.SH "DESCRIPTION"
\fBbluegene.conf\fP is an ASCII file which describes IBM BlueGene specific
SLURM configuration information. This includes specifications for bgblock
layout, configuration, logging, etc.
The file location can be modified at system build time using the
DEFAULT_SLURM_CONF parameter or at execution time by setting the SLURM_CONF
environment variable. The file will always be located in the
same directory as the \fBslurm.conf\fP file.
.LP
Parameter names are case insensitive.
Any text following a "#" in the configuration file is treated
as a comment through the end of that line.
Changes to the configuration file take effect only upon restart of
the slurmctld daemon; "scontrol reconfigure" does nothing with this file.
.LP
There are some differences between BlueGene/L, BlueGene/P and BlueGene/Q
systems with respect to the contents of the bluegene.conf file.
.SH "The BlueGene/L specific options are:"
.TP
\fBAltBlrtsImage\fR
Alternative BlrtsImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the
bluegene.conf file.
.TP
\fBAltLinuxImage\fR
Alternative LinuxImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the
bluegene.conf file.
.TP
\fBAltRamDiskImage\fR
Alternative RamDiskImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the
bluegene.conf file.
.TP
\fBBlrtsImage\fR
BlrtsImage used for creation of all bgblocks.
There is no default value and this must be specified.
.TP
\fBLinuxImage\fR
LinuxImage used for creation of all bgblocks.
There is no default value and this must be specified.
.TP
\fBRamDiskImage\fR
RamDiskImage used for creation of all bgblocks.
There is no default value and this must be specified.
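.LP
As an illustration of how a default image and alternative images might be
combined, the sketch below assumes that the Groups option is written on the
same line as the alternative image it restricts. The two alternative image
paths are hypothetical; the group names are the ones used in the
descriptions above.
.LP
.br
# Default image, used for creation of all bgblocks
.br
BlrtsImage=/bgl/BlueLight/ppcfloor/bglsys/bin/rts_hw.rts
.br
# Hypothetical alternative image, limited to the groups da and jette
.br
AltBlrtsImage=/bgl/BlueLight/ppcfloor/bglsys/bin/alt_rts_hw.rts Groups=da,jette
.br
# Hypothetical alternative image with no Groups option, usable by all groups
.br
AltBlrtsImage=/bgl/BlueLight/ppcfloor/bglsys/bin/test_rts_hw.rts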
.SH "The BlueGene/P specific options are:"
.TP
\fBAltCnloadImage\fR
Alternative CnloadImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the conf file.
.TP
\fBAltIoloadImage\fR
Alternative IoloadImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the conf file.
.TP
\fBCnloadImage\fR
CnloadImage used for creation of all bgblocks.
There is no default value and this must be specified.
.TP
\fBIoloadImage\fR
IoloadImage used for creation of all bgblocks.
There is no default value and this must be specified.
.SH "The BlueGene/Q specific options are:"
.TP
\fBAllowSubBlockAllocations\fR
Can be set to Yes or No; the default is No. This option allows multiple users to
run jobs as small as 1 cnode on a block that is one midplane in size or
smaller. While this option gives great flexibility to run a range of job
sizes previously not available on any BlueGene system, it may also raise
security concerns since IO traffic can share the same path with other jobs.
NOTE - There is a current limitation for sub-block jobs in how the system
(used for I/O) and user (used for MPI) torus class routes are configured. The
network device hardware has cutoff registers to prevent packets from flowing
outside of the sub-block. Unfortunately, when a sub-block has a dimension of
size 3, the job can attempt to send user packets outside of its sub-block,
which causes it to be terminated by signal 36. To prevent this from happening,
SLURM does not allow a sub-block to be used with any dimension of 3.
NOTE - The current IBM API does not allow wrapping inside a midplane,
meaning you cannot create a sub-block of 2 with nodes in the 0 and 3 positions.
SLURM will support this in the future when the underlying system allows it.
.SH "All options below are common on all BlueGene systems:"
.TP
\fBAltMloaderImage\fR
Alternative MloaderImage. This is an optional field only used for multiple
images on a system and should be followed by a Groups option indicating
the user groups allowed to use this image (e.g. Groups=da,jette). If
Groups is not specified then this image will be usable by all
groups. You can put as many alternative images as you want in the conf file.
.TP
\fBBridgeAPILogFile\fR
Fully qualified pathname of a file into which the Bridge API logs are
to be written.
There is no default value.
.TP
\fBBridgeAPIVerbose\fR
Specify how verbose the Bridge API logs should be.
The default value is 0.
.RS
.TP
\fB0\fR: Log only error and warning messages
.TP
\fB1\fR: Log level 0 plus information messages
.TP
\fB2\fR: Log level 1 plus basic debug messages
.TP
\fB3\fR: Log level 2 plus more debug messages
.TP
\fB4\fR: Log all messages
.RE
.TP
\fBDefaultConnType\fR
Specify the default Connection Type(s) to be used when generating new blocks
in Dynamic LayoutMode. The default value is TORUS. On a BGQ system you can
specify a different connection type for each dimension (e.g. T,T,T,M would
make the default torus in all dimensions except Z, where it would be mesh).
NOTE - If a requested block can use all the midplanes in a dimension,
torus will always be used.
.TP
\fBDenyPassthrough\fR
Specify the dimensions in which pass\-throughs are not allowed.
Valid options are A, X, Y, Z or all ("A" applies only to BlueGene/Q systems).
For example, to prevent pass\-throughs in the X and Y dimensions you would
specify "DenyPassthrough=X,Y".
By default, pass\-throughs are enabled in every dimension.
.TP
\fBIONodesPerMP\fR
The number of IO nodes on a midplane. On a heterogeneous system, this must be
the smallest number of IO nodes found on any midplane.
There is no default value and this must be specified. The typical settings
for BlueGene/L systems are as follows: For IO rich systems, 64 is the value that
should be used to create small blocks. For systems that are not IO rich, or
for which small blocks are not desirable, 8 is usually the number to use.
For BlueGene/P IO rich systems, 32 is the value that should be used to create
small blocks since there are only 2 IO nodes per nodecard instead of 4 as on
BlueGene/L.
.TP
\fBLayoutMode\fR
Describes how SLURM should create bgblocks.
.RS
.TP 10
\fBSTATIC\fR:
Create and use the defined non\-overlapping bgblocks.
.TP
\fBOVERLAP\fR:
Create and use the defined bgblocks, which may overlap.
It is highly recommended that none of the bgblocks have any passthroughs
in the X\-dimension on BGL and BGP systems.
\fBUse this mode with extreme caution.\fR
.TP
\fBDYNAMIC\fR:
Create and use bgblocks as needed for each job.
Bgblocks will not be defined in the bluegene.conf file.
Dynamic partitioning may introduce fragmentation of resources.
\fBUse this mode with mild caution.\fR
.RE
.TP
\fBMaxBlockInError\fR
MaxBlockInError is used on BGQ systems to specify the percentage of a block
allowed to be in an error state before no future jobs are allowed on it.
Since cnodes can go into a Software Failure state without causing the block
to fail, this option is used when allowing multiple jobs to run on a block:
once the percentage of cnodes in an error state in that block exceeds this
limit, no future jobs will be allowed to run on the block. After all jobs on
the block have finished, the block is freed, which will resolve any cnodes in
an error state. The default is 0, which means future jobs are disallowed as
soon as any cnode is in an error state.
.TP
\fBMidplaneNodeCnt\fR
The number of c\-nodes (compute nodes) per midplane.
There is no default value and this must be specified (usually 512).
.TP
\fBMloaderImage\fR
MloaderImage used for creation of all bgblocks.
There is no default value and this must be specified.
.TP
\fBNodeCardNodeCnt\fR or \fBNodeBoardNodeCnt\fR
Number of c\-nodes per nodecard / nodeboard.
There is no default value and this must be specified. For most BlueGene systems
this is usually 32.
.TP
\fBSubMidplaneSystem\fR
Set to Yes if this system is not a full midplane in size. The default is No
(regular system).
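.LP
The fragment below is a minimal sketch of how several of the options above
might appear together on a BlueGene/Q system. The values are illustrative
assumptions rather than recommendations; in particular, writing
MaxBlockInError as a bare whole number is an assumption about the accepted
format.
.LP
.br
# Create bgblocks as needed for each job
.br
LayoutMode=DYNAMIC
.br
# Torus in the A, X and Y dimensions, mesh in Z
.br
DefaultConnType=T,T,T,M
.br
# No pass\-throughs in the X and Y dimensions
.br
DenyPassthrough=X,Y
.br
# Assumed value: allow up to 10 percent of a block to be in an error state
.br
MaxBlockInError=10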
.LP
Each bgblock is defined by the midplanes used to construct it.
Ordering is very important for laying out switch wires. Please use the smap
tool to define blocks and do not change the order of blocks created.
A bgblock is implicitly created containing all resources on the system.
Bgblocks must not overlap in static mode (except for the implicitly
created bgblock). This will be the case when smap is used to create
a configuration file.
All Nodes defined here must also be defined in the slurm.conf file.
Define only the numeric coordinates of the bgblocks here. The prefix
will be based upon the NodeName defined in slurm.conf.
.TP
\fBMPs\fR
Define the coordinates of the bgblock end points.
For BlueGene/L and BlueGene/P systems there will be three coordinates (X, Y, and Z).
For BlueGene/Q systems there will be four coordinates (A, X, Y, and Z).
.TP
\fBType\fR
Define the network connection type for the bgblock.
The default value is TORUS. On a BGQ system you can
specify a different connection type for each dimension (e.g. T,T,T,M would
make the bgblock torus in all dimensions except Z, where it would be mesh).
NOTE - If a requested block can use all the midplanes in a dimension,
torus will always be used.
.RS
.TP 8
\fBMESH\fR:
Communications occur over a mesh.
.TP
\fBSMALL\fR:
The midplane is divided into more than one bgblock.
The administrator should define the number of single nodecards and
quarter midplane blocks using the options \fB32CNBlocks\fR and
\fB128CNBlocks\fR respectively for a BlueGene/L system. \fB64CNBlocks\fR
and \fB256CNBlocks\fR are also available for later BlueGene systems.
\fB16CNBlocks\fR is also valid on BlueGene/P systems. Keep in mind that you
must have at least one IO node per block, so if you only have 4 IO nodes per
midplane the smallest block you will be able to make is 128 c\-nodes.
The total number of c\-nodes of the blocks in a small request must not exceed
\fBMidplaneNodeCnt\fR.
If none are specified, the midplane will be divided into four 128 c-node blocks.
See example below.
.TP
\fBTORUS\fR:
Communications occur over a torus (end\-points of the network directly connect).
.RE
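.LP
The sizing rules for small blocks can be checked with simple arithmetic.
The lines below are a sketch for a midplane with MidplaneNodeCnt=512; the
coordinates are hypothetical, and each line is an alternative definition of
the same midplane rather than a set of lines to be used together.
.LP
.br
# Sixteen single nodecard blocks: 16 * 32 = 512
.br
MPs=[000] Type=SMALL 32CNBlocks=16
.br
# Eight nodecard blocks plus two quarter midplane blocks: 8 * 32 + 2 * 128 = 512
.br
MPs=[000] Type=SMALL 32CNBlocks=8 128CNBlocks=2
.br
# No block counts given: four 128 c\-node blocks, 4 * 128 = 512
.br
MPs=[000] Type=SMALL
.LP
Because every block needs at least one IO node, a midplane with only 4 IO
nodes can support only the last form (512 / 4 = 128 c\-nodes per block).
By the same reasoning, the forms containing 32 c\-node blocks presumably
need about 16 IO nodes per midplane (512 / 32 = 16).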
.SH "EXAMPLE"
.LP
.br
##################################################################
.br
# bluegene.conf for a Bluegene/L system
.br
# build by smap on 03/06/2006
.br
##################################################################
.br
BridgeAPILogFile=/var/log/slurm/bridgeapi.log
.br
BridgeAPIVerbose=2
.br
BlrtsImage=/bgl/BlueLight/ppcfloor/bglsys/bin/rts_hw.rts
.br
LinuxImage=/bgl/BlueLight/ppcfloor/bglsys/bin/zImage.elf
.br
MloaderImage=/bgl/BlueLight/ppcfloor/bglsys/bin/mmcs\-mloader.rts
.br
RamDiskImage=/bgl/BlueLight/ppcfloor/bglsys/bin/ramdisk.elf
.br
MidplaneNodeCnt=512
.br
NodeCardNodeCnt=32
.br
IONodesPerMP=64 # An I/O rich environment
.br
LayoutMode=STATIC
.br
##################################################################
.br
# LEAVE AS COMMENT, Full\-system bgblock, implicitly created
.br
# MPs=[000x333] Type=TORUS # 4x4x4 = 64 midplanes
.br
##################################################################
.br
MPs=[000x133] Type=TORUS # 2x4x4 = 32
.br
MPs=[200x233] Type=TORUS # 1x4x4 = 16
.br
MPs=[300x313] Type=TORUS # 1x2x4 = 8
.br
MPs=[320x323] Type=TORUS # 1x1x4 = 4
.br
MPs=[330x331] Type=TORUS # 1x1x2 = 2
.br
MPs=[332] Type=TORUS # 1x1x1 = 1
.br
MPs=[333] Type=SMALL 32CNBlocks=4 128CNBlocks=3 # 32 * 4 + 128 * 3 = 512
.SH "COPYING"
Copyright (C) 2006-2010 The Regents of the University of California.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
CODE\-OCEC\-09\-009. All rights reserved.
.LP
This file is part of SLURM, a resource management program.
For details, see <http://slurm.schedmd.com/>.
.LP
SLURM is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
.LP
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
.SH "FILES"
/etc/bluegene.conf
.SH "SEE ALSO"
.LP
\fBsmap\fR(1), \fBslurm.conf\fR(5)