| .TH "knl.conf" "5" "Slurm Configuration File" "December 2022" "Slurm Configuration File" |
| |
| .SH "NAME" |
| knl.conf \- Slurm configuration file for Intel Knights Landing processor. |
| |
| .SH "DESCRIPTION" |
| This ASCII file describes configuration information for Intel Knights |
| Landing processors. Its name may depend upon the NodeFeatures plugin |
| configured in Slurm. For example, on Cray systems NodeFeatures should be set |
| to "knl_cray" and its configuration file will be read from "knl_cray.conf". |
| The file will always be located in the same directory as the \fBslurm.conf\fR. |
| This file is optional. |
| .LP |
| Parameter names are case insensitive. |
| Any text following a "#" in the configuration file is treated |
| as a comment through the end of that line. |
| Changes to the configuration file take effect upon restart of |
| Slurm daemons, daemon receipt of the SIGHUP signal, or execution |
| of the command "scontrol reconfigure" unless otherwise noted. |
| .LP |
| The overall configuration parameters available include: |
| |
| .TP |
| \fBAllowMCDRAM\fR |
| Specify the MCDRAM modes which jobs are allowed to use. |
| This may be a subset of MCDRAM modes supported by the node. |
| If not specified, all MCDRAM modes supported by the node are available for use. |
| The comma separated list of allowed MCDRAM modes may include any of the modes |
| listed below. |
| .IP |
| .RS |
| .TP 17 |
| \fBcache\fR |
| All of MCDRAM to be used as cache. |
| .IP |
| |
| .TP |
| \fBequal\fR |
| MCDRAM to be used partly as cache and partly combined with primary memory. |
| .IP |
| |
| .TP |
| \fBflat\fR |
| MCDRAM to be combined with primary memory into a "flat" memory space. |
| .RE |
| .IP |
| |
| .TP |
| \fBAllowNUMA\fR |
| Specify the NUMA modes which jobs are allowed to use. |
| This may be a subset of NUMA modes supported by the node. |
| If not specified, all NUMA modes supported by the node are available for use. |
| The comma separated list of allowed NUMA modes may include any of the modes |
| listed below. |
| Note that Slurm can only support homogeneous nodes |
| (e.g. the same number of cores per NUMA node). |
| KNL snc4 and quad modes are not homogeneous: each NUMA node will have |
| either 16 or 18 cores. |
| This will result in Slurm using the lower core count, finding a total of |
| 256 threads rather than 272 threads, and setting the node to a DOWN state. |
| Therefore it is recommended that snc4 and quad modes not be allowed at this time. |
| .IP |
| .RS |
| .TP 17 |
| \fBa2a\fR |
| All to all |
| .IP |
| |
| .TP |
| \fBsnc2\fR |
| Sub\-NUMA cluster 2 |
| .IP |
| |
| .TP |
| \fBsnc4\fR |
| Sub\-NUMA cluster 4 |
| .IP |
| |
| .TP |
| \fBhemi\fR |
| Hemisphere |
| .IP |
| |
| .TP |
| \fBquad\fR |
| Quadrant |
| .RE |
| .IP |
| |
| .TP |
| \fBAllowUserBoot\fR |
| A comma\-delimited list of users allowed to modify a node's MCDRAM or NUMA state. |
| If not specified then any user can change a node's state and reboot it. |
| .IP |
| |
| .TP |
| \fBBootTime\fR |
| Estimated time to reboot a node in seconds. |
| Used as a basis for optimizing scheduling decisions. |
| The default value is 300 seconds (5 minutes) for the "knl_generic" plugin |
| and 2700 seconds (45 minutes) for the "knl_cray" plugin. |
| .IP |
| |
| .TP |
| \fBCapmcPath\fR |
| Fully qualified path to the \fBcapmc\fR program. |
| The default value is "/opt/cray/capmc/default/bin/capmc". |
| This parameter is used only by the "knl_cray" plugin. |
| .IP |
| |
| .TP |
| \fBCapmcPollFreq\fR |
| Interval, in seconds, at which the \fBcapmc\fR program should poll for node |
| state changes. |
| The default value is 45 seconds. |
| This parameter is used only by the "knl_cray" plugin. |
| .IP |
| |
| .TP |
| \fBCapmcRetries\fR |
| Number of times to retry failed operations of the \fBcapmc\fR program. |
| The default value is 4. |
| .IP |
| |
| .TP |
| \fBCapmcTimeout\fR |
| Time limit for the \fBcapmc\fR program to return status information, in milliseconds. |
| The default value is 60000 milliseconds and the minimum value is 1000 milliseconds. |
| This parameter is used by the "knl_cray" plugin, plus the capmc_suspend and |
| capmc_resume programs used for suspending and resuming nodes. |
| .IP |
| |
| .TP |
| \fBCnselectPath\fR |
| Fully qualified path to the \fBcnselect\fR program. |
| The default value is "/opt/cray/sdb/default/bin/cnselect". |
| This parameter is used only by the "knl_cray" plugin. |
| .IP |
| |
| .TP |
| \fBDefaultMCDRAM\fR |
| Specify the default MCDRAM mode for jobs which do not specify a value. |
| This is only used when a node is booted and the job which has been allocated |
| the node does not specify a desired MCDRAM mode. |
| The value can include one of the possible values identified with the |
| \fBAllowMCDRAM\fR configuration parameter above. |
| The default value is "cache". |
| .IP |
| |
| .TP |
| \fBDefaultNUMA\fR |
| Specify the default NUMA mode for jobs which do not specify a value. |
| This is only used when a node is booted and the job which has been allocated |
| the node does not specify a desired NUMA mode. |
| The value can include one of the possible values identified with the |
| \fBAllowNUMA\fR configuration parameter above. |
| The default value is "a2a". |
| .IP |
| |
| .TP |
| \fBForce\fR |
| If set to a non\-zero value then load the node_features/generic plugin even |
| on non\-KNL nodes. |
| Used primarily for testing purposes. |
| .IP |
| |
| .TP |
| \fBLogFile\fR |
| Fully qualified path to a log file. |
| The default value is SlurmctldLogFile from the slurm.conf configuration file. |
| This option is used only by the capmc_suspend and capmc_resume programs |
| (which power down and reboot nodes in the appropriate configuration). |
| .IP |
| |
| .TP |
| \fBMcPath\fR |
| Fully qualified path to memory controller device file directory. |
| Children of this directory with names of the form "mc#/csrow#/ue_count" |
| (i.e. the count of unrecoverable memory errors) will be monitored for non\-zero |
| values. |
| If such errors are detected, the node will be set to a DOWN state and the |
| slurmd daemon will shut down. |
| The default value is "/sys/devices/system/edac/mc". |
| See also \fBUmeCheckInterval\fR. |
| .IP |
| |
| .TP |
| \fBNumaCpuBind\fR |
| Pairs of NUMA modes and the CpuBind mode to set on a node in that NUMA mode. |
| Any compute node found with or set to the specified NUMA mode will have that |
| node's CpuBind field set to the configured value. |
| Each NUMA mode must be followed by an equal sign and the desired CpuBind mode |
| for that NUMA mode. Multiple NUMA mode and CpuBind mode pairs should be given |
| as a semicolon separated list. |
| By default, changes to a node's NUMA mode will not affect that node's CpuBind |
| mode. |
| See the example below. |
| .IP |
| |
| .TP |
| \fBSyscfgPath\fR |
| Fully qualified path to Intel's \fBsyscfg\fR program, which identifies current |
| KNL configuration by viewing BIOS settings. |
| If not defined, the current BIOS setting will not be available. |
| The default value is "/usr/bin/syscfg". |
| This parameter is used only by the "knl_generic" plugin. |
| .IP |
| |
| .TP |
| \fBSyscfgTimeout\fR |
| Timeout for \fBsyscfg\fR program in milliseconds. |
| Default value is 1000 milliseconds. |
| For Dell KNL systems, experience has shown that a higher value of 10000 |
| milliseconds is more appropriate. |
| .IP |
| |
| .TP |
| \fBSystemType\fR |
| Used to distinguish the type of KNL system in use. |
| Possible options are "Dell" and "Intel". |
| The default value is "Intel". |
| This parameter is used only by the "knl_generic" plugin. |
| .IP |
| |
| .TP |
| \fBUmeCheckInterval\fR |
| Interval, in microseconds, between checks for Uncorrectable Memory Errors (UME). |
| If such errors are detected, the node will be set to a DOWN state and the |
| slurmd daemon will shut down. |
| The default value is 0 (disabled). |
| See also \fBMcPath\fR. |
| .IP |
| |
| .TP |
| \fBValidateMode\fR |
| If set to 1 then validate, but do not modify, the node's configured MCDRAM and |
| NUMA modes from the slurm.conf file. If the actual modes do not match the |
| configured values, the node will be set to a DOWN state. Every KNL node's |
| MCDRAM and NUMA states must both be listed in the slurm.conf file. |
| This parameter is used only by the "knl_cray" plugin. |
| .IP |
| |
| .SH "EXAMPLE" |
| .nf |
| ################################################################### |
| # knl_cray.conf |
| # Slurm configuration file for Intel Knights Landing on Cray system |
| ################################################################### |
| CapmcPath=/opt/cray/capmc/default/bin/capmc |
| CapmcTimeout=6000 |
| DefaultMCDRAM=flat |
| DefaultNUMA=a2a |
| NumaCpuBind=a2a=core;snc2=thread;snc4=thread |
| LogFile=/var/tmp/slurm_node_feature.log |
| SyscfgPath=/usr/sbin/syscfg |
| .fi |
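| .LP |
| As a further illustration, the sketch below shows what a configuration for the |
| "knl_generic" plugin might look like on a Dell KNL system. |
| The file name "knl_generic.conf" is an assumption that follows the naming |
| pattern described above for "knl_cray", the user names are hypothetical, and |
| all values are illustrative rather than recommended settings. |
| .nf |
| ################################################################### |
| # knl_generic.conf (assumed file name) |
| # Illustrative sketch only: Intel Knights Landing on a Dell system |
| ################################################################### |
| # Exclude snc4 and quad, which are not homogeneous |
| AllowNUMA=a2a,snc2,hemi |
| AllowMCDRAM=cache,flat |
| # Hypothetical user names |
| AllowUserBoot=alice,bob |
| BootTime=300 |
| DefaultMCDRAM=cache |
| DefaultNUMA=a2a |
| SyscfgPath=/usr/bin/syscfg |
| # Higher timeout suggested above for Dell KNL systems |
| SyscfgTimeout=10000 |
| SystemType=Dell |
| .fi |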
| |
| .SH "COPYING" |
| Copyright (C) 2015\-2022 SchedMD LLC. |
| .LP |
| This file is part of Slurm, a resource management program. |
| For details, see <https://slurm.schedmd.com/>. |
| .LP |
| Slurm is free software; you can redistribute it and/or modify it under |
| the terms of the GNU General Public License as published by the Free |
| Software Foundation; either version 2 of the License, or (at your option) |
| any later version. |
| .LP |
| Slurm is distributed in the hope that it will be useful, but WITHOUT ANY |
| WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
| FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
| details. |
| |
| .SH "SEE ALSO" |
| .LP |
| \fBslurm.conf\fR(5) |