| .TH sbatch "1" "Slurm Commands" "August 2025" "Slurm Commands" |
| |
| .SH "NAME" |
| sbatch \- Submit a batch script to Slurm. |
| |
| .SH "SYNOPSIS" |
| \fBsbatch\fR [\fIOPTIONS(0)\fR...] [ : [\fIOPTIONS(N)\fR...]] \fIscript(0)\fR [\fIargs(0)\fR...] |
| |
| Option(s) define multiple jobs in a co\-scheduled heterogeneous job. |
| For more details about heterogeneous jobs see the document |
| .br |
| https://slurm.schedmd.com/heterogeneous_jobs.html |
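| |
| For example, the following sketch (resource values and script name are |
| illustrative) submits a two\-component heterogeneous job: |
| .nf |
| |
| sbatch \-\-ntasks=1 \-\-mem=16G : \-\-ntasks=8 \-\-mem=2G script.sh |
| .fi |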
| |
| .SH "DESCRIPTION" |
| sbatch submits a batch script to Slurm. The batch script may be given to |
| sbatch through a file name on the command line, or if no file name is specified, |
| sbatch will read in a script from standard input. |
| |
| The batch script may contain one or more lines beginning with "#SBATCH" followed |
| by any of the CLI options documented on this page. #SBATCH directives are read |
| directly by Slurm, so shell\-specific syntax including variable names will be |
| read as literal text. Once the first non\-comment, non\-whitespace line has been |
| reached in the script, no more #SBATCH directives will be processed. See example |
| below. |
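| |
| For instance, in the following sketch (directive values and commands are |
| illustrative), the final #SBATCH line is ignored because it follows the first |
| command: |
| .nf |
| |
| #!/bin/bash |
| #SBATCH \-\-job\-name=demo |
| #SBATCH \-\-time=10:00 |
| srun hostname |
| #SBATCH \-\-mem=1G    # not processed: follows the first command |
| .fi |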
| |
| sbatch exits immediately after the script is successfully transferred to the |
| Slurm controller and assigned a Slurm job ID. The batch script is not |
| necessarily granted resources immediately; it may sit in the queue of pending |
| jobs for some time before its required resources become available. |
| |
| By default both standard output and standard error are directed to a file of |
| the name "slurm\-%j.out", where the "%j" is replaced with the job allocation |
| number. The file will be generated on the first node of the job allocation. |
| Other than the batch script itself, Slurm does no movement of user files. |
| |
| When the job allocation is finally granted for the batch script, Slurm |
| runs a single copy of the batch script on the first node in the set of |
| allocated nodes. |
| |
| The following document describes the influence of various options on the |
| allocation of cpus to jobs and tasks. |
| .br |
| https://slurm.schedmd.com/cpu_management.html |
| |
| .SH "RETURN VALUE" |
| sbatch will return 0 on success or an error code on failure. |
| |
| .SH "SCRIPT PATH RESOLUTION" |
| |
| The batch script is resolved in the following order: |
| .br |
| |
| 1. If the script starts with ".", then the path is constructed as: |
| current working directory / script |
| .br |
| 2. If the script starts with a "/", then the path is considered absolute. |
| .br |
| 3. Otherwise, the script is looked for in the current working directory. |
| .br |
| 4. Otherwise, the script is resolved through PATH. See \fBpath_resolution\fR(7). |
| .br |
| .P |
| The current working directory is the calling process's working directory |
| unless the \fB\-\-chdir\fR argument is passed, which overrides it. |
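| |
| For example (paths and script names are illustrative): |
| .nf |
| |
| sbatch ./job.sh       # 1. relative to the current working directory |
| sbatch /tmp/job.sh    # 2. absolute path |
| sbatch job.sh         # 3. current working directory, then 4. PATH |
| .fi |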
| |
| .SH "OPTIONS" |
| .LP |
| |
| .TP |
| \fB\-A\fR, \fB\-\-account\fR=<\fIaccount\fR> |
| Charge resources used by this job to specified account. |
| The \fIaccount\fR is an arbitrary string. The account name may |
| be changed after job submission using the \fBscontrol\fR |
| command. |
| .IP |
| |
| .TP |
| \fB\-\-acctg\-freq\fR=<\fIdatatype\fR>=<\fIinterval\fR>[,<\fIdatatype\fR>=<\fIinterval\fR>...] |
| Define the job accounting and profiling sampling intervals in seconds. |
| This can be used to override the \fIJobAcctGatherFrequency\fR parameter in |
| the slurm.conf file. <\fIdatatype\fR>=<\fIinterval\fR> specifies the task |
| sampling interval for the jobacct_gather plugin or a |
| sampling interval for a profiling type by the |
| acct_gather_profile plugin. Multiple |
| comma\-separated <\fIdatatype\fR>=<\fIinterval\fR> pairs |
| may be specified. Supported \fIdatatype\fR values are: |
| .IP |
| .RS |
| .TP 12 |
| \fBtask\fR |
| Sampling interval for the jobacct_gather plugins and for task |
| profiling by the acct_gather_profile plugin. |
| .br |
| \fBNOTE\fR: This frequency is used to monitor memory usage. If memory limits |
| are enforced, the highest frequency a user can request is what is configured |
| in the slurm.conf file. It can not be disabled. |
| .IP |
| |
| .TP |
| \fBenergy\fR |
| Sampling interval for energy profiling using the |
| acct_gather_energy plugin. |
| .IP |
| |
| .TP |
| \fBnetwork\fR |
| Sampling interval for infiniband profiling using the |
| acct_gather_interconnect plugin. |
| .IP |
| |
| .TP |
| \fBfilesystem\fR |
| Sampling interval for filesystem profiling using the |
| acct_gather_filesystem plugin. |
| .IP |
| |
| .LP |
| The default value for the task sampling interval is 30 seconds. |
| The default value for all other intervals is 0. |
| An interval of 0 disables sampling of the specified type. |
| If the task sampling interval is 0, accounting |
| information is collected only at job termination (reducing Slurm |
| interference with the job). |
| .br |
| Smaller (non\-zero) values have a greater impact upon job performance, |
| but a value of 30 seconds is not likely to be noticeable for |
| applications having less than 10,000 tasks. |
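| |
| For example, the following sketch samples task data every 15 seconds and |
| energy data every 30 seconds: |
| .nf |
| |
| \-\-acctg\-freq=task=15,energy=30 |
| .fi |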
| .RE |
| .IP |
| |
| .TP |
| \fB\-a\fR, \fB\-\-array\fR=<\fIindexes\fR> |
| Submit a job array, multiple jobs to be executed with identical parameters. |
| The \fIindexes\fR specification identifies what array index values should |
| be used. Multiple values may be specified using a comma separated list and/or |
| a range of values with a "\-" separator. For example, "\-\-array=0\-15" or |
| "\-\-array=0,6,16\-32". |
| A step function can also be specified with a suffix containing a colon and |
| number. For example, "\-\-array=0\-15:4" is equivalent to "\-\-array=0,4,8,12". |
| A maximum number of simultaneously running tasks from the job array may be |
| specified using a "%" separator. |
| For example "\-\-array=0\-15%4" will limit the number of simultaneously |
| running tasks from this job array to 4. |
| The minimum index value is 0. |
| The maximum value is one less than the configuration parameter MaxArraySize. |
| \fBNOTE\fR: Currently, federated job arrays only run on the local cluster. |
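| |
| For example, in the following sketch (script and input names are |
| illustrative), each array task selects its own input file through the |
| SLURM_ARRAY_TASK_ID environment variable: |
| .nf |
| |
| $ sbatch \-\-array=0\-15%4 \-\-output=slurm\-%A_%a.out job.sh |
| |
| # inside job.sh: |
| srun ./app input_${SLURM_ARRAY_TASK_ID}.dat |
| .fi |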
| .IP |
| |
| .TP |
| \fB\-\-batch\fR=<\fIlist\fR> |
| Nodes can have \fBfeatures\fR assigned to them by the Slurm administrator. |
| Users can specify which of these \fBfeatures\fR are required by their batch |
| script using this option. |
| For example, a job's allocation may include both Intel Haswell and KNL nodes |
| with features "haswell" and "knl" respectively. |
| On such a configuration the batch script would normally benefit by executing |
| on a faster Haswell node. |
| This would be specified using the option "\-\-batch=haswell". |
| The specification can include AND and OR operators using the ampersand and |
| vertical bar separators. For example: |
| "\-\-batch=haswell|broadwell" or "\-\-batch=haswell|big_memory". |
| The \-\-batch argument must be a subset of the job's |
| \fB\-\-constraint\fR=<\fIlist\fR> argument (i.e. the job can not request only |
| KNL nodes, but require the script to execute on a Haswell node). |
| If the request can not be satisfied from the resources allocated to the job, |
| the batch script will execute on the first node of the job allocation. |
| .IP |
| |
| .TP |
| \fB\-\-bb\fR=<\fIspec\fR> |
| Burst buffer specification. The form of the specification is system dependent. |
| Also see \fB\-\-bbf\fR. |
| When the \fB\-\-bb\fR option is used, Slurm parses this option and creates a |
| temporary burst buffer script file that is used internally by the burst buffer |
| plugins. See Slurm's burst buffer guide for more information and examples: |
| .br |
| https://slurm.schedmd.com/burst_buffer.html |
| .IP |
| |
| .TP |
| \fB\-\-bbf\fR=<\fIfile_name\fR> |
| Path of file containing burst buffer specification. |
| The form of the specification is system dependent. |
| These burst buffer directives will be inserted into the submitted batch script. |
| See Slurm's burst buffer guide for more information and examples: |
| .br |
| https://slurm.schedmd.com/burst_buffer.html |
| .IP |
| |
| .TP |
| \fB\-b\fR, \fB\-\-begin\fR=<\fItime\fR> |
| Submit the batch script to the Slurm controller immediately, like normal, but |
| tell the controller to defer the allocation of the job until the specified time. |
| |
| Time may be of the form \fIHH:MM:SS\fR to run a job at |
| a specific time of day (seconds are optional). |
| (If that time is already past, the next day is assumed.) |
| You may also specify \fImidnight\fR, \fInoon\fR, \fIelevenses\fR (11 AM), |
| \fIfika\fR (3 PM) or \fIteatime\fR (4 PM) and you can have a time\-of\-day |
| suffixed with \fIAM\fR or \fIPM\fR for running in the morning or the evening. |
| You can also say what day the job will be run, by specifying |
| a date of the form \fIMMDDYY\fR, \fIMM/DD/YY\fR or |
| \fIYYYY\-MM\-DD\fR. Combine date and time using the following |
| format \fIYYYY\-MM\-DD[THH:MM[:SS]]\fR. You can also |
| give times like \fInow + count time\-units\fR, where the time\-units |
| can be \fIseconds\fR (default), \fIminutes\fR, \fIhours\fR, |
| \fIdays\fR, or \fIweeks\fR and you can tell Slurm to run |
| the job today with the keyword \fItoday\fR and to run the |
| job tomorrow with the keyword \fItomorrow\fR. |
| The value may be changed after job submission using the |
| \fBscontrol\fR command. |
| For example: |
| .IP |
| .nf |
| \-\-begin=16:00 |
| \-\-begin=now+1hour |
| \-\-begin=now+60 (seconds by default) |
| \-\-begin=2010\-01\-20T12:34:00 |
| .fi |
| |
| .RS |
| .PP |
| Notes on date/time specifications: |
| \- Although the 'seconds' field of the HH:MM:SS time specification is |
| allowed by the code, note that the poll time of the Slurm scheduler |
| is not precise enough to guarantee dispatch of the job on the exact |
| second. The job will be eligible to start on the next poll |
| following the specified time. The exact poll interval depends on the |
| Slurm scheduler (e.g., 60 seconds with the default sched/builtin). |
| \- If no time (HH:MM:SS) is specified, the default is 00:00:00. |
| \- If a date is specified without a year (e.g., MM/DD) then the current |
| year is assumed, unless the combination of MM/DD and HH:MM:SS has |
| already passed for that year, in which case the next year is used. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-D\fR, \fB\-\-chdir\fR=<\fIdirectory\fR> |
| Set the working directory of the batch script to \fIdirectory\fR before |
| it is executed. The path can be specified as a full path or as a path |
| relative to the directory where the command is executed. |
| .IP |
| |
| .TP |
| \fB\-\-cluster\-constraint\fR=[!]<\fIlist\fR> |
| Specifies features that a federated cluster must have to have a sibling job |
| submitted to it. Slurm will attempt to submit a sibling job to a cluster if it |
| has at least one of the specified features. If the "!" option is included, Slurm |
| will attempt to submit a sibling job to a cluster that has none of the specified |
| features. |
| .IP |
| |
| .TP |
| \fB\-M\fR, \fB\-\-clusters\fR=<\fIstring\fR> |
| Clusters to issue commands to. Multiple cluster names may be comma separated. |
| The job will be submitted to the one cluster providing the earliest expected |
| job initiation time. The default value is the current cluster. A value of |
| \(aq\fIall\fR\(aq will query to run on all clusters. Note the |
| \fB\-\-export\fR option to control environment variables exported |
| between clusters. |
| Note that the \fBslurmdbd\fR must be up for this option to work properly, unless |
| running in a federation with \fBFederationParameters=fed_display\fR configured. |
| .IP |
| |
| .TP |
| \fB\-\-comment\fR=<\fIstring\fR> |
| An arbitrary comment enclosed in double quotes if using spaces or some |
| special characters. |
| .IP |
| |
| .TP |
| \fB\-C\fR, \fB\-\-constraint\fR=<\fIlist\fR> |
| Nodes can have \fBfeatures\fR assigned to them by the Slurm administrator. |
| Users can specify which of these \fBfeatures\fR are required by their job |
| using the constraint option. If you are looking for 'soft' constraints please |
| see \fB\-\-prefer\fR for more information. |
| Only nodes having features matching the job constraints will be used to |
| satisfy the request. |
| Multiple constraints may be specified with AND, OR, matching OR, |
| resource counts, etc. (some operators are not supported on all system types). |
| |
| \fBNOTE\fR: Changeable features are features defined by a NodeFeatures plugin. |
| |
| Supported \fB\-\-constraint\fR options include: |
| .IP |
| .PD 1 |
| .RS |
| .TP |
| \fBSingle Name\fR |
| Only nodes which have the specified feature will be used. |
| For example, \fB\-\-constraint="intel"\fR |
| .IP |
| |
| .TP |
| \fBNode Count\fR |
| A request can specify the number of nodes needed with some feature |
| by appending an asterisk and count after the feature name. |
| For example, \fB\-\-nodes=16 \-\-constraint="graphics*4"\fR |
| indicates that the job requires 16 nodes and that at least four of those |
| nodes must have the feature "graphics." |
| If requesting more than one feature and using node counts, the request |
| must have square brackets surrounding it. |
| |
| \fBNOTE\fR: This option is not supported by the helpers NodeFeatures plugin. |
| Heterogeneous jobs can be used instead. |
| .IP |
| |
| .TP |
| \fBAND\fR |
| Only nodes with all of specified features will be used. |
| The ampersand is used for an AND operator. |
| For example, \fB\-\-constraint="intel&gpu"\fR |
| .IP |
| |
| .TP |
| \fBOR\fR |
| Only nodes with at least one of specified features will be used. |
| The vertical bar is used for an OR operator. If changeable features are not |
| requested, nodes in the allocation can have different features. For example, |
| \fBsalloc -N2 \-\-constraint="intel|amd"\fR can result in a job allocation |
| where one node has the intel feature and the other node has the amd feature. |
| However, if the expression contains a changeable feature, then all OR operators |
| are automatically treated as Matching OR so that all nodes in the job |
| allocation have the same set of features. For example, |
| \fBsalloc \-N2 \-\-constraint="foo|bar&baz"\fR. |
| The job is allocated two nodes where both nodes have foo, or bar and baz (one |
| or both nodes could have foo, bar, and baz). The helpers NodeFeatures plugin |
| will find the first set of node features that matches all nodes in the job |
| allocation; these features are set as active features on the node and passed to |
| RebootProgram (see \fBslurm.conf\fR(5)) and the helper script (see |
| \fBhelpers.conf\fR(5)). In this case, the helpers plugin uses the first of |
| "foo" or "bar,baz" that match the two nodes in the job allocation. |
| .IP |
| |
| .TP |
| \fBMatching OR\fR |
| If only one of a set of possible options should be used for all allocated |
| nodes, then use the OR operator and enclose the options within square brackets. |
| For example, \fB\-\-constraint="[rack1|rack2|rack3|rack4]"\fR might |
| be used to specify that all nodes must be allocated on a single rack of |
| the cluster, but any of those four racks can be used. |
| .IP |
| |
| .TP |
| \fBMultiple Counts\fR |
| Specific counts of multiple resources may be specified by using the AND |
| operator and enclosing the options within square brackets. |
| For example, \fB\-\-constraint="[rack1*2&rack2*4]"\fR might |
| be used to specify that two nodes must be allocated from nodes with the feature |
| of "rack1" and four nodes must be allocated from nodes with the feature |
| "rack2". |
| |
| \fBNOTE\fR: This construct does not support multiple Intel KNL NUMA or MCDRAM |
| modes. For example, while \fB\-\-constraint="[(knl&quad)*2&(knl&hemi)*4]"\fR is |
| not supported, \fB\-\-constraint="[haswell*2&(knl&hemi)*4]"\fR is supported. |
| Specification of multiple KNL modes requires the use of a heterogeneous job. |
| |
| \fBNOTE\fR: This option is not supported by the helpers NodeFeatures plugin. |
| |
| \fBNOTE\fR: Multiple Counts can cause jobs to be allocated with a non-optimal |
| network layout. |
| .IP |
| |
| .TP |
| \fBBrackets\fR |
| Brackets can be used to indicate that you are looking for a set of nodes with |
| the different requirements contained within the brackets. For example, |
| \fB\-\-constraint="[(rack1|rack2)*1&(rack3)*2]"\fR will get you one node with |
| either the "rack1" or "rack2" features and two nodes with the "rack3" feature. |
| If requesting more than one feature and using node counts, the request |
| must have square brackets surrounding it. |
| |
| \fBNOTE\fR: Brackets are only reserved for \fBMultiple Counts\fR and |
| \fBMatching OR\fR syntax. |
| AND operators require a count for each feature inside square brackets |
| (i.e. "[quad*2&hemi*1]"). Slurm will only allow a single set of bracketed |
| constraints per job. |
| |
| \fBNOTE\fR: Square brackets are not supported by the helpers NodeFeatures |
| plugin. Matching OR can be requested without square brackets by using the |
| vertical bar character with at least one changeable feature. |
| .IP |
| |
| .TP |
| \fBParentheses\fR |
| Parentheses can be used to group like node features together. For example, |
| \fB\-\-constraint="[(knl&snc4&flat)*4&haswell*1]"\fR might be used to specify |
| that four nodes with the features "knl", "snc4" and "flat" plus one node with |
| the feature "haswell" are required. |
| Parentheses can also be used to group operations. Without parentheses, node |
| features are parsed strictly from left to right. |
| For example, |
| \fB\-\-constraint="foo&bar|baz"\fR requests nodes with foo and bar, or baz. |
| \fB\-\-constraint="foo|bar&baz"\fR requests nodes with foo and baz, or bar and |
| baz (note how baz was AND'd with everything). |
| \fB\-\-constraint="foo&(bar|baz)"\fR requests nodes with foo and at least |
| one of bar or baz. |
| \fBNOTE\fR: OR within parentheses should not be used with a KNL |
| NodeFeatures plugin but is supported by the helpers NodeFeatures plugin. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-container\fR=<\fIpath_to_container\fR> |
| Absolute path to OCI container bundle. |
| .IP |
| |
| .TP |
| \fB\-\-container\-id\fR=<\fIcontainer_id\fR> |
| Unique name for OCI container. |
| .IP |
| |
| .TP |
| \fB\-\-contiguous\fR |
| If set, then the allocated nodes must form a contiguous set. |
| |
| \fBNOTE\fR: This option will only work with the \fBtopology/flat\fR plugin. |
| Other topology plugins modify the node ordering and prevent this option from |
| taking effect. |
| .IP |
| |
| .TP |
| \fB\-S\fR, \fB\-\-core\-spec\fR=<\fInum\fR> |
| Count of Specialized Cores per node reserved by the job for system operations |
| and not used by the application. |
| If AllowSpecResourcesUsage is enabled a job can override the CoreSpecCount of |
| all its allocated nodes with this option. |
| The overridden Specialized Cores will still be reserved for system processes. |
| The job will get an implicit \fB\-\-exclusive\fR allocation for the rest of |
| the Cores on the nodes, resulting in the job's processes being able to use (and |
| being charged for) all the Cores on the nodes except for the overridden |
| Specialized Cores. |
| This option can not be used with the \fB\-\-thread\-spec\fR option. |
| |
| \fBNOTE\fR: Explicitly setting a job's specialized core value implicitly sets |
| the \fB\-\-exclusive\fR option. |
| .IP |
| |
| .TP |
| \fB\-\-cores\-per\-socket\fR=<\fIcores\fR> |
| Restrict node selection to nodes with at least the specified number of |
| cores per socket. See additional information under the \fB\-B\fR option |
| below when the task/affinity plugin is enabled. |
| .br |
| \fBNOTE\fR: This option may implicitly set the number of tasks (if \fB\-n\fR |
| was not specified) as one task per requested thread. |
| .IP |
| |
| .TP |
| \fB\-\-cpu\-freq\fR=<\fIp1\fR>[\-\fIp2\fR][:\fIp3\fR] |
| |
| Request that job steps initiated by srun commands inside this sbatch script |
| be run at some requested frequency if possible, on the CPUs selected |
| for the step on the compute node(s). |
| |
| \fBp1\fR can be [#### | low | medium | high | highm1] which will set the |
| frequency scaling_speed to the corresponding value, and set the frequency |
| scaling_governor to UserSpace. See below for definition of the values. |
| |
| \fBp1\fR can be [Conservative | OnDemand | Performance | PowerSave] which |
| will set the scaling_governor to the corresponding value. The governor has to be |
| in the list set by the slurm.conf option CpuFreqGovernors. |
| |
| When \fBp2\fR is present, \fBp1\fR will be the minimum scaling frequency and |
| \fBp2\fR will be the maximum scaling frequency. In that case the governor |
| \fBp3\fR or CpuFreqDef cannot be UserSpace since it doesn't support a range. |
| |
| \fBp2\fR can be [#### | medium | high | highm1]. p2 must be greater than p1 and |
| is incompatible with the UserSpace governor. |
| |
| \fBp3\fR can be [Conservative | OnDemand | Performance | PowerSave | SchedUtil | |
| UserSpace] |
| which will set the governor to the corresponding value. |
| |
| If \fBp3\fR is UserSpace, the frequency scaling_speed, scaling_max_freq and |
| scaling_min_freq will be statically set to the value defined by \fBp1\fR. |
| |
| Any requested frequency below the minimum available frequency will be rounded |
| to the minimum available frequency. In the same way, any requested frequency |
| above the maximum available frequency will be rounded to the maximum available |
| frequency. |
| |
| The \fBCpuFreqDef\fR parameter in slurm.conf will be used to set the governor |
| in the absence of \fBp3\fR. If there is no \fBCpuFreqDef\fR, the default is to |
| use the current governor set in each CPU by the system. Specifying a |
| range without \fBCpuFreqDef\fR or a specific governor is therefore not allowed. |
| |
| Acceptable values at present include: |
| .IP |
| .RS |
| .TP 14 |
| \fB####\fR |
| frequency in kilohertz |
| .IP |
| |
| .TP |
| \fBLow\fR |
| the lowest available frequency |
| .IP |
| |
| .TP |
| \fBHigh\fR |
| the highest available frequency |
| .IP |
| |
| .TP |
| \fBHighM1\fR |
| (high minus one) will select the next highest available frequency |
| .IP |
| |
| .TP |
| \fBMedium\fR |
| attempts to set a frequency in the middle of the available range |
| .IP |
| |
| .TP |
| \fBConservative\fR |
| attempts to use the Conservative CPU governor |
| .IP |
| |
| .TP |
| \fBOnDemand\fR |
| attempts to use the OnDemand CPU governor (the default value) |
| .IP |
| |
| .TP |
| \fBPerformance\fR |
| attempts to use the Performance CPU governor |
| .IP |
| |
| .TP |
| \fBPowerSave\fR |
| attempts to use the PowerSave CPU governor |
| .IP |
| |
| .TP |
| \fBUserSpace\fR |
| attempts to use the UserSpace CPU governor |
| .IP |
| .RE |
| |
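| For example (frequency values, in kilohertz, are illustrative): |
| .nf |
| |
| \-\-cpu\-freq=Performance |
| \-\-cpu\-freq=2400000\-3200000:OnDemand |
| .fi |
| |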
| The following informational environment variable is set in the job |
| step when \fB\-\-cpu\-freq\fR option is requested. |
| .nf |
| SLURM_CPU_FREQ_REQ |
| .fi |
| |
| This environment variable can also be used to supply the value for the |
| CPU frequency request if it is set when the 'srun' command is issued. |
| The \fB\-\-cpu\-freq\fR on the command line will override the |
| environment variable value. The form of the environment variable is |
| the same as the command line. |
| See the \fBENVIRONMENT VARIABLES\fR |
| section for a description of the SLURM_CPU_FREQ_REQ variable. |
| |
| \fBNOTE\fR: This parameter is treated as a request, not a requirement. |
| If the job step's node does not support setting the CPU frequency, or |
| the requested value is outside the bounds of the legal frequencies, an |
| error is logged, but the job step is allowed to continue. |
| |
| \fBNOTE\fR: Setting the frequency for just the CPUs of the job step |
| implies that the tasks are confined to those CPUs. If task |
| confinement (i.e. the task/affinity TaskPlugin is enabled, or the task/cgroup |
| TaskPlugin is enabled with "ConstrainCores=yes" set in cgroup.conf) is not |
| configured, this parameter is ignored. |
| |
| \fBNOTE\fR: When the step completes, the frequency and governor of each |
| selected CPU is reset to the previous values. |
| |
| \fBNOTE\fR: Submitting jobs with the \fB\-\-cpu\-freq\fR option with |
| linuxproc as the ProctrackType can cause jobs to run too quickly, before |
| accounting is able to poll for job information. As a result not all of the |
| accounting information will be present. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-cpus\-per\-gpu\fR=<\fIncpus\fR> |
| Request that \fIncpus\fR processors be allocated per allocated GPU. |
| Steps inheriting this value will imply \-\-exact. |
| Not compatible with the \fB\-\-cpus\-per\-task\fR option. |
| .IP |
| |
| .TP |
| \fB\-c\fR, \fB\-\-cpus\-per\-task\fR=<\fIncpus\fR> |
| Advise the Slurm controller that ensuing job steps will require \fIncpus\fR |
| number of processors per task. Without this option, the controller will |
| just try to allocate one processor per task. |
| |
| For instance, |
| consider an application that has 4 tasks, each requiring 3 processors. If our |
| cluster is comprised of quad\-processor nodes and we simply ask for |
| 12 processors, the controller might give us only 3 nodes. However, by using |
| the \-\-cpus\-per\-task=3 option, the controller knows that each task requires |
| 3 processors on the same node, and the controller will grant an allocation |
| of 4 nodes, one for each of the 4 tasks. |
| |
| .TP |
| \fB\-\-deadline\fR=<\fIOPT\fR> |
| Remove the job if no ending is possible before |
| this deadline (start > (deadline \- time[\-min])). |
| Default is no deadline. Note that if neither \fBDefaultTime\fR nor |
| \fBMaxTime\fR are configured on the partition the job is in, the job will |
| need to specify some form of time limit (\-\-time[\-min]) if a deadline |
| is to be used. |
| |
| Valid time formats are: |
| .br |
| HH:MM[:SS] [AM|PM] |
| .br |
| MMDD[YY] or MM/DD[/YY] or MM.DD[.YY] |
| .br |
| MM/DD[/YY]\-HH:MM[:SS] |
| .br |
| YYYY\-MM\-DD[THH:MM[:SS]] |
| .br |
| now[+\fIcount\fR[seconds(default)|minutes|hours|days|weeks]] |
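| |
| For example, the following sketch (script name illustrative) asks that the |
| job, given its one\-hour time limit, be removed unless it can finish within |
| the next two hours: |
| .nf |
| |
| sbatch \-\-deadline=now+2hours \-\-time=01:00:00 job.sh |
| .fi |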
| .IP |
| |
| .TP |
| \fB\-\-delay\-boot\fR=<\fIminutes\fR> |
| Do not reboot nodes in order to satisfy this job's feature specification if |
| the job has been eligible to run for less than this time period. |
| If the job has waited for less than the specified period, it will use only |
| nodes which already have the specified features. |
| The argument is in units of minutes. |
| A default value may be set by a system administrator using the \fBdelay_boot\fR |
| option of the \fBSchedulerParameters\fR configuration parameter in the |
| slurm.conf file, otherwise the default value is zero (no delay). |
| .IP |
| |
| .TP |
| \fB\-d\fR, \fB\-\-dependency\fR=<\fIdependency_list\fR> |
| Defer the start of this job until the specified dependencies have been |
| satisfied. Once a dependency is satisfied, it is removed from the job. |
| <\fIdependency_list\fR> is of the form |
| <\fItype:job_id[:job_id][,type:job_id[:job_id]]\fR> or |
| <\fItype:job_id[:job_id][?type:job_id[:job_id]]\fR>. |
| All dependencies must be satisfied if the "," separator is used. |
| Any dependency may be satisfied if the "?" separator is used. |
| Only one separator may be used. For instance: |
| .nf |
| -d afterok:20:21,afterany:23 |
| .fi |
| .IP |
| means that the job can run only after a 0 return code of jobs 20 and 21 |
| AND the completion of job 23. However: |
| .nf |
| -d afterok:20:21?afterany:23 |
| .fi |
| means that any of the conditions (afterok:20 OR afterok:21 OR afterany:23) |
| will be enough to release the job. |
| Many jobs can share the same dependency and these jobs may even belong to |
| different users. The value may be changed after job submission using the |
| scontrol command. |
| Dependencies on remote jobs are allowed in a federation. |
| Once a job dependency fails due to the termination state of a preceding job, |
| the dependent job will never be run, even if the preceding job is requeued and |
| has a different termination state in a subsequent execution. |
| .IP |
| .PD |
| .RS |
| .TP |
| \fBafter:job_id[[+time][:jobid[+time]...]]\fR |
| After the specified jobs start or are cancelled, and 'time' in minutes from |
| job start or cancellation has elapsed, this |
| job can begin execution. If no 'time' is given then there is no delay after |
| start or cancellation. |
| .IP |
| |
| .TP |
| \fBafterany:job_id[:jobid...]\fR |
| This job can begin execution after the specified jobs have terminated. |
| This is the default dependency type. |
| .IP |
| |
| .TP |
| \fBafterburstbuffer:job_id[:jobid...]\fR |
| This job can begin execution after the specified jobs have terminated and |
| any associated burst buffer stage out operations have completed. |
| .IP |
| |
| .TP |
| \fBaftercorr:job_id[:jobid...]\fR |
| A task of this job array can begin execution after the corresponding task ID |
| in the specified job has completed successfully (ran to completion with an |
| exit code of zero). |
| .IP |
| |
| .TP |
| \fBafternotok:job_id[:jobid...]\fR |
| This job can begin execution after the specified jobs have terminated |
| in some failed state (non\-zero exit code, node failure, timed out, etc). |
| This job must be submitted while the specified job is still active or within |
| \fBMinJobAge\fR seconds after the specified job has ended. |
| .IP |
| |
| .TP |
| \fBafterok:job_id[:jobid...]\fR |
| This job can begin execution after the specified jobs have successfully |
| executed (ran to completion with an exit code of zero). |
| This job must be submitted while the specified job is still active or within |
| \fBMinJobAge\fR seconds after the specified job has ended. |
| .IP |
| |
| .TP |
| \fBsingleton\fR |
| This job can begin execution after any previously launched jobs |
| sharing the same job name and user have terminated. |
| In other words, only one job by that name and owned by that user can be running |
| or suspended at any point in time. |
| In a federation, a singleton dependency must be fulfilled on all clusters |
| unless DependencyParameters=disable_remote_singleton is used in slurm.conf. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-m\fR, \fB\-\-distribution\fR={*|block|cyclic|arbitrary|plane=<\fIsize\fR>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}] |
| |
| Specify alternate distribution methods for remote processes. |
| For job allocation, this sets environment variables that will be used by |
| subsequent srun requests and also affects which cores will be selected for |
| job allocation. |
| |
| This option controls the distribution of tasks to the nodes on which |
| resources have been allocated, and the distribution of those resources |
| to tasks for binding (task affinity). The first distribution |
| method (before the first ":") controls the distribution of tasks to nodes. |
| The second distribution method (after the first ":") |
| controls the distribution of allocated CPUs across sockets for binding |
| to tasks. The third distribution method (after the second ":") controls |
| the distribution of allocated CPUs across cores for binding to tasks. |
| The second and third distributions apply only if task affinity is enabled. |
| The third distribution is supported only if the task/cgroup plugin is |
| configured. The default value for each distribution type is specified by *. |
| |
| Note that with select/cons_tres, the number of CPUs |
| allocated to each socket and node may be different. Refer to |
| https://slurm.schedmd.com/mc_support.html |
| for more information on resource allocation, distribution of tasks to |
| nodes, and binding of tasks to CPUs. |
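| |
| For example, the following sketch distributes tasks block\-wise across nodes |
| and binds allocated CPUs to tasks cyclically across sockets: |
| .nf |
| |
| \-\-distribution=block:cyclic |
| .fi |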
| .RS |
| First distribution method (distribution of tasks across nodes): |
| |
| .TP |
| .B * |
| Use the default method for distributing tasks to nodes (block). |
| .IP |
| |
| .TP |
| .B block |
| The block distribution method will distribute tasks to a node such |
| that consecutive tasks share a node. For example, consider an |
| allocation of three nodes each with two cpus. A four\-task block |
| distribution request will distribute those tasks to the nodes with |
| tasks one and two on the first node, task three on the second node, |
| and task four on the third node. Block distribution is the default |
| behavior if the number of tasks exceeds the number of allocated nodes. |
| .IP |
| |
| .TP |
| .B cyclic |
| The cyclic distribution method will distribute tasks to a node such |
| that consecutive tasks are distributed over consecutive nodes (in a |
| round\-robin fashion). For example, consider an allocation of three |
| nodes each with two cpus. A four\-task cyclic distribution request |
| will distribute those tasks to the nodes with tasks one and four on |
| the first node, task two on the second node, and task three on the |
| third node. |
| Note that when SelectType is select/cons_tres, the same number of CPUs |
| may not be allocated on each node. Task distribution will be |
| round\-robin among all the nodes with CPUs yet to be assigned to tasks. |
| Cyclic distribution is the default behavior if the number |
| of tasks is no larger than the number of allocated nodes. |
| .IP |
| |
| .TP |
| .B plane |
| The tasks are distributed in blocks of size <\fIsize\fR>. The size must be given |
| or SLURM_DIST_PLANESIZE must be set. The number of tasks |
| distributed to each node is the same as for cyclic distribution, but the |
| taskids assigned to each node depend on the plane size. Additional distribution |
| specifications cannot be combined with this option. |
| For more details (including examples and diagrams), please see |
| https://slurm.schedmd.com/mc_support.html and |
| https://slurm.schedmd.com/dist_plane.html |
| .IP |
| |
| .TP |
| .B arbitrary |
| The arbitrary method of distribution will allocate processes in\-order |
| as listed in the file designated by the environment variable |
| SLURM_HOSTFILE. If this variable is set it will override any |
| other method specified. If not set the method will default to block. |
| The hostfile must contain at minimum the number of hosts |
| requested, one per line or comma separated. If specifying a |
| task count (\fB\-n\fR, \fB\-\-ntasks\fR=<\fInumber\fR>), your tasks |
| will be laid out on the nodes in the order of the file. |
| .br |
| \fBNOTE\fR: The arbitrary distribution option on a job allocation only |
| controls the nodes to be allocated to the job and not the allocation of |
| CPUs on those nodes. This option is meant primarily to control a job step's |
| task layout in an existing job allocation for the srun command. |
| .br |
| \fBNOTE\fR: If the number of tasks is given and a list of requested nodes is |
| also given, the number of nodes used from that list will be reduced to match |
| that of the number of tasks if the number of nodes in the list is greater than |
| the number of tasks. |
| .IP |
| |
| .LP |
| Second distribution method (distribution of CPUs across sockets for binding): |
| |
| .TP |
| .B * |
| Use the default method for distributing CPUs across sockets (cyclic). |
| .IP |
| |
| .TP |
| .B block |
| The block distribution method will distribute allocated CPUs |
| consecutively from the same socket for binding to tasks, before using |
| the next consecutive socket. |
| .IP |
| |
| .TP |
| .B cyclic |
| The cyclic distribution method will distribute allocated CPUs for |
| binding to a given task consecutively from the same socket, and |
| from the next consecutive socket for the next task, in a |
| round\-robin fashion across sockets. |
| Tasks requiring more than one CPU will have all of those CPUs allocated on a |
| single socket if possible. |
| .br |
| \fBNOTE\fR: In nodes with hyper-threading enabled, a task not requesting full |
| cores may be distributed across sockets. This can be avoided by specifying |
| \fB\-\-ntasks\-per\-core=1\fR, which forces tasks to allocate full cores. |
| .IP |
| |
| .TP |
| .B fcyclic |
| The fcyclic distribution method will distribute allocated CPUs |
| for binding to tasks from consecutive sockets in a |
| round\-robin fashion across the sockets. |
| Tasks requiring more than one CPU will have each of those CPUs allocated in a |
| cyclic fashion across sockets. |
| .IP |
| |
| .LP |
| Third distribution method (distribution of CPUs across cores for binding): |
| |
| .TP |
| .B * |
| Use the default method for distributing CPUs across cores |
| (inherited from second distribution method). |
| .IP |
| |
| .TP |
| .B block |
| The block distribution method will distribute allocated CPUs |
| consecutively from the same core for binding to tasks, before using |
| the next consecutive core. |
| .IP |
| |
| .TP |
| .B cyclic |
| The cyclic distribution method will distribute allocated CPUs for |
| binding to a given task consecutively from the same core, and |
| from the next consecutive core for the next task, in a |
| round\-robin fashion across cores. |
| .IP |
| |
| .TP |
| .B fcyclic |
| The fcyclic distribution method will distribute allocated CPUs |
| for binding to tasks from consecutive cores in a |
| round\-robin fashion across the cores. |
| .IP |
| |
| .LP |
| Optional control for task distribution over nodes: |
| |
| .TP |
| .B Pack |
| Rather than distributing a job step's tasks evenly across its allocated |
| nodes, pack them as tightly as possible on the nodes. |
| This only applies when the "block" task distribution method is used. |
| .IP |
| |
| .TP |
| .B NoPack |
| Rather than packing a job step's tasks as tightly as possible on the nodes, |
| distribute them evenly. |
| This user option will supersede the SelectTypeParameters CR_Pack_Nodes |
| configuration parameter. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-e\fR, \fB\-\-error\fR=<\fIfilename_pattern\fR> |
| Instruct Slurm to connect the batch script's standard error directly to the |
| file name specified in the "\fIfilename pattern\fR". |
| By default both standard output and standard error are directed to the same file. |
| For job arrays, the default file name is "slurm\-%A_%a.out", where "%A" is |
| replaced by the job ID and "%a" by the array index. |
| For other jobs, the default file name is "slurm\-%j.out", where the "%j" is |
| replaced by the job ID. |
| See the \fBfilename pattern\fR section below for filename specification options. |
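| |
| For example, a sketch (directory name illustrative) separating the two |
| streams and naming files by job name (%x) and job ID (%j): |
| .nf |
| |
| #SBATCH \-\-output=logs/%x_%j.out |
| #SBATCH \-\-error=logs/%x_%j.err |
| .fi |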
| .IP |
| |
| .TP |
| \fB\-x\fR, \fB\-\-exclude\fR=<\fInode_name_list\fR> |
| Explicitly exclude certain nodes from the resources granted to the job. |
| .IP |
| |
| .TP |
| \fB\-\-exclusive\fR[={user|mcs|topo}] |
| The job allocation can not share nodes (or a topology segment with the "=topo" |
| option) with other running jobs (or just other users with the "=user" option or |
| with the "=mcs" option). |
| If user/mcs/topo are not specified (i.e. the job allocation can not share nodes with |
| other running jobs), the job is allocated all CPUs and GRES on all nodes in the |
| allocation, but is only allocated as much memory as it requested. This is by |
| design to support gang scheduling, because suspended jobs still reside in |
| memory. To request all the memory on a node, use \fB\-\-mem=0\fR. |
| The default shared/exclusive behavior depends on system configuration and the |
| partition's \fBOverSubscribe\fR option takes precedence over the job's option. |
| \fBNOTE\fR: Since shared GRES (MPS) cannot be allocated at the same time as a |
| sharing GRES (GPU) this option only allocates all sharing GRES and no underlying |
| shared GRES. |
| |
| \fBNOTE\fR: This option is mutually exclusive with \fB\-\-oversubscribe\fR. |
| .IP |
| |
| .TP |
| \fB\-\-export\fR={[ALL,]<\fIenvironment_variables\fR>|ALL|NIL|NONE} |
| Identify which environment variables from the submission environment are |
| propagated to the launched application. Note that SLURM_* variables are |
| always propagated. |
| .IP |
| .RS |
| .TP 10 |
| \fB\-\-export\fR=ALL |
| Default mode if \fB\-\-export\fR is not specified. All of the user's environment |
| will be loaded (either from the caller's environment or from a clean environment |
| if \fI\-\-get\-user\-env\fR is specified). |
| .IP |
| |
| .TP |
| \fB\-\-export\fR=NIL |
| Only SLURM_* and SPANK option variables from the user environment will be |
| defined. The user must use an absolute path to the binary to be executed, |
| which will define the environment. |
| The user can not specify explicit environment variables with "NIL". |
| |
| Unlike NONE, NIL will not automatically create a user's environment using the |
| \fI\-\-get\-user\-env\fR mechanism. |
| .IP |
| |
| .TP |
| \fB\-\-export\fR=NONE |
| Only SLURM_* and SPANK option variables from the user environment will be |
| defined. The user must use an absolute path to the binary to be executed, |
| which will define the environment. |
| The user can not specify explicit environment variables with "NONE". |
| However, Slurm will then implicitly attempt to load the user's environment on |
| the node where the script is being executed, as if \fI\-\-get\-user\-env\fR was |
| specified. |
| |
| This option is particularly important for jobs that are submitted on one cluster |
| and execute on a different cluster (e.g. with different paths). |
| To avoid steps inheriting environment export settings (e.g. "NONE") from the |
| sbatch command, the environment variable SLURM_EXPORT_ENV should be set to |
| "ALL" in the job script. |
| .IP |
| |
| .TP |
| \fB\-\-export\fR=[\fIALL\fR,]<\fIenvironment_variables\fR> |
| Exports all SLURM_* and SPANK option environment variables along with explicitly |
| defined variables. Multiple environment variable names should be comma |
| separated. |
| Environment variable names may be specified to propagate the current |
| value (e.g. "\-\-export=EDITOR") or specific values may be exported |
| (e.g. "\-\-export=EDITOR=/bin/emacs"). If "ALL" is specified, then all user |
| environment variables will be loaded and will take precedence over any |
| explicitly given environment variables. |
| .IP |
| .RS 5 |
| .TP 5 |
| Example: \fB\-\-export\fR=EDITOR,ARG1=test |
| In this example, the propagated environment will only contain the |
| variable \fIEDITOR\fR from the user's environment, \fISLURM_*\fR environment |
| variables, and \fIARG1\fR=test. |
| .IP |
| |
| .TP |
| Example: \fB\-\-export\fR=ALL,EDITOR=/bin/emacs |
| There are two possible outcomes for this example. If the caller has the |
| \fIEDITOR\fR environment variable defined, then the job's environment will |
| inherit the variable from the caller's environment. If the caller doesn't |
| have an environment variable defined for \fIEDITOR\fR, then the job's |
| environment will use the value given by \fB\-\-export\fR. |
| .RE |
| |
| \fBNOTE\fR: NONE and [\fIALL\fR,]<\fIenvironment_variables\fR> implicitly |
| work as if \fB\-\-get\-user\-env\fR was defined. Please see the implications |
| of this in its respective section. |
| |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-export\-file\fR={<\fIfilename\fR>|<\fIfd\fR>} |
| If a number between 3 and OPEN_MAX is specified as the argument to |
| this option, a readable file descriptor will be assumed (STDIN and |
| STDOUT are not supported as valid arguments). Otherwise a filename is |
| assumed. Export environment variables defined in <\fIfilename\fR> or |
| read from <\fIfd\fR> to the job's execution environment. The |
| content is one or more environment variable definitions of the form |
| NAME=value, each separated by a null character. This allows the use |
| of special characters in environment definitions. |
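| |
| For example, a sketch (file and variable names are illustrative) building a |
| null\-separated definitions file and passing it to sbatch: |
| .nf |
| |
| $ printf 'VAR1=has spaces\e0VAR2=value2\e0' > env.list |
| $ sbatch \-\-export\-file=env.list job.sh |
| .fi |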
| .IP |
| |
| .TP |
| \fB\-\-extra\fR=<\fIstring\fR> |
| An arbitrary string enclosed in single or double quotes if using spaces or some |
| special characters. |
| |
| If \fBSchedulerParameters=extra_constraints\fR is enabled, this string is used |
| for node filtering based on the \fIExtra\fR field in each node. |
| .IP |
| |
| .TP |
| \fB\-B\fR, \fB\-\-extra\-node\-info\fR=<\fIsockets\fR>[:\fIcores\fR[:\fIthreads\fR]] |
| Restrict node selection to nodes with at least the specified number of |
| sockets, cores per socket and/or threads per core. |
| .br |
| \fBNOTE\fR: These options do not specify the resource allocation size. |
| Each value specified is considered a minimum. |
| An asterisk (*) can be used as a placeholder indicating that all available |
| resources of that type are to be utilized. Values can also be specified as |
| min\-max. The individual levels can also be specified in separate options if |
| desired: |
| .nf |
| \fB\-\-sockets\-per\-node\fR=<\fIsockets\fR> |
| \fB\-\-cores\-per\-socket\fR=<\fIcores\fR> |
| \fB\-\-threads\-per\-core\fR=<\fIthreads\fR> |
| .fi |
| If the task/affinity plugin is enabled, then specifying an allocation in this |
| manner also results in subsequently launched tasks being bound to threads |
| if the \fB\-B\fR option specifies a thread count, otherwise to |
| \fIcores\fR if a core count is specified, otherwise to \fIsockets\fR. |
| If SelectType is configured to select/cons_tres, it must have a parameter of |
| CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option |
| to be honored. |
| If not specified, the scontrol show job will display 'ReqS:C:T=*:*:*'. This |
| option applies to job allocations. |
| .br |
| \fBNOTE\fR: This option is mutually exclusive with \fB\-\-hint\fR, |
| \fB\-\-threads\-per\-core\fR and \fB\-\-ntasks\-per\-core\fR. |
| .br |
| \fBNOTE\fR: This option may implicitly set the number of tasks (if \fB\-n\fR |
| was not specified) as one task per requested thread. |
| .IP |
| |
| .TP |
| \fB\-\-get\-user\-env\fR |
| This option will tell sbatch to retrieve the |
| login environment variables for the user specified in the \fB\-\-uid\fR option. |
| The environment variables are retrieved by running something of this sort |
| "su \- <username> \-c /usr/bin/env" and parsing the output. |
| Be aware that any environment variables already set in sbatch's environment |
| will take precedence over any environment variables in the user's |
| login environment. Clear any environment variables before calling sbatch |
| that you do not want propagated to the spawned program. If the user environment |
| retrieval fails or times out, the job will be aborted, requeued and held. |
| |
| \fBNOTE\fR: The explicit or implicit use of \fB\-\-get\-user\-env\fR relies on |
| the capability to create PID and mount namespaces. It is strongly |
| advisable to ensure that PID and mount namespace creation is available and |
| not limited (check that \fB/proc/sys/user/max_[pid|mnt]_namespaces\fR |
| is not 0). Although they are not strictly mandatory for \fB\-\-get\-user\-env\fR |
| to work, they ensure that there are no orphan processes left after the |
| environment is retrieved. |
| .IP |
| |
| .TP |
| \fB\-\-gid\fR=<\fIgroup\fR> |
| If \fBsbatch\fR is run as root, and the \fB\-\-gid\fR option is used, |
| submit the job with \fIgroup\fR's group access permissions. \fIgroup\fR |
| may be the group name or the numerical group ID. |
| .IP |
| |
| .TP |
| \fB\-\-gpu\-bind\fR=[verbose,]<\fItype\fR> |
| Equivalent to \-\-tres\-bind=gres/gpu:[verbose,]<\fItype\fR>. |
| See \fB\-\-tres\-bind\fR for all options and documentation. |
| .IP |
| |
| .TP |
| \fB\-\-gpu\-freq\fR=[<\fItype\fR>=]<\fIvalue\fR>[,<\fItype\fR>=<\fIvalue\fR>][,verbose] |
| Request that GPUs allocated to the job are configured with specific frequency |
| values. |
| This option can be used to independently configure the GPU and its memory |
| frequencies. |
| After the job is completed, the frequencies of all affected GPUs will be reset |
| to the highest possible values. |
| In some cases, system power caps may override the requested values. |
| The field \fItype\fR can be "memory". |
| If \fItype\fR is not specified, the GPU frequency is implied. |
| The \fIvalue\fR field can either be "low", "medium", "high", "highm1" or |
| a numeric value in megahertz (MHz). |
| If the specified numeric value is not possible, a value as close as |
| possible will be used. See below for definition of the values. |
| The \fIverbose\fR option causes current GPU frequency information to be logged. |
| Examples of use include "\-\-gpu\-freq=medium,memory=high" and |
| "\-\-gpu\-freq=450". |
| |
| Supported \fIvalue\fR definitions: |
| .IP |
| .RS |
| .TP 10 |
| \fBlow\fR |
| the lowest available frequency. |
| .IP |
| |
| .TP |
| \fBmedium\fR |
| attempts to set a frequency in the middle of the available range. |
| .IP |
| |
| .TP |
| \fBhigh\fR |
| the highest available frequency. |
| .IP |
| |
| .TP |
| \fBhighm1\fR |
| (high minus one) will select the next highest available frequency. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-G\fR, \fB\-\-gpus\fR=[\fItype\fR:]<\fInumber\fR> |
| Specify the total number of GPUs required for the job. |
| An optional GPU type specification can be supplied. |
| For example "\-\-gpus=volta:3". |
| See also the \fB\-\-gpus\-per\-node\fR, \fB\-\-gpus\-per\-socket\fR and |
| \fB\-\-gpus\-per\-task\fR options. |
| .br |
| \fBNOTE\fR: The allocation has to contain at least one GPU per node, or one of |
| each GPU type per node if types are used. Use heterogeneous jobs if different |
| nodes need different GPU types. |
| .IP |
| |
| .TP |
| \fB\-\-gpus\-per\-node\fR=[\fItype\fR:]<\fInumber\fR> |
| Specify the number of GPUs required for the job on each node included in |
| the job's resource allocation. |
| An optional GPU type specification can be supplied. |
| For example "\-\-gpus\-per\-node=volta:3". |
| Multiple options can be requested in a comma separated list, for example: |
| "\-\-gpus\-per\-node=volta:3,kepler:1". |
| See also the \fB\-\-gpus\fR, \fB\-\-gpus\-per\-socket\fR and |
| \fB\-\-gpus\-per\-task\fR options. |
| .IP |
| |
| .TP |
| \fB\-\-gpus\-per\-socket\fR=[\fItype\fR:]<\fInumber\fR> |
| Specify the number of GPUs required for the job on each socket included in |
| the job's resource allocation. |
| An optional GPU type specification can be supplied. |
| For example "\-\-gpus\-per\-socket=volta:3". |
| Multiple options can be requested in a comma separated list, for example: |
| "\-\-gpus\-per\-socket=volta:3,kepler:1". |
| Requires the job to specify a sockets per node count (\-\-sockets\-per\-node). |
| See also the \fB\-\-gpus\fR, \fB\-\-gpus\-per\-node\fR and |
| \fB\-\-gpus\-per\-task\fR options. |
| .IP |
| |
| .TP |
| \fB\-\-gpus\-per\-task\fR=[\fItype\fR:]<\fInumber\fR> |
| Specify the number of GPUs required for the job on each task to be spawned |
| in the job's resource allocation. |
| An optional GPU type specification can be supplied. |
| For example "\-\-gpus\-per\-task=volta:1". Multiple options can be |
| requested in a comma separated list, for example: |
| "\-\-gpus\-per\-task=volta:3,kepler:1". See also the \fB\-\-gpus\fR, |
| \fB\-\-gpus\-per\-socket\fR and \fB\-\-gpus\-per\-node\fR options. |
| This option requires an explicit task count, e.g. \-n, \-\-ntasks or "\-\-gpus=X |
| \-\-gpus\-per\-task=Y" rather than an ambiguous range of nodes with \-N, \-\-nodes. |
| This option will implicitly set \-\-tres\-bind=gres/gpu:per_task:<gpus_per_task>, |
| or if multiple gpu types are specified |
| \-\-tres\-bind=gres/gpu:per_task:<gpus_per_task_type_sum>. However, that can be |
| overridden with an explicit \-\-tres\-bind=gres/gpu specification. |
| .br |
| .IP |
| |
| .TP |
| \fB\-\-gres\fR=<\fIlist\fR> |
| Specifies a comma\-delimited list of generic consumable resources requested per |
| node. |
| The format for each entry in the list is "name[[:type]:count]". |
| The \fIname\fR is the type of consumable resource (e.g. gpu). |
| The \fItype\fR is an optional classification for the resource (e.g. a100). |
| The \fIcount\fR is the number of those resources with a default value of 1. |
| The count can have a suffix of |
| "k" or "K" (multiple of 1024), |
| "m" or "M" (multiple of 1024 x 1024), |
| "g" or "G" (multiple of 1024 x 1024 x 1024), |
| "t" or "T" (multiple of 1024 x 1024 x 1024 x 1024), |
| "p" or "P" (multiple of 1024 x 1024 x 1024 x 1024 x 1024). |
| The specified resources will be allocated to the job on each node. |
| The available generic consumable resources are configurable by the system |
| administrator. |
| A list of available generic consumable resources will be printed and the |
| command will exit if the option argument is "help". |
| Examples of use include "\-\-gres=gpu:2", "\-\-gres=gpu:kepler:2", and |
| "\-\-gres=help". |
| .IP |
| |
| .TP |
| \fB\-\-gres\-flags\fR=<\fItype\fR> |
| Specify generic resource task binding options. |
| .IP |
| .RS |
| |
| .TP |
| .B multiple\-tasks\-per\-sharing |
| Negate \fBone\-task\-per\-sharing\fR. This is useful if it is set by default in |
| \fBSelectTypeParameters\fR. |
| .IP |
| |
| .TP |
| .B disable\-binding |
| Negate \fBenforce\-binding\fR. This is useful if it is set by default in |
| \fBSelectTypeParameters\fR. |
| .IP |
| |
| .TP |
| .B enforce\-binding |
| The only CPUs available to the job will be those bound to the selected |
| GRES (i.e. the CPUs identified in the gres.conf file will be strictly |
| enforced). This option may result in delayed initiation of a job. |
| For example, a job requiring two GPUs and one CPU will be delayed until both |
| GPUs on a single socket are available rather than using GPUs bound to separate |
| sockets; however, the application performance may be improved due to better |
| communication speed. |
| Requires the node to be configured with more than one socket and resource |
| filtering will be performed on a per\-socket basis. |
| .br |
| \fBNOTE\fR: This option can be set by default in \fBSelectTypeParameters\fR. |
| .br |
| \fBNOTE\fR: This option is specific to \fBSelectType=cons_tres\fR. |
| .br |
| \fBNOTE\fR: This option can give undefined results if attempting to enforce |
| binding on multiple gres on multiple sockets. |
| .IP |
| |
| .TP |
| .B one\-task\-per\-sharing |
| Do not allow different tasks to be allocated shared gres from the same |
| sharing gres. |
| .br |
| \fBNOTE\fR: This flag is only enforced if shared gres are requested with |
| \-\-tres\-per\-task. |
| .br |
| \fBNOTE\fR: This option can be set by default with |
| \fBSelectTypeParameters=ONE_TASK_PER_SHARING_GRES\fR. |
| .br |
| \fBNOTE\fR: This option is specific to |
| \fBSelectTypeParameters=MULTIPLE_SHARING_GRES_PJ\fR. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-h\fR, \fB\-\-help\fR |
| Display help information and exit. |
| .IP |
| |
| .TP |
| \fB\-\-hint\fR=<\fItype\fR> |
| Bind tasks according to application hints. |
| .br |
| \fBNOTE\fR: This option implies specific values for certain related options, |
| which prevents its use with any user\-specified values for |
| \fB\-\-ntasks\-per\-core\fR, \fB\-\-cores\-per\-socket\fR, |
| \fB\-\-sockets\-per\-node\fR, \fB\-\-threads\-per\-core\fR or \fB\-B\fR. |
| These conflicting options will override \fB\-\-hint\fR when specified as |
| command line arguments. If a conflicting option is specified as an environment |
| variable, \-\-hint as a command line argument will take precedence. |
| .IP |
| .RS |
| .TP |
| .B compute_bound |
| Select settings for compute bound applications: |
| use all cores in each socket, one thread per core. |
| .IP |
| |
| .TP |
| .B memory_bound |
| Select settings for memory bound applications: |
| use only one core in each socket, one thread per core. |
| .IP |
| |
| .TP |
| .B multithread |
| Use extra threads with in\-core multi\-threading |
| which can benefit communication intensive applications. |
| Only supported with the task/affinity plugin. |
| .IP |
| |
| .TP |
| .B nomultithread |
| Don't use extra threads with in\-core multi\-threading; |
| restricts tasks to one thread per core. |
| Only supported with the task/affinity plugin. |
| .IP |
| |
| .TP |
| .B help |
| show this help message |
| .RE |
| .IP |
| |
| .TP |
| \fB\-H\fR, \fB\-\-hold\fR |
| Specify the job is to be submitted in a held state (priority of zero). |
| A held job can then be released using scontrol to reset its priority |
| (e.g. "\fIscontrol release <job_id>\fR"). |
| .IP |
| |
| .TP |
| \fB\-\-ignore\-pbs\fR |
| Ignore all "#PBS" and "#BSUB" options specified in the batch script. |
| .IP |
| |
| .TP |
| \fB\-i\fR, \fB\-\-input\fR=<\fIfilename_pattern\fR> |
| Instruct Slurm to connect the batch script's standard input |
| directly to the file name specified in the "\fIfilename pattern\fR". |
| |
| By default, "/dev/null" is open on the batch script's standard input and both |
| standard output and standard error are directed to a file of the name |
| "slurm\-%j.out", where the "%j" is replaced with the job allocation number, as |
| described below in the \fBfilename pattern\fR section. |
| .IP |
| |
| .TP |
| \fB\-J\fR, \fB\-\-job\-name\fR=<\fIjobname\fR> |
| Specify a name for the job allocation. The specified name will appear along with |
| the job id number when querying running jobs on the system. The default |
| is the name of the batch script, or just "sbatch" if the script is |
| read on sbatch's standard input. |
| .IP |
| |
| .TP |
| \fB\-\-kill\-on\-invalid\-dep\fR=<yes|no> |
| If a job has an invalid dependency that can never be satisfied, this
| parameter tells Slurm whether or not to terminate the job. The state of a
| terminated job will be JOB_CANCELLED.
| If this option is not specified, the system wide behavior applies:
| by default the job stays pending with reason DependencyNeverSatisfied, or, if
| kill_invalid_depend is specified in slurm.conf, the job is terminated.
| .IP |
| |
| .TP |
| \fB\-L\fR, \fB\-\-licenses\fR=<\fIlicense\fR>[@\fIdb\fR][:\fIcount\fR][,\fIlicense\fR[@\fIdb\fR][:\fIcount\fR]...] |
| Specification of licenses (or other resources available on all |
| nodes of the cluster) which must be allocated to this job. |
| License names can be followed by a colon and count |
| (the default count is one). |
| Multiple licenses can be requested. If they are separated by a comma (',' |
| meaning AND), then all requested licenses are required for the job. For example, |
| "\-\-licenses=foo:4,bar". If they are separated by a pipe ('|' meaning OR), |
| then only one of the license requests is required for the job. For example,
| "\-\-licenses=foo:4|bar". AND and OR cannot both be used. |
| To submit jobs using remote licenses, those served by the slurmdbd, specify |
| the name of the server providing the licenses. |
| For example "\-\-license=nastran@slurmdb:12". |
| |
| \fBNOTE\fR: When submitting heterogeneous jobs, license requests |
| may only be made on the first component job. |
| For example "sbatch \-L ansys:2 : script.sh". |
| |
| \fBNOTE\fR: If licenses are tracked in AccountingStorageTres and OR is used, |
| ReqTRES will display all requested tres separated by commas. AllocTRES will |
| display only the license that was allocated to the job. |
| |
| \fBNOTE\fR: When a job requests OR'd licenses, Slurm will attempt to allocate |
| the licenses in the order in which they are requested. This specified order |
| will take precedence even if the rest of requested licenses could be satisfied |
| on a requested reservation. This also applies to backfill planning when |
| \fBSchedulerParameters=bf_licenses\fR is configured. |
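|
| For example (\fIjob.sh\fR being an arbitrary script name), requesting
| licenses with AND and with OR; note the OR form must be quoted so the shell
| does not interpret the '|':
|
| .nf
| $ sbatch \-\-licenses=foo:4,bar job.sh
| $ sbatch \-\-licenses="foo:4|bar" job.sh
| .fi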
| .IP |
| |
| .TP |
| \fB\-\-mail\-type\fR=<\fItype\fR> |
| Notify user by email when certain event types occur. |
| Valid \fItype\fR values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to |
| BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), INVALID_DEPEND |
| (dependency never satisfied), STAGE_OUT (burst buffer stage out and teardown |
| completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), |
| TIME_LIMIT_80 (reached 80 percent of time limit), TIME_LIMIT_50 (reached 50 |
| percent of time limit) and ARRAY_TASKS (send emails for each array task). |
| Multiple \fItype\fR values may be specified in a comma separated list. |
| NONE will suppress all event notifications, ignoring any other values specified. |
| By default no email notifications are sent. |
| The user to be notified is indicated with \fB\-\-mail\-user\fR. |
| |
| Unless the ARRAY_TASKS option is specified, mail notifications on job BEGIN, |
| END, FAIL and REQUEUE apply to a job array as a whole rather than generating |
| individual email messages for each task in the job array. |
| .IP |
| |
| .TP |
| \fB\-\-mail\-user\fR=<\fIuser\fR> |
| User to receive email notification of state changes as defined by |
| \fB\-\-mail\-type\fR. This may be a full email address or a username. If a |
| username is specified, the value from \fBMailDomain\fR in slurm.conf will be |
| appended to create an email address. |
| The default value is the submitting user. |
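|
| For example, to be notified at an illustrative address when the job starts,
| completes or fails:
|
| .nf
| #SBATCH \-\-mail\-type=BEGIN,END,FAIL
| #SBATCH \-\-mail\-user=user@example.com
| .fi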
| .IP |
| |
| .TP |
| \fB\-\-mcs\-label\fR=<\fImcs\fR> |
| Used only when a compatible \fBMCSPlugin\fR is enabled. This parameter is a |
| group that the user belongs to (\fBmcs/group\fR) or an arbitrary label string |
| (\fBmcs/label\fR). In both cases, no label will be assigned by default. Refer to |
| the MCS documentation for more details: <https://slurm.schedmd.com/mcs.html> |
| .IP |
| |
| .TP |
| \fB\-\-mem\fR=<\fIsize\fR>[\fIunits\fR] |
| Specify the real memory required per node. |
| Default units are megabytes. |
| Different units can be specified using the suffix [K|M|G|T]. |
| Default value is \fBDefMemPerNode\fR and the maximum value is |
| \fBMaxMemPerNode\fR. If configured, both parameters can be |
| seen using the \fBscontrol show config\fR command. |
| This parameter would generally be used if whole nodes |
| are allocated to jobs (\fBSelectType=select/linear\fR). |
| Also see \fB\-\-mem\-per\-cpu\fR and \fB\-\-mem\-per\-gpu\fR. |
| The \fB\-\-mem\fR, \fB\-\-mem\-per\-cpu\fR and \fB\-\-mem\-per\-gpu\fR |
| options are mutually exclusive. If \fB\-\-mem\fR, \fB\-\-mem\-per\-cpu\fR or |
| \fB\-\-mem\-per\-gpu\fR are specified as command line arguments, then they will |
| take precedence over the environment. |
| |
| \fBNOTE\fR: A memory size specification of zero is treated as a special case and |
| grants the job access to all of the memory on each node. |
| |
| \fBNOTE\fR: The memory used by each slurmstepd process is included in the job's |
| total memory usage. It typically consumes between 20MiB and 200MiB, though this |
| can vary depending on system configuration and any loaded plugins. |
| |
| \fBNOTE\fR: Memory requests will not be strictly enforced unless Slurm is |
| configured to use an enforcement mechanism. See \fBConstrainRAMSpace\fR in |
| the \fBcgroup.conf\fR(5) man page and \fBOverMemoryKill\fR in the |
| \fBslurm.conf\fR(5) man page for more details. |
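|
| For example, to request 16 gigabytes of real memory on each of two nodes:
|
| .nf
| #SBATCH \-\-nodes=2
| #SBATCH \-\-mem=16G
| .fi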
| .IP |
| |
| .TP |
| \fB\-\-mem\-bind\fR=[{quiet|verbose},]<\fItype\fR> |
| Bind tasks to memory. Used only when the task/affinity plugin is enabled |
| and the NUMA memory functions are available. |
| \fBNote that the resolution of CPU and memory binding |
| may differ on some architectures.\fR For example, CPU binding may be performed |
| at the level of the cores within a processor while memory binding will |
| be performed at the level of nodes, where the definition of "nodes" |
| may differ from system to system. |
| By default no memory binding is performed; any task using any CPU can use |
| any memory. This option is typically used to ensure that each task is bound to |
| the memory closest to its assigned CPU. \fBThe use of any type other than |
| "none" or "local" is not recommended.\fR |
| |
| \fBNOTE\fR: To have Slurm always report on the selected memory binding for |
| all commands executed in a shell, you can enable verbose mode by |
| setting the SLURM_MEM_BIND environment variable value to "verbose". |
| |
| The following informational environment variables are set when |
| \fB\-\-mem\-bind\fR is in use: |
| .IP |
| .nf |
| SLURM_MEM_BIND_LIST |
| SLURM_MEM_BIND_PREFER |
| SLURM_MEM_BIND_SORT |
| SLURM_MEM_BIND_TYPE |
| SLURM_MEM_BIND_VERBOSE |
| .fi |
| |
| See the \fBENVIRONMENT VARIABLES\fR section for a more detailed description |
| of the individual SLURM_MEM_BIND* variables. |
| |
| Supported options include: |
| .IP |
| .RS |
| .TP |
| .B help |
| show this help message |
| .IP |
| |
| .TP |
| .B local |
| Use memory local to the processor in use |
| .IP |
| |
| .TP |
| .B map_mem:<list> |
| Bind by setting memory masks on tasks (or ranks) as specified where <list> is |
| <numa_id_for_task_0>,<numa_id_for_task_1>,... |
| The mapping is specified for a node and identical mapping is applied to the |
| tasks on every node (i.e. the lowest task ID on each node is mapped to the |
| first ID specified in the list, etc.). |
| NUMA IDs are interpreted as decimal values unless they are preceded |
| with '0x' in which case they are interpreted as hexadecimal values.
| If the number of tasks (or ranks) exceeds the number of elements in this list, |
| elements in the list will be reused as needed starting from the beginning of |
| the list. |
| To simplify support for large task counts, the lists may follow a map with an |
| asterisk and repetition count. |
| For example "map_mem:0x0f*4,0xf0*4". |
| For predictable binding results, all CPUs for each node in the job should be |
| allocated to the job. |
| .IP |
| |
| .TP |
| .B mask_mem:<list> |
| Bind by setting memory masks on tasks (or ranks) as specified where <list> is |
| <numa_mask_for_task_0>,<numa_mask_for_task_1>,... |
| The mapping is specified for a node and identical mapping is applied to the |
| tasks on every node (i.e. the lowest task ID on each node is mapped to the |
| first mask specified in the list, etc.). |
| NUMA masks are \fBalways\fR interpreted as hexadecimal values. |
| Note that masks must be preceded with a '0x' if they don't begin |
| with [0\-9] so they are seen as numerical values. |
| If the number of tasks (or ranks) exceeds the number of elements in this list, |
| elements in the list will be reused as needed starting from the beginning of |
| the list. |
| To simplify support for large task counts, the lists may follow a mask with an |
| asterisk and repetition count. |
| For example "mask_mem:0*4,1*4". |
| For predictable binding results, all CPUs for each node in the job should be |
| allocated to the job. |
| .IP |
| |
| .TP |
| .B no[ne] |
| don't bind tasks to memory (default) |
| .IP |
| |
| .TP |
| .B p[refer] |
| Prefer use of first specified NUMA node, but permit |
| use of other available NUMA nodes. |
| .IP |
| |
| .TP |
| .B q[uiet] |
| quietly bind before task runs (default) |
| .IP |
| |
| .TP |
| .B rank |
| bind by task rank (not recommended) |
| .IP |
| |
| .TP |
| .B sort |
| sort free cache pages (run zonesort on Intel KNL nodes) |
| .IP |
| |
| .TP |
| .B v[erbose] |
| verbosely report binding before task runs |
| .RE |
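|
| For example, to use the recommended \fIlocal\fR binding and report it
| (\fIjob.sh\fR is an arbitrary script name):
|
| .nf
| $ sbatch \-\-mem\-bind=verbose,local job.sh
| .fi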
| .IP |
| |
| .TP |
| \fB\-\-mem\-per\-cpu\fR=<\fIsize\fR>[\fIunits\fR] |
| Minimum memory required per usable allocated CPU. |
| Default units are megabytes. |
| The default value is \fBDefMemPerCPU\fR and the maximum value is |
| \fBMaxMemPerCPU\fR (see exception below). If configured, both parameters can be |
| seen using the \fBscontrol show config\fR command. |
| Note that if the job's \fB\-\-mem\-per\-cpu\fR value exceeds the configured |
| \fBMaxMemPerCPU\fR, then the user's limit will be treated as a memory limit |
| per task; \fB\-\-mem\-per\-cpu\fR will be reduced to a value no larger than |
| \fBMaxMemPerCPU\fR; \fB\-\-cpus\-per\-task\fR will be set and the value of |
| \fB\-\-cpus\-per\-task\fR multiplied by the new \fB\-\-mem\-per\-cpu\fR |
| value will equal the original \fB\-\-mem\-per\-cpu\fR value specified by |
| the user. |
| This parameter would generally be used if individual processors |
| are allocated to jobs (\fBSelectType=select/cons_tres\fR). |
| If resources are allocated by core, socket, or whole nodes, then the number |
| of CPUs allocated to a job may be higher than the task count and the value |
| of \fB\-\-mem\-per\-cpu\fR should be adjusted accordingly. |
| Also see \fB\-\-mem\fR and \fB\-\-mem\-per\-gpu\fR. |
| The \fB\-\-mem\fR, \fB\-\-mem\-per\-cpu\fR and \fB\-\-mem\-per\-gpu\fR |
| options are mutually exclusive. |
| |
| \fBNOTE\fR: If the final amount of memory requested by a job |
| can't be satisfied by any of the nodes configured in the |
| partition, the job will be rejected. |
| This could happen if \fB\-\-mem\-per\-cpu\fR is used with the |
| \fB\-\-exclusive\fR option for a job allocation and \fB\-\-mem\-per\-cpu\fR |
| times the number of CPUs on a node is greater than the total memory of that |
| node. |
| |
| \fBNOTE\fR: This applies to \fBusable\fR allocated CPUs in a job allocation. |
| This is important when more than one thread per core is configured. |
| If a job requests \-\-threads\-per\-core with fewer threads on a core than |
| exist on the core (or \-\-hint=nomultithread which implies |
| \-\-threads\-per\-core=1), the job will be unable to use those extra threads on |
| the core and those threads will not be included in the memory per CPU |
| calculation. But if the job has access to all threads on the core, those threads |
| will be included in the memory per CPU calculation even if the job did not |
| explicitly request those threads. |
| |
| In the following examples, each core has two threads. |
| |
| In this first example, two tasks can run on separate hyperthreads |
| in the same core because \-\-threads\-per\-core is not used. The |
| third task uses both threads of the second core. The allocated |
| memory per cpu includes all threads: |
| |
| .nf |
| .ft B |
| $ salloc \-n3 \-\-mem\-per\-cpu=100 |
| salloc: Granted job allocation 17199 |
| $ sacct \-j $SLURM_JOB_ID \-X \-o jobid%7,reqtres%35,alloctres%35 |
| JobID ReqTRES AllocTRES |
| \-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- |
| 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1 |
| .ft |
| .fi |
| |
| In this second example, because of \-\-threads\-per\-core=1, each |
| task is allocated an entire core but is only able to use one |
| thread per core. Allocated CPUs includes all threads on each |
| core. However, allocated memory per cpu includes only the |
| usable thread in each core. |
| |
| .nf |
| .ft B |
| $ salloc \-n3 \-\-mem\-per\-cpu=100 \-\-threads\-per\-core=1 |
| salloc: Granted job allocation 17200 |
| $ sacct \-j $SLURM_JOB_ID \-X \-o jobid%7,reqtres%35,alloctres%35 |
| JobID ReqTRES AllocTRES |
| \-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- |
| 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1 |
| .ft |
| .fi |
| .IP |
| |
| .TP |
| \fB\-\-mem\-per\-gpu\fR=<\fIsize\fR>[\fIunits\fR] |
| Minimum memory required per allocated GPU. |
| Default units are megabytes. |
| Different units can be specified using the suffix [K|M|G|T]. |
| Default value is \fBDefMemPerGPU\fR and is available on both a global and |
| per partition basis. |
| If configured, the parameters can be seen using the \fBscontrol show config\fR |
| and \fBscontrol show partition\fR commands. |
| Also see \fB\-\-mem\fR. |
| The \fB\-\-mem\fR, \fB\-\-mem\-per\-cpu\fR and \fB\-\-mem\-per\-gpu\fR |
| options are mutually exclusive. |
| .IP |
| |
| .TP |
| \fB\-\-mincpus\fR=<\fIn\fR> |
| Specify a minimum number of logical cpus/processors per node. |
| .IP |
| |
| .TP |
| \fB\-\-network\fR=<\fItype\fR> |
| Specify information pertaining to the switch or network. |
| The interpretation of \fItype\fR is system dependent. |
| It can be used to request the use of Network Performance Counters (NPC).
| Only one value per request is valid.
| All options are case\-insensitive.
| In this configuration supported values include:
| .IP |
| .RS |
| .TP 6 |
| \fBsystem\fR |
| Use the system\-wide network performance counters. Only the requested nodes
| will be marked in use for the job allocation. If the job does not
| fill up the entire system, the rest of the nodes cannot be used by other
| jobs using NPC; if idle, their state will appear as PerfCnts.
| These nodes are still available for other jobs not using NPC.
| .IP |
| |
| .TP |
| \fBblade\fR |
| Use the blade network performance counters. Only the requested nodes
| will be marked in use for the job allocation. If the job does not
| fill up the entire blade(s) allocated to the job, those blade(s) cannot be
| used by other jobs using NPC; if idle, their state will appear as PerfCnts.
| These nodes are still available for other jobs not using NPC.
| .RE |
| .IP |
| |
| In all cases the job allocation request \fBmust specify the |
| \-\-exclusive option\fR. Otherwise the request will be denied. |
| |
| Also, with any of these options, job steps are not allowed to share blades,
| so resources will remain idle inside an allocation if a step
| running on a blade does not take up all the nodes on the blade.
| |
| The \fBnetwork\fR option is also available on systems with HPE Slingshot |
| networks. It can be used to request a job VNI (to be used for communication |
| between job steps in a job). It also can be used to override the default |
| network resources allocated for the job step. Multiple values may be specified |
| in a comma-separated list. |
| .IP |
| .RS |
| .TP 6 |
| \fBtcs\fR=<\fIclass1\fR>[:<\fIclass2\fR>]... |
| Set of traffic classes to configure for applications. |
| Supported traffic classes are DEDICATED_ACCESS, LOW_LATENCY, BULK_DATA, and |
| BEST_EFFORT. The traffic classes may also be specified as TC_DEDICATED_ACCESS, |
| TC_LOW_LATENCY, TC_BULK_DATA, and TC_BEST_EFFORT. |
| .IP |
| |
| .TP |
| \fBno_vni\fR |
| Don't allocate any VNIs for this job (even if multi-node). |
| .IP |
| |
| .TP |
| \fBjob_vni\fR |
| Allocate a job VNI for this job. |
| .IP |
| |
| .TP |
| \fBsingle_node_vni\fR |
| Allocate a job VNI for this job, even if it is a single-node job. |
| .IP |
| |
| .TP |
| \fBadjust_limits\fR |
| If set, slurmd will set an upper bound on network resource reservations |
| by taking the per-NIC maximum resource quantity and subtracting the |
| reserved or used values (whichever is higher) for any system network services; |
| this is the default. |
| .IP |
| |
| .TP |
| \fBno_adjust_limits\fR |
| If set, slurmd will calculate network resource reservations |
| based only upon the per-resource configuration default and number of tasks |
| in the application; it will not set an upper bound on those reservation |
| requests based on resource usage of already-existing system network services. |
| Setting this will mean more application launches could fail based |
| on network resource exhaustion, but if the application |
| absolutely needs a certain amount of resources to function, this option |
| will ensure that. |
| .IP |
| |
| .TP |
| \fBdisable_rdzv_get\fR |
| Disable rendezvous gets in Slingshot NICs, which can improve performance for |
| certain applications. |
| .IP |
| |
| .TP |
| \fBdef_<rsrc>\fR=<\fIval\fR> |
| Per-CPU reserved allocation for this resource. |
| .IP |
| |
| .TP |
| \fBres_<rsrc>\fR=<\fIval\fR> |
| Per-node reserved allocation for this resource. |
| If set, overrides the per-CPU allocation. |
| .IP |
| |
| .TP |
| \fBmax_<rsrc>\fR=<\fIval\fR> |
| Maximum per-node limit for this resource. |
| .IP |
| |
| .TP |
| \fBdepth\fR=<\fIdepth\fR> |
| Multiplier for per-CPU resource allocation. |
| Default is the number of reserved CPUs on the node. |
| .RE |
| .IP |
| |
| The resources that may be requested are: |
| .IP |
| .RS |
| .TP 6 |
| \fBtxqs\fR |
| Transmit command queues. The default is 2 per-CPU, maximum 1024 per-node. |
| .IP |
| |
| .TP |
| \fBtgqs\fR |
| Target command queues. The default is 1 per-CPU, maximum 512 per-node. |
| .IP |
| |
| .TP |
| \fBeqs\fR |
| Event queues. The default is 2 per-CPU, maximum 2047 per-node. |
| .IP |
| |
| .TP |
| \fBcts\fR |
| Counters. The default is 1 per-CPU, maximum 2047 per-node. |
| .IP |
| |
| .TP |
| \fBtles\fR |
| Trigger list entries. The default is 1 per-CPU, maximum 2048 per-node. |
| .IP |
| |
| .TP |
| \fBptes\fR |
| Portals table entries. The default is 6 per-CPU, maximum 2048 per-node.
| .IP |
| |
| .TP |
| \fBles\fR |
| List entries. The default is 16 per-CPU, maximum 16384 per-node. |
| .IP |
| |
| .TP |
| \fBacs\fR |
| Addressing contexts. The default is 2 per-CPU, maximum 1022 per-node. |
| .RE |
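|
| For example, an illustrative Slingshot request combining a job VNI with
| per\-resource overrides (resource names are from the table above; the values
| are arbitrary and system dependent):
|
| .nf
| $ sbatch \-\-network=job_vni,def_eqs=4,max_cts=100 job.sh
| .fi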
| .IP |
|
| .TP |
| \fB\-\-nice\fR[=\fIadjustment\fR] |
| Run the job with an adjusted scheduling priority within Slurm. With no |
| adjustment value the scheduling priority is decreased by 100. A negative nice |
| value increases the priority, otherwise decreases it. The adjustment range is |
| +/\- 2147483645. Only privileged users can specify a negative adjustment. |
| .IP |
| |
| .TP |
| \fB\-k\fR, \fB\-\-no\-kill\fR[=off] |
| Do not automatically terminate a job if one of the nodes it has been |
| allocated fails. The user will assume the responsibilities for fault\-tolerance |
| should a node fail. |
| The job allocation will not be revoked so the user may launch new |
| job steps on the remaining nodes in their allocation. |
| This option does not set the \fBSLURM_NO_KILL\fR environment variable. |
| Therefore, when a node fails, steps running on that node will be killed unless |
| the \fBSLURM_NO_KILL\fR environment variable was explicitly set or srun calls |
| within the job allocation explicitly requested \-\-no\-kill. |
| |
| Specify an optional argument of "off" to disable the effect of the |
| \fBSBATCH_NO_KILL\fR environment variable. |
| |
| By default Slurm terminates the entire job allocation if any node fails in its |
| range of allocated nodes. |
| .IP |
| |
| .TP |
| \fB\-\-no\-requeue\fR |
| Specifies that the batch job should never be requeued under any circumstances |
| (see note below). |
| Setting this option will prevent the job from being requeued by system
| administrators (for example, after a scheduled downtime), after
| a node failure, or upon preemption by a higher priority job.
| When a job is requeued, the batch script is initiated from its beginning. |
| Also see the \fB\-\-requeue\fR option. |
| The \fIJobRequeue\fR configuration parameter controls the default |
| behavior on the cluster. |
| |
| \fBNOTE\fR: \fBForceRequeueOnFail\fR if set as an option to the PrologFlags |
| parameter in slurm.conf can override this setting. |
| .IP |
| |
| .TP |
| \fB\-F\fR, \fB\-\-nodefile\fR=<\fInode_file\fR> |
| Much like \fB\-\-nodelist\fR, but the list is contained in a file of name
| \fInode_file\fR. The node names of the list may also span multiple lines
| in the file. Duplicate node names in the file will be ignored. |
| The order of the node names in the list is not important; the node names |
| will be sorted by Slurm. |
| .IP |
| |
| .TP |
| \fB\-w\fR, \fB\-\-nodelist\fR=<\fInode_name_list\fR> |
| Request a specific list of hosts. |
| The job will contain \fIall\fR of these hosts and possibly additional hosts |
| as needed to satisfy resource requirements. |
| The list may be specified as a comma\-separated list of hosts, a range of hosts |
| (host[1\-5,7,...] for example), or a filename. |
| The host list will be assumed to be a filename if it contains a "/" character. |
| If you specify a minimum node or processor count larger than can be satisfied |
| by the supplied host list, additional resources will be allocated on other |
| nodes as needed. |
| Duplicate node names in the list will be ignored. |
| The order of the node names in the list is not important; the node names |
| will be sorted by Slurm. |
| .IP |
| |
| .TP |
| \fB\-N\fR, \fB\-\-nodes\fR=<\fIminnodes\fR>[\-\fImaxnodes\fR]|<\fIsize_string\fR> |
| Request that a minimum of \fIminnodes\fR nodes be allocated to this job. |
| A maximum node count may also be specified with \fImaxnodes\fR. |
| If only one number is specified, this is used as both the minimum and
| maximum node count. The node count can also be specified as a
| \fIsize_string\fR, which identifies the node count values that may be used.
| Multiple values may be specified using a comma separated list, or
| as a range with a step function written as "<min>\-<max>:<step>".
| For example, "\-\-nodes=1\-15:4" is equivalent to "\-\-nodes=1,5,9,13".
| The partition's node limits supersede those of the job. |
| If a job's node limits are outside of the range permitted for its |
| associated partition, the job will be left in a PENDING state. |
| This permits possible execution at a later time, when the partition |
| limit is changed. |
| If a job node limit exceeds the number of nodes configured in the |
| partition, the job will be rejected. |
| Note that the environment |
| variable \fBSLURM_JOB_NUM_NODES\fR will be set to the count of nodes actually |
| allocated to the job. See the \fBENVIRONMENT VARIABLES \fR section |
| for more information. If \fB\-N\fR is not specified, the default |
| behavior is to allocate enough nodes to satisfy the requested resources as |
| expressed by per\-job specification options, e.g. \fB\-n\fR, \fB\-c\fR and |
| \fB--gpus\fR. |
| The job will be allocated as many nodes as possible within the range specified |
| and without delaying the initiation of the job. |
| The node count specification may include a numeric value followed by a suffix |
| of "k" (multiplies numeric value by 1,024) or "m" (multiplies numeric value by |
| 1,048,576). |
| |
| \fBNOTE\fR: This option cannot be used in combination with arbitrary
| distribution.
| .IP |
| |
| .TP |
| \fB\-n\fR, \fB\-\-ntasks\fR=<\fInumber\fR> |
| sbatch does not launch tasks, it requests an allocation of resources and |
| submits a batch script. This option advises the Slurm controller that job |
| steps run within the allocation will launch a maximum of \fInumber\fR |
| tasks and to provide for sufficient resources. |
| The default is one task per node, but note |
| that the \fB\-\-cpus\-per\-task\fR option will change this default. |
| .IP |
| |
| .TP |
| \fB\-\-ntasks\-per\-core\fR=<\fIntasks\fR> |
| Request the maximum \fIntasks\fR be invoked on each core. |
| Meant to be used with the \fB\-\-ntasks\fR option. |
| Related to \fB\-\-ntasks\-per\-node\fR except at the core level |
| instead of the node level. This option will be inherited by srun. |
| Slurm may allocate more cpus than what was requested in order to respect this |
| option. |
| .br |
| \fBNOTE\fR: This option is not supported when using |
| \fISelectType=select/linear\fR. This value cannot be greater than
| \fB\-\-threads\-per\-core\fR. |
| .IP |
| |
| .TP |
| \fB\-\-ntasks\-per\-gpu\fR=<\fIntasks\fR> |
| Request that there are \fIntasks\fR tasks invoked for every GPU. |
| This option can work in two ways: 1) either specify \fB\-\-ntasks\fR in |
| addition, in which case a type\-less GPU specification will be automatically |
| determined to satisfy \fB\-\-ntasks\-per\-gpu\fR, or 2) specify the GPUs wanted |
| (e.g. via \fB\-\-gpus\fR or \fB\-\-gres\fR) without specifying \fB\-\-ntasks\fR, |
| and the total task count will be automatically determined. |
| The number of CPUs needed will be automatically increased if necessary to allow |
| for any calculated task count. |
| This option will implicitly set \fB\-\-tres\-bind=gres/gpu:single:<ntasks>\fR, |
| but that can be overridden with an explicit \fB\-\-tres\-bind=gres/gpu\fR |
| specification. |
| This option is not compatible with a node range |
| (i.e. \-N<\fIminnodes\fR\-\fImaxnodes\fR>). |
| This option is not compatible with \fB\-\-gpus\-per\-task\fR, |
| \fB\-\-gpus\-per\-socket\fR, or \fB\-\-ntasks\-per\-node\fR. |
| This option is not supported unless \fISelectType=cons_tres\fR is |
| configured (either directly or indirectly on Cray systems). |
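|
| For example, both of the following illustrative requests result in eight
| tasks sharing four GPUs, two tasks per GPU:
|
| .nf
| $ sbatch \-\-ntasks=8 \-\-ntasks\-per\-gpu=2 job.sh
| $ sbatch \-\-gpus=4 \-\-ntasks\-per\-gpu=2 job.sh
| .fi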
| .IP |
| |
| .TP |
| \fB\-\-ntasks\-per\-node\fR=<\fIntasks\fR> |
| Request that \fIntasks\fR be invoked on each node. |
| If used with the \fB\-\-ntasks\fR option, the \fB\-\-ntasks\fR option will take |
| precedence and the \fB\-\-ntasks\-per\-node\fR will be treated as a |
| \fImaximum\fR count of tasks per node. |
| Meant to be used with the \fB\-\-nodes\fR option. |
| This is related to \fB\-\-cpus\-per\-task\fR=\fIncpus\fR, |
| but does not require knowledge of the actual number of cpus on |
| each node. In some cases, it is more convenient to be able to |
| request that no more than a specific number of tasks be invoked |
| on each node. Examples of this include submitting |
| a hybrid MPI/OpenMP app where only one MPI "task/rank" should be |
| assigned to each node while allowing the OpenMP portion to utilize |
| all of the parallelism present in the node, or submitting a single |
| setup/cleanup/monitoring job to each node of a pre\-existing |
| allocation as one step in a larger job script. |
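|
| For example, a hybrid MPI/OpenMP batch script placing one rank per node
| (\fI./hybrid_app\fR is a placeholder):
|
| .nf
| #!/bin/bash
| #SBATCH \-\-nodes=4
| #SBATCH \-\-ntasks\-per\-node=1
| #SBATCH \-\-cpus\-per\-task=16
| export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
| srun ./hybrid_app
| .fi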
| .IP |
| |
| .TP |
| \fB\-\-ntasks\-per\-socket\fR=<\fIntasks\fR> |
| Request the maximum \fIntasks\fR be invoked on each socket. |
| Meant to be used with the \fB\-\-ntasks\fR option. |
| Related to \fB\-\-ntasks\-per\-node\fR except at the socket level |
| instead of the node level. |
| \fBNOTE\fR: This option is not supported when using |
| \fISelectType=select/linear\fR. |
| .IP |
| |
| .TP |
| \fB\-\-oom\-kill\-step\fR[={0|1}] |
| Whether to kill the entire step if an OOM event is detected in any task of a
| step. This overrides the "OOMKillStep" setting in TaskPluginParam from
| slurm.conf. When unset, the setting in slurm.conf is used. When set, a value
| of "0" will disable killing the entire step, while a value of "1" will enable |
| it. This applies to the entire allocation except for the external step. |
| Default is "1" (enabled) when the option is found with no value. |
| .IP |
| |
| .TP |
| \fB\-\-open\-mode\fR={append|truncate} |
| Open the output and error files using append or truncate mode as specified. |
| The default value is specified by the system configuration parameter |
| \fIJobFileAppend\fR. |
| .IP |
| |
| .TP |
| \fB\-o\fR, \fB\-\-output\fR=<\fIfilename_pattern\fR> |
| Instruct Slurm to connect the batch script's standard output directly to the |
| file name specified in the "\fIfilename pattern\fR". |
| By default both standard output and standard error are directed to the same file. |
| For job arrays, the default file name is "slurm\-%A_%a.out", where "%A" is
| replaced by the job ID and "%a" by the array index.
| For other jobs, the default file name is "slurm\-%j.out", where the "%j" is |
| replaced by the job ID. |
| See the \fBfilename pattern\fR section below for filename specification options. |
| .IP |
| |
| .TP |
| \fB\-O\fR, \fB\-\-overcommit\fR |
| Overcommit resources. |
| |
| When applied to a job allocation (not including jobs requesting exclusive |
| access to the nodes) the resources are allocated as if only one task per |
| node is requested. This means that the requested number of cpus per task |
| (\fB\-c\fR, \fB\-\-cpus\-per\-task\fR) are allocated per node rather than |
| being multiplied by the number of tasks. Options used to specify the number |
| of tasks per node, socket, core, etc. are ignored. |
| |
| When applied to job step allocations (the \fBsrun\fR command when executed |
| within an existing job allocation), this option can be used to launch more than |
| one task per CPU. |
| Normally, \fBsrun\fR will not allocate more than one process per CPU. |
| By specifying \fB\-\-overcommit\fR you are explicitly allowing more than one |
| process per CPU. However no more than \fBMAX_TASKS_PER_NODE\fR tasks are |
| permitted to execute per node. \fBNOTE\fR: \fBMAX_TASKS_PER_NODE\fR is
| defined in the file \fIslurm.h\fR and is not a variable; it is set at
| Slurm build time.
| .IP |
| |
| .TP |
| \fB\-s\fR, \fB\-\-oversubscribe\fR |
| The job allocation can over\-subscribe resources with other running jobs. |
| The resources to be over\-subscribed can be nodes, sockets, cores, and/or |
| hyperthreads depending upon configuration. |
| The default over\-subscribe behavior depends on system configuration and the |
| partition's \fBOverSubscribe\fR option takes precedence over the job's option. |
| This option may result in the allocation being granted sooner than if the |
| \-\-oversubscribe option was not set and allow higher system utilization, but |
| application performance will likely suffer due to competition for resources. |
| Also see the \-\-exclusive option. |
| |
| \fBNOTE\fR: This option is mutually exclusive with \fB\-\-exclusive\fR. |
| .IP |
| |
| .TP |
| \fB\-\-parsable\fR |
| Outputs only the job id number and the cluster name if present. |
| The values are separated by a semicolon. Errors will still be displayed. |
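|
| For example, the job ID can be captured in a shell script; the suffix
| expansion strips an optional ";cluster_name" part:
|
| .nf
| $ jobid=$(sbatch \-\-parsable job.sh)
| $ scontrol show job ${jobid%%;*}
| .fi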
| .IP |
| |
| .TP |
| \fB\-p\fR, \fB\-\-partition\fR=<\fIpartition_names\fR> |
| Request a specific partition for the resource allocation. If not specified, |
| the default behavior is to allow the slurm controller to select the default |
| partition as designated by the system administrator. If the job can use more |
| than one partition, specify their names in a comma separated list and the one
| offering earliest initiation will be used with no regard given to the partition |
| name ordering (although higher priority partitions will be considered first). |
| When the job is initiated, the name of the partition used will be placed first |
| in the job record partition string. |
| .IP |
| |
| .TP |
| \fB\-\-prefer\fR=<\fIlist\fR> |
| Nodes can have \fBfeatures\fR assigned to them by the Slurm administrator. |
| Users can specify which of these \fBfeatures\fR are desired but not required by |
| their job using the prefer option. |
| This option operates independently from \fB\-\-constraint\fR and will override |
| whatever is set there if possible. |
| When scheduling, the features in \fB\-\-prefer\fR are tried first. If a node set |
| isn't available with those features then \fB\-\-constraint\fR is attempted. |
| See \fB\-\-constraint\fR for more information; this option behaves the same
| way.
| .IP
|
| .TP |
| \fB\-\-priority\fR=<\fIvalue\fR> |
| Request a specific job priority. |
| May be subject to configuration specific constraints. |
| \fIvalue\fR should either be a numeric value or "TOP" (for highest possible value). |
| Only Slurm operators and administrators can set the priority of a job. |
| .IP |
| |
| .TP |
| \fB\-\-profile\fR={all|none|<\fItype\fR>[,<\fItype\fR>...]} |
| Enables detailed data collection by the acct_gather_profile plugin. |
| Detailed data are typically time\-series that are stored in an HDF5 file for |
| the job or an InfluxDB database depending on the configured plugin. |
| .IP |
| .RS |
| .TP 10 |
| \fBAll\fR |
| All data types are collected. (Cannot be combined with other values.) |
| .IP |
| |
| .TP |
| \fBNone\fR |
| No data types are collected. This is the default. |
| (Cannot be combined with other values.) |
| .IP |
| .RE |
| |
| Valid \fItype\fR values are: |
| .IP |
| .RS |
| .TP |
| \fBEnergy\fR |
| Energy data is collected. |
| .IP |
| |
| .TP |
| \fBTask\fR |
| Task (I/O, Memory, ...) data is collected. |
| .IP |
| |
| .TP |
| \fBLustre\fR |
| Lustre data is collected. |
| .IP |
| |
| .TP |
| \fBNetwork\fR |
| Network (InfiniBand) data is collected. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-propagate\fR[=\fIrlimit\fR[,\fIrlimit\fR...]] |
| Allows users to specify which of the modifiable (soft) resource limits |
| to propagate to the compute nodes and apply to their jobs. If no |
| \fIrlimit\fR is specified, then all resource limits will be propagated. |
| The following rlimit names are supported by Slurm (although some |
| options may not be supported on some systems): |
| .IP |
| .RS |
| .TP 10 |
| \fBALL\fR |
| All limits listed below (default) |
| .IP |
| |
| .TP |
| \fBNONE\fR |
| No limits listed below |
| .IP |
| |
| .TP |
| \fBAS\fR |
| The maximum address space (virtual memory) for a process. |
| .IP |
| |
| .TP |
| \fBCORE\fR |
| The maximum size of core file |
| .IP |
| |
| .TP |
| \fBCPU\fR |
| The maximum amount of CPU time |
| .IP |
| |
| .TP |
| \fBDATA\fR |
| The maximum size of a process's data segment |
| .IP |
| |
| .TP |
| \fBFSIZE\fR |
| The maximum size of files created. Note that if the user sets FSIZE to less |
| than the current size of the slurmd.log, job launches will fail with |
| a 'File size limit exceeded' error. |
| .IP |
| |
| .TP |
| \fBMEMLOCK\fR |
| The maximum size that may be locked into memory |
| .IP |
| |
| .TP |
| \fBNOFILE\fR |
| The maximum number of open files |
| .IP |
| |
| .TP |
| \fBNPROC\fR |
| The maximum number of processes available |
| .IP |
| |
| .TP |
| \fBRSS\fR |
| The maximum resident set size. Note that this only has effect with Linux |
| kernels 2.4.30 or older or BSD. |
| .IP |
| |
| .TP |
| \fBSTACK\fR |
| The maximum stack size |
| .RE |
| .IP |
| |
| .TP |
| \fB\-q\fR, \fB\-\-qos\fR=<\fIqos\fR> |
| Request a quality of service for the job, or comma separated list of QOS. |
| If requesting a list it will be ordered based on the priority of the QOS given |
| with the first being the highest priority. |
| QOS values can be defined |
| for each user/cluster/account association in the Slurm database. |
| Users will be limited to their association's defined set of qos's when |
| the Slurm configuration parameter, AccountingStorageEnforce, includes |
| "qos" in its definition. |
| .IP |
| |
| .TP |
| \fB\-Q\fR, \fB\-\-quiet\fR |
| Suppress informational messages from sbatch such as Job ID. Only errors will |
| still be displayed. |
| .IP |
| |
| .TP |
| \fB\-\-reboot\fR |
| Force the allocated nodes to reboot before starting the job. |
| This is only supported with some system configurations and will otherwise be |
| silently ignored. Only root, \fISlurmUser\fR or admins can reboot nodes. |
| .IP |
| |
| .TP |
| \fB\-\-requeue\fR |
| Specifies that the batch job should be eligible for requeuing. |
| The job may be requeued explicitly by a system administrator, after node |
| failure, or upon preemption by a higher priority job. |
| When a job is requeued, the batch script is initiated from its beginning with |
| the same job ID. Also see the \fB\-\-no\-requeue\fR option. |
| The \fIJobRequeue\fR configuration parameter controls the default |
| behavior on the cluster. |
| .IP |
| |
| .TP |
| \fB\-\-reservation\fR=<\fIreservation_names\fR> |
| Allocate resources for the job from the named reservation. If the job can use |
| more than one reservation, specify their names in a comma separated list; the
| one offering the earliest initiation will be used. Each reservation will be
| considered in the order it was requested.
| All reservations will be listed in scontrol/squeue through the life of the job.
| In accounting, the first reservation will be seen; after the job starts, the
| reservation actually used will replace it.
| .IP |
| |
| .TP |
| \fB\-\-resv\-ports\fR[=\fIcount\fR] |
| Reserve communication ports for this job. Users can specify the number |
| of ports they want to reserve. The parameter MpiParams=ports=12000\-12999
| must be specified in \fIslurm.conf\fR. If the number of reserved ports is zero
| then no ports are reserved. Used only for Cray's native PMI.
| This option can only be used if the slurmstepd step management is enabled. |
| This option applies to job allocations. See \fB\-\-stepmgr\fR. |
| .IP |
| |
| .TP |
| \fB\-\-segment\fR=<\fIsegment_size\fR> |
| When a block topology is used, this defines the size of the segments that |
| will be used to create the job allocation. |
| There is no requirement that all segments of a job be placed
| within the same higher\-level block.
| |
| \fBNOTE\fR: The requested node count must always be evenly divisible by |
| the requested segment size. |
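|
| For example, an illustrative request for eight nodes built from two
| four\-node segments:
|
| .nf
| $ sbatch \-N8 \-\-segment=4 job.sh
| .fi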
| .IP |
| |
| .TP |
| \fB\-\-signal\fR=[{R|B}:]<\fIsig_num\fR>[@\fIsig_time\fR] |
| When a job is within \fIsig_time\fR seconds of its end time, |
| send it the signal \fIsig_num\fR. |
| Due to the resolution of event handling by Slurm, the signal may |
| be sent up to 60 seconds earlier than specified. |
| \fIsig_num\fR may either be a signal number or name (e.g. "10" or "USR1"). |
| \fIsig_time\fR must have an integer value between 0 and 65535. |
| By default, no signal is sent before the job's end time. |
| If a \fIsig_num\fR is specified without any \fIsig_time\fR, |
| the default time will be 60 seconds. |
| Use the "B:" option to signal only the batch shell, none of the other |
| processes will be signaled. By default all job steps will be signaled, |
| but not the batch shell itself. |
| Use the "R:" option to allow this job to overlap with a reservation with |
| MaxStartDelay set. If the "R:" option is used, preemption must be enabled on
| the system; if the job is preempted, it will be requeued if allowed, otherwise
| it will be canceled.
| To have the signal sent at preemption time see the \fBsend_user_signal\fR |
| \fBPreemptParameter\fR. |
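|
| For example, to signal only the batch shell with SIGUSR1 ten minutes before
| the time limit so the script can react (the checkpoint action is a
| placeholder); the step is started in the background so the shell's trap can
| run while it waits:
|
| .nf
| #!/bin/bash
| #SBATCH \-\-signal=B:USR1@600
| trap 'touch checkpoint.flag' USR1   # placeholder checkpoint action
| srun ./app &
| wait                  # interrupted by USR1 so the trap can run
| .fi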
| .IP |
| |
| .TP |
| \fB\-\-sockets\-per\-node\fR=<\fIsockets\fR> |
| Restrict node selection to nodes with at least the specified number of |
| sockets. See additional information under \fB\-B\fR option above when |
| task/affinity plugin is enabled. |
| .br |
| \fBNOTE\fR: This option may implicitly set the number of tasks (if \fB\-n\fR |
| was not specified) as one task per requested thread. |
| .IP |
| |
| .TP |
| \fB\-\-spread\-job\fR |
| Spread the job allocation over as many nodes as possible and attempt to |
| evenly distribute tasks across the allocated nodes. |
| This option disables the topology/tree plugin. |
| .IP |
| |
| .TP |
| \fB\-\-stepmgr\fR |
| Enable slurmstepd step management per\-job if it isn't enabled system wide. |
| This enables job steps to be managed by a single extern slurmstepd associated |
| with the job. This is beneficial for jobs that submit many
| steps inside their allocations. \fBPrologFlags=contain\fR must be set. |
| .IP |
| |
| .TP |
| \fB\-\-switches\fR=<\fIcount\fR>[@\fImax\-time\fR] |
| When a tree topology is used, this defines the maximum count of leaf switches |
| desired for the job allocation and optionally the maximum time to wait |
| for that number of switches. If Slurm finds an allocation containing more |
| switches than the count specified, the job remains pending until it either
| finds an allocation with the desired switch count or the time limit expires.
| If there is no switch count limit, there is no delay in starting the job.
| Acceptable time formats include "minutes", "minutes:seconds", |
| "hours:minutes:seconds", "days\-hours", "days\-hours:minutes" and |
| "days\-hours:minutes:seconds". |
| The job's maximum time delay may be limited by the system administrator using |
| the \fBSchedulerParameters\fR configuration parameter with the |
| \fBmax_switch_wait\fR parameter option. |
| On a dragonfly network the only switch count supported is 1, since
| communication performance will be highest when a job is allocated resources
| on one leaf switch or on more than 2 leaf switches.
| The default max\-time is the max_switch_wait SchedulerParameters value.
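|
| For example, to wait up to 60 minutes for an allocation using at most one
| leaf switch:
|
| .nf
| #SBATCH \-\-switches=1@60
| .fi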
| .IP |
| |
| .TP |
| \fB\-\-test\-only\fR |
| Validate the batch script and return an estimate of when a job would be |
| scheduled to run given the current job queue and all the other arguments |
| specifying the job requirements. No job is actually submitted. |
| .IP |
| |
| .TP |
| \fB\-\-thread\-spec\fR=<\fInum\fR> |
| Count of specialized threads per node reserved by the job for system operations |
| and not used by the application. The application will not use these threads, |
| but will be charged for their allocation. |
| This option can not be used with the \fB\-\-core\-spec\fR option. |
| |
| \fBNOTE\fR: Explicitly setting a job's specialized thread value implicitly sets
| its \fB\-\-exclusive\fR option, reserving entire nodes for the job.
| .IP |
| |
| .TP |
| \fB\-\-threads\-per\-core\fR=<\fIthreads\fR> |
| Restrict node selection to nodes with at least the specified number of |
| threads per core. In task layout, use the specified maximum number of threads |
| per core. \fBNOTE\fR: "Threads" refers to the number of processing units on |
| each core rather than the number of application tasks to be launched per core. |
| See additional information under \fB\-B\fR option above when task/affinity |
| plugin is enabled. |
| .br |
| \fBNOTE\fR: This option may implicitly set the number of tasks (if \fB\-n\fR |
| was not specified) as one task per requested thread. |
| .IP |
| |
| .TP |
| \fB\-t\fR, \fB\-\-time\fR=<\fItime\fR> |
| Set a limit on the total run time of the job allocation. If the |
| requested time limit exceeds the partition's time limit, the job will |
| be left in a PENDING state (possibly indefinitely). The default time |
| limit is the partition's default time limit. When the time limit is reached, |
| each task in each job step is sent SIGTERM followed by SIGKILL. The |
| interval between signals is specified by the Slurm configuration |
| parameter \fBKillWait\fR. The \fBOverTimeLimit\fR configuration parameter may |
| permit the job to run longer than scheduled. Time resolution is one minute |
| and second values are rounded up to the next minute. |
| |
| A time limit of zero requests that no time limit be imposed. Acceptable time |
| formats include "minutes", "minutes:seconds", "hours:minutes:seconds", |
| "days\-hours", "days\-hours:minutes" and "days\-hours:minutes:seconds". |
| .IP |
| |
| .TP |
| \fB\-\-time\-min\fR=<\fItime\fR> |
| Set a minimum time limit on the job allocation. |
| If specified, the job may have its \fB\-\-time\fR limit lowered to a value |
| no lower than \fB\-\-time\-min\fR if doing so permits the job to begin |
| execution earlier than otherwise possible. |
| The job's time limit will not be changed after the job is allocated resources. |
| This is performed by a backfill scheduling algorithm to allocate resources |
| otherwise reserved for higher priority jobs. |
| Acceptable time formats include "minutes", "minutes:seconds", |
| "hours:minutes:seconds", "days\-hours", "days\-hours:minutes" and |
| "days\-hours:minutes:seconds". |
| .IP |
| |
| .TP |
| \fB\-\-tmp\fR=<\fIsize\fR>[\fIunits\fR] |
| Specify a minimum amount of temporary disk space per node. |
| Default units are megabytes. |
| Different units can be specified using the suffix [K|M|G|T]. |
| .IP |
| |
| .TP |
| \fB\-\-tres\-bind\fR=<\fItres\fR>:[verbose,]<\fItype\fR>[+<\fItres\fR>: |
| [verbose,]<\fItype\fR>...] |
| Specify a list of tres with their task binding options. Currently gres are the |
| only supported tres for this option. Specify gres as "gres/<gres_name>"
| (e.g. gres/gpu).
| |
| Example: \-\-tres\-bind=gres/gpu:verbose,map:0,1,2,3+gres/nic:closest |
| |
| By default, most tres are not bound to individual tasks.
| |
| Supported binding \fItype\fR options for \fBgres\fR: |
| .IP |
| .RS |
| .TP 10 |
| \fBclosest\fR |
| Bind each task to the gres(s) which are closest. |
| In a NUMA environment, each task may be bound to more than one gres (i.e. |
| all gres in that NUMA environment). |
| .IP |
| |
| .TP |
| \fBmap:<list>\fR |
| Bind by setting gres masks on tasks (or ranks) as specified where <list> is |
| <gres_id_for_task_0>,<gres_id_for_task_1>,... gres IDs are interpreted as decimal |
| values. If the number of tasks (or ranks) exceeds the number of elements in this |
| list, elements in the list will be reused as needed starting from the beginning |
| of the list. To simplify support for large task counts, the lists may follow a |
| map with an asterisk and repetition count. For example "map:0*4,1*4". |
| If the task/cgroup plugin is used and ConstrainDevices is set in cgroup.conf, |
| then the gres IDs are zero\-based indexes relative to the gres allocated to the
| job (e.g. the first gres is 0, even if the global ID is 3). Otherwise, the gres |
| IDs are global IDs, and all gres on each node in the job should be allocated for |
| predictable binding results. |
| .IP |
| |
| .TP |
| \fBmask:<list>\fR |
| Bind by setting gres masks on tasks (or ranks) as specified where <list> is |
| <gres_mask_for_task_0>,<gres_mask_for_task_1>,... The mapping is specified for |
| a node and identical mapping is applied to the tasks on every node (i.e. the |
| lowest task ID on each node is mapped to the first mask specified in the list, |
| etc.). gres masks are always interpreted as hexadecimal values but can be |
| preceded with an optional '0x'. To simplify support for large task counts, the
| lists may follow a mask with an asterisk and repetition count.
| For example "mask:0x0f*4,0xf0*4". |
| If the task/cgroup plugin is used and ConstrainDevices is set in cgroup.conf, |
| then the gres IDs are zero\-based indexes relative to the gres allocated to the |
| job (e.g. the first gres is 0, even if the global ID is 3). Otherwise, the gres |
| IDs are global IDs, and all gres on each node in the job should be allocated for |
| predictable binding results. |
| .IP |
| |
| .TP |
| \fBnone\fR |
| Do not bind tasks to this gres (turns off implicit binding from |
| \-\-tres\-per\-task and \-\-gpus\-per\-task). |
| .IP |
| |
| .TP |
| \fBper_task:<gres_per_task>\fR |
| Each task will be bound to the number of gres specified in |
| \fI<gres_per_task>\fR. Tasks are preferentially assigned gres with affinity to
| cores in their allocation, as with \fIclosest\fR, though they will
| take any gres if those are unavailable. If no affinity exists, the first task
| will be assigned the first \fI<gres_per_task>\fR gres on the node, and so on.
| Shared gres will prefer to bind one sharing device per task if possible. |
| .IP |
| |
| .TP |
| \fBsingle:<tasks_per_gres>\fR |
| Like \fIclosest\fR, except that each task can only be bound to a |
| single gres, even when it can be bound to multiple gres that are equally close. |
| The gres to bind to is determined by \fI<tasks_per_gres>\fR, where the |
| first \fI<tasks_per_gres>\fR tasks are bound to the first gres available, the |
| second \fI<tasks_per_gres>\fR tasks are bound to the second gres available, etc. |
| This is basically a block distribution of tasks onto available gres, where the |
| available gres are determined by the socket affinity of the task and the socket |
| affinity of the gres as specified in gres.conf's \fICores\fR parameter. |
| .IP |
| |
| \fBNOTE\fR: Shared gres binding is currently limited to per_task or none.
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-tres\-per\-task\fR=<\fIlist\fR> |
| Specifies a comma\-delimited list of trackable resources required for the job on |
| each task to be spawned in the job's resource allocation. |
| The format for each entry in the list is "trestype[/tresname]=count". |
| The \fItrestype\fR is the type of trackable resource requested (e.g. cpu, gres, |
| license, etc). |
| The \fItresname\fR is the name of the trackable resource, as can be seen with |
| \fIsacctmgr show tres\fR. This is required when it exists for tres types such |
| as gres, license, etc. (e.g. gpu, gpu:a100). |
| In order to request a license with this option, the license(s) must be defined |
| in the \fBAccountingStorageTRES\fR parameter of slurm.conf. |
| The \fIcount\fR is the number of those resources. |
| .br |
| The count can have a suffix of |
| .br |
| "k" or "K" (multiple of 1024), |
| .br |
| "m" or "M" (multiple of 1024 x 1024), |
| .br |
| "g" or "G" (multiple of 1024 x 1024 x 1024), |
| .br |
| "t" or "T" (multiple of 1024 x 1024 x 1024 x 1024), |
| .br |
| "p" or "P" (multiple of 1024 x 1024 x 1024 x 1024 x 1024). |
| .br |
| Examples: |
| .nf |
| \-\-tres\-per\-task=cpu=4 |
| \-\-tres\-per\-task=cpu=8,license/ansys=1 |
| \-\-tres\-per\-task=gres/gpu=1 |
| \-\-tres\-per\-task=gres/gpu:a100=2 |
| .fi |
| The specified resources will be allocated to the job on each node. |
| The available trackable resources are configurable by the system |
| administrator. |
| .br |
| \fBNOTE\fR: This option with gres/gpu or gres/shard will implicitly set |
| \-\-tres\-bind=gres/[gpu|shard]:per_task:<tres_per_task>, or if multiple gpu |
| types are specified \-\-tres\-bind=gres/gpu:per_task:<gpus_per_task_type_sum>. |
| This can be overridden with an explicit \-\-tres\-bind specification. |
| .br |
| \fBNOTE\fR: Invalid TRES for \-\-tres\-per\-task include |
| bb,billing,energy,fs,mem,node,pages,vmem. |
| .br |
| .IP |
| |
| .TP |
| \fB\-\-uid\fR=<\fIuser\fR> |
| Attempt to submit and/or run a job as \fIuser\fR instead of the |
| invoking user id. The invoking user's credentials will be used |
| to check access permissions for the target partition. User root |
| may use this option to run jobs as a normal user in a RootOnly |
| partition for example. If run as root, \fBsbatch\fR will drop |
| its permissions to the uid specified after node allocation is |
| successful. \fIuser\fR may be the user name or numerical user ID. |
| .IP |
| |
| .TP |
| \fB\-\-usage\fR |
| Display brief help message and exit. |
| .IP |
| |
| .TP |
| \fB\-\-use\-min\-nodes\fR |
| If a range of node counts is given, prefer the smaller count. |
| .IP |
| |
| .TP |
| \fB\-v\fR, \fB\-\-verbose\fR |
| Increase the verbosity of sbatch's informational messages. Multiple |
| '\fB\-v\fR's will further increase sbatch's verbosity. By default only |
| errors will be displayed. |
| .IP |
| |
| .TP |
| \fB\-V\fR, \fB\-\-version\fR |
| Display version information and exit. |
| .IP |
| |
| .TP |
| \fB\-W\fR, \fB\-\-wait\fR |
| Do not exit until the submitted job terminates. |
| The exit code of the sbatch command will be the same as the exit code |
| of the submitted job. If the job terminated due to a signal rather than a |
| normal exit, the exit code will be set to 1. |
| In the case of a job array, the exit code recorded will be the highest value |
| for any task in the job array. |
| .IP |
| |
| .TP |
| \fB\-\-wait\-all\-nodes\fR=<\fIvalue\fR> |
| Controls when the execution of the command begins. |
| By default the job will begin execution as soon as the allocation is made. |
| .IP |
| .RS |
| .TP 5 |
| 0 |
| Begin execution as soon as allocation can be made. |
| Do not wait for all nodes to be ready for use (i.e. booted). |
| .IP |
| |
| .TP |
| 1 |
| Do not begin execution until all nodes are ready for use. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-wckey\fR=<\fIwckey\fR> |
| Specify wckey to be used with job. If TrackWCKey=no (default) in the |
| slurm.conf this value is ignored. |
| .IP |
| |
| .TP |
| \fB\-\-wrap\fR=<\fIcommand_string\fR> |
| Sbatch will wrap the specified command string in a simple "sh" shell script, |
| and submit that script to the slurm controller. When \-\-wrap is used, |
| a script name and arguments may not be specified on the command line; instead |
| the sbatch\-generated wrapper script is used. |
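|
| For example, a single command can be submitted without writing a script
| file:
|
| .nf
| $ sbatch \-\-wrap="hostname"
| .fi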
| .IP |
| |
| .SH "FILENAME PATTERN" |
| .PP |
| \fBsbatch\fR allows for a filename pattern to contain one or more replacement |
| symbols, which are a percent sign "%" followed by a letter (e.g. %j). |
| |
| .TP |
| \fB\\\\\fR |
| Do not process any of the replacement symbols. |
| .IP |
| |
| .TP |
| \fB%%\fR |
| The character "%". |
| .IP |
| |
| .TP |
| \fB%A\fR |
| Job array's master job allocation number. |
| .IP |
| |
| .TP |
| \fB%a\fR |
| Job array ID (index) number. |
| .IP |
| |
| .TP |
| \fB%b\fR |
| Job array ID (index) number modulo 10. |
| .IP |
| |
| .TP |
| \fB%J\fR |
| jobid.stepid of the running job (e.g. "128.0"). The stepid is only expanded for |
| regular steps, not for special steps like "batch" or "extern". |
| .IP |
| |
| .TP |
| \fB%j\fR |
| jobid of the running job. |
| .IP |
| |
| .TP |
| \fB%N\fR |
| short hostname. This will create a separate IO file per node. |
| .IP |
| |
| .TP |
| \fB%n\fR |
| Node identifier relative to current job (e.g. "0" is the first node of
| the running job). This will create a separate IO file per node.
| .IP |
| |
| .TP |
| \fB%s\fR |
| stepid of the running job. |
| .IP |
| |
| .TP |
| \fB%t\fR |
| task identifier (rank) relative to current job. This will create a |
| separate IO file per task. |
| .IP |
| |
| .TP |
| \fB%u\fR |
| User name. |
| .IP |
| |
| .TP |
| \fB%x\fR |
| Job name. |
| .IP |
| |
| .PP |
| A number placed between the percent character and the format specifier may be
| used to zero\-pad the result in the IO filename to the specified minimum
| number of digits. This number is ignored if the format specifier corresponds
| to non\-numeric data (%N for example). The maximum padding width is 10; if a
| value greater than 10 is used, the result is padded to 10 characters.
| Some examples of how the format string may be used for a 4 task job step with a |
| JobID of 128 and step id of 0 are included below: |
| |
| .TP 15 |
| job%J.out |
| job128.0.out |
| .IP |
| |
| .TP |
| job%4j.out |
| job0128.out |
| .IP |
| |
| .TP |
| job%2j\-%2t.out |
| job128\-00.out, job128\-01.out, ... |
| .IP |
| |
| .SH "PERFORMANCE" |
| .PP |
| Executing \fBsbatch\fR sends a remote procedure call to \fBslurmctld\fR. If |
| enough calls from \fBsbatch\fR or other Slurm client commands that send remote |
| procedure calls to the \fBslurmctld\fR daemon come in at once, it can result in |
| a degradation of performance of the \fBslurmctld\fR daemon, possibly resulting |
| in a denial of service. |
| .PP |
| Do not run \fBsbatch\fR or other Slurm client commands that send remote |
| procedure calls to \fBslurmctld\fR from loops in shell scripts or other |
| programs. Ensure that programs limit calls to \fBsbatch\fR to the minimum |
| necessary for the information you are trying to gather. |
| |
| .SH "INPUT ENVIRONMENT VARIABLES" |
| .PP |
Upon startup, sbatch will read and handle the options set in the following
environment variables. Most of these variables are set the same way the
corresponding options are set, as defined above. For flag options that
expect no argument, the option can be enabled by setting the environment
variable without a value (empty or NULL string), to the string 'yes', or to
a non\-zero number. Any other value for the environment variable will result
in the option not being set.
A couple of exceptions to these rules are noted below.
| .br |
| \fBNOTE\fR: Environment variables will override any options set in a batch |
| script, and command line options will override any environment variables. |
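.PP
For example, a flag option such as \fB\-\-requeue\fR can be enabled through
its environment variable (illustrative invocations):
.IP
.nf
$ SBATCH_REQUEUE=yes sbatch myscript
$ SBATCH_REQUEUE=1 sbatch myscript
$ SBATCH_REQUEUE= sbatch myscript     # empty value also enables it
$ SBATCH_REQUEUE=0 sbatch myscript    # any other value: option not set
.fi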
| |
| .TP 22 |
| \fBSBATCH_ACCOUNT\fR |
| Same as \fB\-A, \-\-account\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_ACCTG_FREQ\fR |
| Same as \fB\-\-acctg\-freq\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_ARRAY_INX\fR |
| Same as \fB\-a, \-\-array\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_BATCH\fR |
| Same as \fB\-\-batch\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_CLUSTERS\fR or \fBSLURM_CLUSTERS\fR |
| Same as \fB\-\-clusters\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_CONSTRAINT\fR |
| Same as \fB\-C\fR, \fB\-\-constraint\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_CONTAINER\fR |
| Same as \fB\-\-container\fR. |
| .IP |
| |
| .TP |
| \fBSBATCH_CONTAINER_ID\fR |
Same as \fB\-\-container\-id\fR.
| .IP |
| |
| .TP |
| \fBSBATCH_CORE_SPEC\fR |
| Same as \fB\-\-core\-spec\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_CPUS_PER_GPU\fR |
| Same as \fB\-\-cpus\-per\-gpu\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_DEBUG\fR |
Same as \fB\-v, \-\-verbose\fR. Setting the variable to 1 is equivalent to
\-v; setting it to 2 gives \-vv, etc.
| .IP |
| |
| .TP |
| \fBSBATCH_DELAY_BOOT\fR |
| Same as \fB\-\-delay\-boot\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_DISTRIBUTION\fR |
| Same as \fB\-m, \-\-distribution\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_ERROR\fR |
| Same as \fB-e, \-\-error\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_EXCLUSIVE\fR |
| Same as \fB\-\-exclusive\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_EXPORT\fR |
| Same as \fB\-\-export\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GET_USER_ENV\fR |
| Same as \fB\-\-get\-user\-env\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GPU_BIND\fR |
| Same as \fB\-\-gpu\-bind\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GPU_FREQ\fR |
| Same as \fB\-\-gpu\-freq\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GPUS\fR |
| Same as \fB\-G, \-\-gpus\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GPUS_PER_NODE\fR |
| Same as \fB\-\-gpus\-per\-node\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GPUS_PER_TASK\fR |
| Same as \fB\-\-gpus\-per\-task\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GRES\fR |
| Same as \fB\-\-gres\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_GRES_FLAGS\fR |
| Same as \fB\-\-gres\-flags\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_HINT\fR or \fBSLURM_HINT\fR |
| Same as \fB\-\-hint\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_IGNORE_PBS\fR |
| Same as \fB\-\-ignore\-pbs\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_INPUT\fR |
| Same as \fB\-i, \-\-input\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_JOB_NAME\fR |
| Same as \fB\-J, \-\-job\-name\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_BIND\fR |
| Same as \fB\-\-mem\-bind\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_PER_CPU\fR |
| Same as \fB\-\-mem\-per\-cpu\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_PER_GPU\fR |
| Same as \fB\-\-mem\-per\-gpu\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_PER_NODE\fR |
| Same as \fB\-\-mem\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_NETWORK\fR |
| Same as \fB\-\-network\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_NO_KILL\fR |
| Same as \fB\-k\fR, \fB\-\-no\-kill\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_NO_REQUEUE\fR |
| Same as \fB\-\-no\-requeue\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_OPEN_MODE\fR |
| Same as \fB\-\-open\-mode\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_OUTPUT\fR |
| Same as \fB-o, \-\-output\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_OVERCOMMIT\fR |
| Same as \fB\-O, \-\-overcommit\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_PARTITION\fR |
| Same as \fB\-p, \-\-partition\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_POWER\fR |
| Same as \fB\-\-power\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_PROFILE\fR |
| Same as \fB\-\-profile\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_QOS\fR |
| Same as \fB\-\-qos\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_REQ_SWITCH\fR |
| When a tree topology is used, this defines the maximum count of switches |
| desired for the job allocation and optionally the maximum time to wait |
| for that number of switches. See \fB\-\-switches\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_REQUEUE\fR |
| Same as \fB\-\-requeue\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_RESERVATION\fR |
| Same as \fB\-\-reservation\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_SEGMENT_SIZE\fR |
| Same as \fB\-\-segment\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_SIGNAL\fR |
| Same as \fB\-\-signal\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_SPREAD_JOB\fR |
| Same as \fB\-\-spread\-job\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_THREAD_SPEC\fR |
| Same as \fB\-\-thread\-spec\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_THREADS_PER_CORE\fR |
| Same as \fB\-\-threads\-per\-core\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_TIMELIMIT\fR |
| Same as \fB\-t, \-\-time\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_TRES_BIND\fR |
| Same as \fB\-\-tres\-bind\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_TRES_PER_TASK\fR |
| Same as \fB\-\-tres\-per\-task\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_USE_MIN_NODES\fR |
| Same as \fB\-\-use\-min\-nodes\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_WAIT\fR |
| Same as \fB\-W\fR, \fB\-\-wait\fR |
| .IP |
| |
| .TP |
| \fBSBATCH_WAIT_ALL_NODES\fR |
| Same as \fB\-\-wait\-all\-nodes\fR. Must be set to 0 or 1 to disable or enable |
| the option. |
| .IP |
| |
| .TP |
| \fBSBATCH_WAIT4SWITCH\fR |
Maximum time to wait for the requested switches. See \fB\-\-switches\fR
| .IP |
| |
| .TP |
| \fBSBATCH_WCKEY\fR |
| Same as \fB\-\-wckey\fR |
| .IP |
| |
| .TP |
| \fBSLURM_CONF\fR |
| The location of the Slurm configuration file. |
| .IP |
| |
| .TP |
| \fBSLURM_DEBUG_FLAGS\fR |
| Specify debug flags for sbatch to use. See DebugFlags in the |
| \fBslurm.conf\fR(5) man page for a full list of flags. The environment |
| variable takes precedence over the setting in the slurm.conf. |
| .IP |
| |
| .TP |
| \fBSLURM_EXIT_ERROR\fR |
| Specifies the exit code generated when a Slurm error occurs |
| (e.g. invalid options). |
| This can be used by a script to distinguish application exit codes from |
| various Slurm error conditions. |
| .IP |
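For example, a submission wrapper could distinguish Slurm errors from other
failures (the exit code 200 is an illustrative choice):
.IP
.nf
$ SLURM_EXIT_ERROR=200 sbatch myscript
$ if [ $? \-eq 200 ]; then echo "Slurm error during submission"; fi
.fi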
| |
| .TP |
| \fBSLURM_STEP_KILLED_MSG_NODE_ID\fR=ID |
If set, only the specified node will log when the job or step is killed
by a signal.
| .IP |
| |
| .TP |
| \fBSLURM_UMASK\fR |
| If defined, Slurm will use the defined \fIumask\fR to set permissions when |
| creating the output/error files for the job. |
| .IP |
| |
| .SH "OUTPUT ENVIRONMENT VARIABLES" |
| .PP |
| The Slurm controller will set the following variables in the environment of |
| the batch script. |
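.PP
These variables can be inspected from within the batch script itself, for
example (a trivial illustration):
.IP
.nf
#!/bin/sh
env | grep \-E '^(SLURM|SBATCH)_' | sort
.fi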
| |
| .TP |
| \fBSBATCH_MEM_BIND\fR |
| Set to value of the \fB\-\-mem\-bind\fR option. |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_BIND_LIST\fR |
Set to the bit mask used for memory binding.
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_BIND_PREFER\fR |
| Set to "prefer" if the \fB\-\-mem\-bind\fR option includes the prefer option. |
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_BIND_TYPE\fR |
| Set to the memory binding type specified with the \fB\-\-mem\-bind\fR option. |
Possible values are "none", "rank", "map_mem", "mask_mem" and "local".
| .IP |
| |
| .TP |
| \fBSBATCH_MEM_BIND_VERBOSE\fR |
| Set to "verbose" if the \fB\-\-mem\-bind\fR option includes the verbose option. |
| Set to "quiet" otherwise. |
| .IP |
| |
| .TP |
| \fBSLURM_*_HET_GROUP_#\fR |
| For a heterogeneous job allocation, the environment variables are set separately |
| for each component. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_JOB_ID\fR |
| Job array's master job ID number. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_TASK_COUNT\fR |
| Total number of tasks in a job array. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_TASK_ID\fR |
| Job array ID (index) number. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_TASK_MAX\fR |
| Job array's maximum ID (index) number. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_TASK_MIN\fR |
| Job array's minimum ID (index) number. |
| .IP |
| |
| .TP |
| \fBSLURM_ARRAY_TASK_STEP\fR |
| Job array's index step size. |
| .IP |
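.PP
For example, a job submitted with \-\-array=0\-8:2 (an illustrative range)
runs tasks with indices 0, 2, 4, 6 and 8. Each task would see:
.IP
.nf
SLURM_ARRAY_TASK_COUNT=5
SLURM_ARRAY_TASK_MIN=0
SLURM_ARRAY_TASK_MAX=8
SLURM_ARRAY_TASK_STEP=2
SLURM_ARRAY_TASK_ID=<its own index>
.fi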
| |
| .TP |
| \fBSLURM_CLUSTER_NAME\fR |
| Name of the cluster on which the job is executing. |
| .IP |
| |
| .TP |
| \fBSLURM_CPUS_ON_NODE\fR |
| Number of CPUs allocated to the batch step. |
| \fBNOTE\fR: The \fBselect/linear\fR plugin allocates entire nodes to |
| jobs, so the value indicates the total count of CPUs on the node. |
For the \fBselect/cons_tres\fR plugin, this number
indicates the number of CPUs on this node allocated to the step.
| .IP |
| |
| .TP |
| \fBSLURM_CPUS_PER_GPU\fR |
| Number of CPUs requested per allocated GPU. |
| Only set if the \fB\-\-cpus\-per\-gpu\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_CPUS_PER_TASK\fR |
| Number of cpus requested per task. |
| Only set if either the \fB\-\-cpus\-per\-task\fR option or the |
| \fB\-\-tres\-per\-task=cpu=#\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_CONTAINER\fR |
OCI bundle for the job.
Only set if \fB\-\-container\fR is specified.
| .IP |
| |
| .TP |
| \fBSLURM_CONTAINER_ID\fR |
OCI container ID for the job.
Only set if \fB\-\-container\-id\fR is specified.
| .IP |
| |
| .TP |
| \fBSLURM_DIST_PLANESIZE\fR |
| Plane distribution size. Only set for plane distributions. |
| See \fB\-m, \-\-distribution\fR. |
| .IP |
| |
| .TP |
| \fBSLURM_DISTRIBUTION\fR |
| Same as \fB\-m, \-\-distribution\fR |
| .IP |
| |
| .TP |
| \fBSLURM_EXPORT_ENV\fR |
| Same as \fB\-\-export\fR. |
| .IP |
| |
| .TP |
| \fBSLURM_GPU_BIND\fR |
| Requested binding of tasks to GPU. |
| Only set if the \fB\-\-gpu\-bind\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_GPU_FREQ\fR |
| Requested GPU frequency. |
| Only set if the \fB\-\-gpu\-freq\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_GPUS\fR |
| Number of GPUs requested. |
| Only set if the \fB\-G, \-\-gpus\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_GPUS_ON_NODE\fR |
| Number of GPUs allocated to the batch step. |
| .IP |
| |
| .TP |
| \fBSLURM_GPUS_PER_NODE\fR |
| Requested GPU count per allocated node. |
| Only set if the \fB\-\-gpus\-per\-node\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_GPUS_PER_SOCKET\fR |
| Requested GPU count per allocated socket. |
| Only set if the \fB\-\-gpus\-per\-socket\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_GTIDS\fR |
| Global task IDs running on this node. Zero origin and comma separated. |
| It is read internally by pmi if Slurm was built with pmi support. Leaving |
| the variable set may cause problems when using external packages from |
| within the job (Abaqus and Ansys have been known to have problems when |
| it is set \- consult the appropriate documentation for 3rd party software). |
| .IP |
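If a third\-party application misbehaves while this variable is set, it can
be unset in the batch script before the application is launched
(illustrative):
.IP
.nf
unset SLURM_GTIDS
.fi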
| |
| .TP |
| \fBSLURM_HET_SIZE\fR |
| Set to count of components in heterogeneous job. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_ACCOUNT\fR |
Account name associated with the job allocation.
| .IP |
| |
| .TP |
| \fBSLURM_JOB_CPUS_PER_NODE\fR |
| Count of CPUs available to the job on the nodes in the allocation, using the |
| format \fICPU_count\fR[(x\fInumber_of_nodes\fR)][,\fICPU_count\fR |
| [(x\fInumber_of_nodes\fR)] ...]. |
| For example: SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the |
| first and second nodes (as listed by SLURM_JOB_NODELIST) the allocation |
| has 72 CPUs, while the third node has 36 CPUs. |
| \fBNOTE\fR: The \fBselect/linear\fR plugin allocates entire nodes to jobs, so |
| the value indicates the total count of CPUs on allocated nodes. The |
| \fBselect/cons_tres\fR plugin allocates individual |
| CPUs to jobs, so this number indicates the number of CPUs allocated to the job. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_DEPENDENCY\fR |
| Set to value of the \fB\-\-dependency\fR option. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_END_TIME\fR |
| The UNIX timestamp for a job's projected end time. |
| .IP |
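For example, the remaining walltime in seconds could be computed inside the
batch script (a sketch assuming GNU date):
.IP
.nf
remaining=$(( SLURM_JOB_END_TIME \- $(date +%s) ))
echo "Seconds remaining: ${remaining}"
.fi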
| |
| .TP |
| \fBSLURM_JOB_GPUS\fR |
| The global GPU IDs of the GPUs allocated to this job. The GPU IDs are not |
| relative to any device cgroup, even if devices are constrained with task/cgroup. |
| Only set in batch and interactive jobs. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_ID\fR |
| The ID of the job allocation. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_LICENSES\fR |
| Name and count of any license(s) requested. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_NAME\fR |
| Name of the job. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_NODELIST\fR |
| List of nodes allocated to the job. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_NUM_NODES\fR |
| Total number of nodes in the job's resource allocation. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_PARTITION\fR |
| Name of the partition in which the job is running. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_QOS\fR |
| Quality Of Service (QOS) of the job allocation. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_RESERVATION\fR |
| Advanced reservation containing the job allocation, if any. |
| .IP |
| |
| .TP |
| \fBSLURM_JOB_SEGMENT_SIZE\fR |
The segment size used to create the job allocation.
Only set if \fB\-\-segment\fR is specified.
| .IP |
| |
| .TP |
| \fBSLURM_JOB_START_TIME\fR |
| The UNIX timestamp for a job's start time. |
| .IP |
| |
| .TP |
| \fBSLURM_JOBID\fR |
| The ID of the job allocation. See \fBSLURM_JOB_ID\fR. Included for backwards |
| compatibility. |
| .IP |
| |
| .TP |
| \fBSLURM_LOCALID\fR |
| Node local task ID for the process within a job. |
| .IP |
| |
| .TP |
| \fBSLURM_MEM_PER_CPU\fR |
| Same as \fB\-\-mem\-per\-cpu\fR |
| .IP |
| |
| .TP |
| \fBSLURM_MEM_PER_GPU\fR |
| Requested memory per allocated GPU. |
| Only set if the \fB\-\-mem\-per\-gpu\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_MEM_PER_NODE\fR |
| Same as \fB\-\-mem\fR |
| .IP |
| |
| .TP |
| \fBSLURM_NETWORK\fR |
| Set to the value of the \fB\-\-network\fR option, if specified. |
| .IP |
| |
| .TP |
| \fBSLURM_NNODES\fR |
| Total number of nodes in the job's resource allocation. See |
| \fBSLURM_JOB_NUM_NODES\fR. Included for backwards compatibility. |
| .IP |
| |
| .TP |
| \fBSLURM_NODEID\fR |
ID of the current node relative to the nodes allocated to the job.
| .IP |
| |
| .TP |
| \fBSLURM_NODELIST\fR |
| List of nodes allocated to the job. See \fBSLURM_JOB_NODELIST\fR. Included |
| for backwards compatibility. |
| .IP |
| |
| .TP |
| \fBSLURM_NPROCS\fR |
| Same as \fBSLURM_NTASKS\fR. Included for backwards compatibility. |
| .IP |
| |
| .TP |
| \fBSLURM_NTASKS\fR |
Set to the value of the \fB\-\-ntasks\fR option, if specified; otherwise, if
any of the \fB\-\-ntasks\-per\-*\fR options are specified, set to the number
of tasks in the job.
| |
| \fBNOTE\fR: This is also an input variable for srun, so if set it will |
| effectively set the \fB\-\-ntasks\fR option for srun when called from the batch |
| script. |
| .IP |
| |
| .TP |
| \fBSLURM_NTASKS_PER_CORE\fR |
| Number of tasks requested per core. |
Only set if the \fB\-\-ntasks\-per\-core\fR option is specified.
.IP
| |
| .TP |
| \fBSLURM_NTASKS_PER_GPU\fR |
| Number of tasks requested per GPU. |
| Only set if the \fB\-\-ntasks\-per\-gpu\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_NTASKS_PER_NODE\fR |
| Number of tasks requested per node. |
| Only set if the \fB\-\-ntasks\-per\-node\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_NTASKS_PER_SOCKET\fR |
| Number of tasks requested per socket. |
| Only set if the \fB\-\-ntasks\-per\-socket\fR option is specified. |
| .IP |
| |
| .TP |
| \fBSLURM_OOMKILLSTEP\fR |
| Same as \fB\-\-oom\-kill\-step\fR |
| .IP |
| |
| .TP |
| \fBSLURM_OVERCOMMIT\fR |
| Set to \fB1\fR if \fB\-\-overcommit\fR was specified. |
| .IP |
| |
| .TP |
| \fBSLURM_PRIO_PROCESS\fR |
| The scheduling priority (nice value) at the time of job submission. |
| This value is propagated to the spawned processes. |
| .IP |
| |
| .TP |
| \fBSLURM_PROCID\fR |
| The MPI rank (or relative process ID) of the current process |
| .IP |
| |
| .TP |
| \fBSLURM_PROFILE\fR |
| Same as \fB\-\-profile\fR |
| .IP |
| |
| .TP |
| \fBSLURM_RESTART_COUNT\fR |
If the job has been restarted due to system failure or has been
explicitly requeued, this will be set to the number of times
the job has been restarted.
| .IP |
| |
| .TP |
| \fBSLURM_SHARDS_ON_NODE\fR |
Number of GPU shards available to the step on this node.
| .IP |
| |
| .TP |
| \fBSLURM_SUBMIT_DIR\fR |
| The directory from which \fBsbatch\fR was invoked. |
| .IP |
| |
| .TP |
| \fBSLURM_SUBMIT_HOST\fR |
| The hostname of the computer from which \fBsbatch\fR was invoked. |
| .IP |
| |
| .TP |
| \fBSLURM_TASK_PID\fR |
| The process ID of the task being started. |
| .IP |
| |
| .TP |
| \fBSLURM_TASKS_PER_NODE\fR |
| Number of tasks to be initiated on each node. Values are |
| comma separated and in the same order as SLURM_JOB_NODELIST. |
| If two or more consecutive nodes are to have the same task |
| count, that count is followed by "(x#)" where "#" is the |
| repetition count. For example, "SLURM_TASKS_PER_NODE=2(x3),1" |
| indicates that the first three nodes will each execute two |
| tasks and the fourth node will execute one task. |
| .IP |
| |
| .TP |
| \fBSLURM_THREADS_PER_CORE\fR |
| This is only set if \fB\-\-threads\-per\-core\fR or |
| \fBSBATCH_THREADS_PER_CORE\fR were specified. The value will be set to the |
| value specified by \fB\-\-threads\-per\-core\fR or |
| \fBSBATCH_THREADS_PER_CORE\fR. This is used by subsequent srun calls within the |
| job allocation. |
| .IP |
| |
| .TP |
| \fBSLURM_TOPOLOGY_ADDR\fR |
This is set only if the system has the topology/tree plugin
configured. The value will be set to the names of the network switches
that may be involved in the job's communications, from the
system's top\-level switch down to the leaf switch, ending with
the node name. A period is used to separate each hardware component name.
| .IP |
| |
| .TP |
| \fBSLURM_TOPOLOGY_ADDR_PATTERN\fR |
This is set only if the system has the topology/tree plugin
configured. The value will be set to the component types listed in
SLURM_TOPOLOGY_ADDR. Each component will be identified as
either "switch" or "node". A period is used to separate each
hardware component type.
| .IP |
| |
| .TP |
| \fBSLURM_TRES_PER_TASK\fR |
| Set to the value of \fB\-\-tres\-per\-task\fR. If \fB\-\-cpus\-per\-task\fR or |
| \fB\-\-gpus\-per\-task\fR is specified, it is also set in |
| \fBSLURM_TRES_PER_TASK\fR as if it were specified in \fB\-\-tres\-per\-task\fR. |
| .IP |
| |
| .TP |
| \fBSLURMD_NODENAME\fR |
| Name of the node running the job script. |
| .IP |
| |
| .SH "EXAMPLES" |
| |
| .TP |
| Specify a batch script by filename on the command line. \ |
| The batch script specifies a 1 minute time limit for the job. |
| .IP |
| .nf |
| $ cat myscript |
| #!/bin/sh |
| #SBATCH \-\-time=1 |
| srun hostname |sort |
| |
| $ sbatch \-N4 myscript |
Submitted batch job 65537
| |
| $ cat slurm\-65537.out |
| host1 |
| host2 |
| host3 |
| host4 |
| .fi |
| |
| .TP |
| Pass a batch script to sbatch on standard input: |
| .IP |
| .nf |
| $ sbatch \-N4 <<EOF |
| > #!/bin/sh |
| > srun hostname |sort |
| > EOF |
Submitted batch job 65541
| |
| $ cat slurm\-65541.out |
| host1 |
| host2 |
| host3 |
| host4 |
| .fi |
| |
| .TP |
| To create a heterogeneous job with 3 components, each allocating a unique set \ |
| of nodes: |
| .IP |
| .nf |
| $ sbatch \-w node[2\-3] : \-w node4 : \-w node[5\-7] work.bash |
| Submitted batch job 34987 |
| .fi |
| |
| .SH "COPYING" |
| Copyright (C) 2006\-2007 The Regents of the University of California. |
| Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). |
| .br |
| Copyright (C) 2008\-2010 Lawrence Livermore National Security. |
| .br |
| Copyright (C) 2010\-2022 SchedMD LLC. |
| .LP |
| This file is part of Slurm, a resource management program. |
| For details, see <https://slurm.schedmd.com/>. |
| .LP |
| Slurm is free software; you can redistribute it and/or modify it under |
| the terms of the GNU General Public License as published by the Free |
| Software Foundation; either version 2 of the License, or (at your option) |
| any later version. |
| .LP |
| Slurm is distributed in the hope that it will be useful, but WITHOUT ANY |
| WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
| FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
| details. |
| |
| .SH "SEE ALSO" |
| .LP |
| \fBsinfo\fR(1), \fBsattach\fR(1), \fBsalloc\fR(1), \fBsqueue\fR(1), \fBscancel\fR(1), \fBscontrol\fR(1), |
\fBslurm.conf\fR(5), \fBsched_setaffinity\fR(2), \fBnuma\fR(3)