| .TH strigger "1" "Slurm Commands" "March 2025" "Slurm Commands" |
| |
| .SH "NAME" |
| strigger \- Used to set, get or clear Slurm trigger information. |
| |
| .SH "SYNOPSIS" |
| \fBstrigger \-\-set\fR [\fIOPTIONS\fR...] |
| .br |
| \fBstrigger \-\-get\fR [\fIOPTIONS\fR...] |
| .br |
| \fBstrigger \-\-clear\fR [\fIOPTIONS\fR...] |
| |
| .SH "DESCRIPTION" |
| \fBstrigger\fR is used to set, get or clear Slurm trigger information. |
| Triggers include events such as a node failing, a job reaching its |
| time limit or a job terminating. |
| These events can cause actions such as the execution of an arbitrary |
| script. |
| Typical uses include notifying system administrators of node failures |
| and gracefully terminating a job when its time limit is approaching. |
| A hostlist expression for the nodelist or job ID is passed as an argument |
| to the program. |
| |
| Trigger events are not processed instantly. Instead, a check for |
| trigger events is performed periodically (currently every 15 seconds). |
| Any trigger events which occur within that interval will be compared |
| against the trigger programs set at the end of the time interval. |
| The trigger program will be executed once for any event occurring in |
| that interval. |
| The record of those events (e.g. nodes which went DOWN in the previous |
| 15 seconds) will then be cleared. |
| The trigger program must set a new trigger before the end of the next |
| interval to ensure that no trigger events are missed OR the trigger must be |
| created with an argument of "\-\-flags=PERM". |
| If desired, multiple trigger programs can be set for the same event. |
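| .PP |
| For example, a permanent trigger removes the need for the trigger program |
| to set a new trigger itself. The following is a minimal sketch that reuses |
| the notification program path from the EXAMPLES section, which is assumed to |
| exist on the node running \fIslurmctld\fR: |
| .IP |
| .nf |
| strigger \-\-set \-\-node \-\-down \-\-flags=PERM \\ |
| \-\-program=/usr/sbin/slurm_admin_notify |
| .fi |
| .PP |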
| |
| \fBNOTE\fR: This command can only set triggers if run by the |
| user \fISlurmUser\fR unless \fISlurmUser\fR is configured as user root. |
| This is required for the \fIslurmctld\fR daemon to set the appropriate |
| user and group IDs for the executed program. |
| Also note that the trigger program is executed on the same node that the |
| \fIslurmctld\fR daemon uses rather than some allocated compute node. |
| To check the value of \fISlurmUser\fR, run the command: |
| .IP |
| .nf |
| scontrol show config | grep SlurmUser |
| .fi |
| |
| .SH "ARGUMENTS" |
| |
| .TP |
| \fB\-C\fR, \fB\-\-backup_slurmctld_assumed_control\fR |
| Trigger an event when the backup slurmctld assumes control. |
| .IP |
| |
| .TP |
| \fB\-B\fR, \fB\-\-backup_slurmctld_failure\fR |
| Trigger an event when the backup slurmctld fails. |
| .IP |
| |
| .TP |
| \fB\-c\fR, \fB\-\-backup_slurmctld_resumed_operation\fR |
| Trigger an event when the backup slurmctld resumes operation after failure. |
| .IP |
| |
| .TP |
| \fB\-\-burst_buffer\fR |
| Trigger an event when a burst buffer error occurs. |
| .IP |
| |
| .TP |
| \fB\-\-clear\fP |
| Clear or delete a previously defined event trigger. |
| The \fB\-\-id\fR, \fB\-\-jobid\fR or \fB\-\-user\fR |
| option must be specified to identify the trigger(s) to |
| be cleared. |
| Only user root or the trigger's creator can delete a trigger. |
| .IP |
| |
| .TP |
| \fB\-M\fR, \fB\-\-clusters\fR=<\fIstring\fR> |
| Clusters to issue commands to. |
| Note that the \fBslurmdbd\fR must be up for this option to work properly, unless |
| running in a federation with \fBFederationParameters=fed_display\fR configured. |
| .IP |
| |
| .TP |
| \fB\-d\fR, \fB\-\-down\fR |
| Trigger an event if the specified node goes into a DOWN state. |
| .IP |
| |
| .TP |
| \fB\-D\fR, \fB\-\-drained\fR |
| Trigger an event if the specified node goes into a DRAINED state. |
| .IP |
| |
| .TP |
| \fB\-\-draining\fR |
| Trigger an event if the specified node goes into a DRAINING state, |
| before it is DRAINED. |
| .IP |
| |
| .TP |
| \fB\-F\fR, \fB\-\-fail\fR |
| Trigger an event if the specified node goes into a FAILING state. |
| .IP |
| |
| .TP |
| \fB\-f\fR, \fB\-\-fini\fR |
| Trigger an event when the specified job completes execution. |
| .IP |
| |
| .TP |
| \fB\-\-flags\fR=<\fIflag\fR> |
| Associate flags with the trigger. Multiple flags should be comma separated. |
| Valid flags include: |
| .IP |
| .RS |
| .TP |
| \fBPERM\fR |
| Make the trigger permanent. Do not purge it after the event occurs. |
| .RE |
| .IP |
| |
| .TP |
| \fB\-\-get\fP |
| Show registered event triggers. |
| Options can be used for filtering purposes. |
| .IP |
| |
| .TP |
| \fB\-i\fR, \fB\-\-id\fR=<\fIid\fR> |
| Trigger ID number. |
| .IP |
| |
| .TP |
| \fB\-I\fR, \fB\-\-idle\fR |
| Trigger an event if the specified node remains in an IDLE state |
| for at least the time period specified by the \fB\-\-offset\fR |
| option. This can be useful to hibernate a node that remains idle, |
| thus reducing power consumption. |
| .IP |
| |
| .TP |
| \fB\-j\fR, \fB\-\-jobid\fR=<\fIid\fR> |
| Job ID of interest. |
| \fBNOTE\fR: The \fB\-\-jobid\fR option cannot be used in conjunction |
| with the \fB\-\-node\fR option. When the \fB\-\-jobid\fR option is |
| used in conjunction with the \fB\-\-up\fR or \fB\-\-down\fR option, |
| all nodes allocated to that job will be considered the nodes used as a |
| trigger event. |
| .IP |
| |
| .TP |
| \fB\-n\fR, \fB\-\-node\fR[=\fIhost\fR] |
| Host name(s) of interest. |
| By default, all nodes associated with the job (if \fB\-\-jobid\fR |
| is specified) or on the system are considered for event triggers. |
| \fBNOTE\fR: The \fB\-\-node\fR option cannot be used in conjunction |
| with the \fB\-\-jobid\fR option. When the \fB\-\-jobid\fR option is |
| used in conjunction with the \fB\-\-up\fR, \fB\-\-down\fR or |
| \fB\-\-drained\fR option, |
| all nodes allocated to that job will be considered the nodes used as a |
| trigger event. Since this option's argument is optional, for proper |
| parsing the single letter option must be followed immediately by |
| the value, with no space between them. For example, use "\-ntux" |
| and not "\-n tux". |
| .IP |
| |
| .TP |
| \fB\-N\fR, \fB\-\-noheader\fR |
| Do not print the header when displaying a list of triggers. |
| .IP |
| |
| .TP |
| \fB\-o\fR, \fB\-\-offset\fR=<\fIseconds\fR> |
| The specified action should follow the event by this time interval. |
| Specify a negative value if the action should precede the event. |
| The default value is zero if no \fB\-\-offset\fR option is specified. |
| The resolution of this time is about 20 seconds, so to execute |
| a script not less than five minutes prior to a job reaching its |
| time limit, specify \fB\-\-offset=\-320\fR (5 minutes plus 20 seconds). |
| .IP |
| |
| .TP |
| \fB\-h\fR, \fB\-\-primary_database_failure\fR |
| Trigger an event when the primary database fails. This event is triggered when |
| the accounting plugin fails to open a connection to MySQL while |
| the slurmctld needs the database for some operation. |
| .IP |
| |
| .TP |
| \fB\-H\fR, \fB\-\-primary_database_resumed_operation\fR |
| Trigger an event when the primary database resumes operation after failure. |
| This happens when the accounting plugin's connection to MySQL is restored. |
| .IP |
| |
| .TP |
| \fB\-g\fR, \fB\-\-primary_slurmdbd_failure\fR |
| Trigger an event when the primary slurmdbd fails. The trigger is launched by |
| slurmctld whenever it tries to connect to slurmdbd but receives no |
| response on the socket. |
| .IP |
| |
| .TP |
| \fB\-G\fR, \fB\-\-primary_slurmdbd_resumed_operation\fR |
| Trigger an event when the primary slurmdbd resumes operation after failure. |
| This event is triggered when opening the connection from slurmctld to slurmdbd |
| results in a response. This can happen in several situations: periodically |
| (every 15 seconds) when checking the connection status, when saving state, |
| when the agent queue is filling, and so on. |
| .IP |
| |
| .TP |
| \fB\-e\fR, \fB\-\-primary_slurmctld_acct_buffer_full\fR |
| Trigger an event when the primary slurmctld accounting buffer is full. |
| .IP |
| |
| .TP |
| \fB\-a\fR, \fB\-\-primary_slurmctld_failure\fR |
| Trigger an event when the primary slurmctld fails. |
| .IP |
| |
| .TP |
| \fB\-b\fR, \fB\-\-primary_slurmctld_resumed_control\fR |
| Trigger an event when the primary slurmctld resumes control. |
| .IP |
| |
| .TP |
| \fB\-A\fR, \fB\-\-primary_slurmctld_resumed_operation\fR |
| Trigger an event when the primary slurmctld resumes operation after failure. |
| .IP |
| |
| .TP |
| \fB\-p\fR, \fB\-\-program\fR=<\fIpath\fR> |
| Execute the program at the specified fully qualified pathname |
| when the event occurs. |
| You may quote the path and include extra program arguments if desired. |
| The program will be executed as the user who sets the trigger. |
| If the program fails to terminate within 5 minutes, it will |
| be killed along with any spawned processes. |
| .IP |
| |
| .TP |
| \fB\-Q\fR, \fB\-\-quiet\fR |
| Do not report non\-fatal errors. |
| This can be useful to clear triggers which may have already been purged. |
| .IP |
| |
| .TP |
| \fB\-r\fR, \fB\-\-reconfig\fR |
| Trigger an event when the system configuration changes. |
| This is triggered when the slurmctld daemon reads its configuration file or |
| when a node state changes. |
| .IP |
| |
| .TP |
| \fB\-R\fR, \fB\-\-resume\fR |
| Trigger an event if the specified node is set to the RESUME state. |
| .IP |
| |
| .TP |
| \fB\-\-set\fP |
| Register an event trigger based upon the supplied options. |
| \fBNOTE\fR: An event is only triggered once. A new event trigger |
| must be established for future events of the same type |
| to be processed. |
| Triggers can only be set if the command is run by the user |
| \fISlurmUser\fR unless \fISlurmUser\fR is configured as user root. |
| .IP |
| |
| .TP |
| \fB\-t\fR, \fB\-\-time\fR |
| Trigger an event when the specified job's time limit is reached. |
| This must be used in conjunction with the \fB\-\-jobid\fR option. |
| .IP |
| |
| .TP |
| \fB\-u\fR, \fB\-\-up\fR |
| Trigger an event if the specified node is returned to service |
| from a DOWN state. |
| .IP |
| |
| .TP |
| \fB\-\-user\fR=<\fIuser_name_or_id\fR> |
| Clear or get triggers created by the specified user. |
| For example, a trigger created by user \fIroot\fR for a job created by user |
| \fIadam\fR could be cleared with an option \fI\-\-user=root\fR. |
| Specify either a user name or user ID. |
| .IP |
| |
| .TP |
| \fB\-v\fR, \fB\-\-verbose\fR |
| Print detailed event logging. This includes time\-stamps on data structures, |
| record counts, etc. |
| .IP |
| |
| .TP |
| \fB\-V\fR, \fB\-\-version\fR |
| Print version information and exit. |
| .IP |
| |
| .SH "OUTPUT FIELD DESCRIPTIONS" |
| |
| .TP |
| \fBTRIG_ID\fP |
| Trigger ID number. |
| .IP |
| |
| .TP |
| \fBRES_TYPE\fP |
| Resource type: \fIjob\fR or \fInode\fR |
| .IP |
| |
| .TP |
| \fBRES_ID\fP |
| Resource ID: job ID or host names or "*" for any host |
| .IP |
| |
| .TP |
| \fBTYPE\fP |
| Trigger type: \fItime\fR or \fIfini\fR (for jobs only), |
| \fIdown\fR or \fIup\fR (for jobs or nodes), or |
| \fIdrained\fR, \fIidle\fR or \fIreconfig\fR (for nodes only) |
| .IP |
| |
| .TP |
| \fBOFFSET\fP |
| Time offset in seconds. Negative numbers indicate that the action should |
| occur before the event (if possible) |
| .IP |
| |
| .TP |
| \fBUSER\fP |
| Name of the user requesting the action |
| .IP |
| |
| .TP |
| \fBPROGRAM\fP |
| Pathname of the program to execute when the event occurs |
| .IP |
| |
| .SH "PERFORMANCE" |
| .PP |
| Executing \fBstrigger\fR sends a remote procedure call to \fBslurmctld\fR. If |
| enough calls from \fBstrigger\fR or other Slurm client commands that send remote |
| procedure calls to the \fBslurmctld\fR daemon come in at once, it can result in |
| a degradation of performance of the \fBslurmctld\fR daemon, possibly resulting |
| in a denial of service. |
| .PP |
| Do not run \fBstrigger\fR or other Slurm client commands that send remote |
| procedure calls to \fBslurmctld\fR from loops in shell scripts or other |
| programs. Ensure that programs limit calls to \fBstrigger\fR to the minimum |
| necessary for the information you are trying to gather. |
| |
| .SH "ENVIRONMENT VARIABLES" |
| .PP |
| Some \fBstrigger\fR options may be set via environment variables. These |
| environment variables, along with their corresponding options, are listed below. |
| (Note: Command line options will always override these settings.) |
| |
| .TP 20 |
| \fBSLURM_CONF\fR |
| The location of the Slurm configuration file. |
| .IP |
| |
| .TP |
| \fBSLURM_DEBUG_FLAGS\fR |
| Specify debug flags for strigger to use. See DebugFlags in the |
| \fBslurm.conf\fR(5) man page for a full list of flags. The environment |
| variable takes precedence over the setting in slurm.conf. |
| .IP |
| |
| .SH "EXAMPLES" |
| |
| .TP |
| Execute the program "/usr/sbin/primary_slurmctld_failure" whenever the \ |
| primary slurmctld fails. |
| .IP |
| .nf |
| $ cat /usr/sbin/primary_slurmctld_failure |
| #!/bin/bash |
| # Submit trigger for next primary slurmctld failure event |
| strigger \-\-set \-\-primary_slurmctld_failure \\ |
| \-\-program=/usr/sbin/primary_slurmctld_failure |
| # Notify the administrator of the failure using e\-mail |
| /bin/mail slurm_admin@site.com \-s Primary_SLURMCTLD_FAILURE |
| |
| $ strigger \-\-set \-\-primary_slurmctld_failure \\ |
| \-\-program=/usr/sbin/primary_slurmctld_failure |
| .fi |
| |
| .TP |
| Execute the program "/usr/sbin/slurm_admin_notify" whenever \ |
| any node in the cluster goes down. The subject line will include \ |
| the node names which have entered the down state (passed as an \ |
| argument to the script by Slurm). |
| .IP |
| .nf |
| $ cat /usr/sbin/slurm_admin_notify |
| #!/bin/bash |
| # Submit trigger for next event |
| strigger \-\-set \-\-node \-\-down \\ |
| \-\-program=/usr/sbin/slurm_admin_notify |
| # Notify the administrator by e\-mail |
| /bin/mail slurm_admin@site.com \-s NodesDown:$* |
| |
| $ strigger \-\-set \-\-node \-\-down \\ |
| \-\-program=/usr/sbin/slurm_admin_notify |
| .fi |
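| |
| .TP |
| A sketch of a trigger program that expands the hostlist expression it \ |
| receives as its first argument into one host name per line using \ |
| "scontrol show hostnames". The script path and the log file location are \ |
| hypothetical, and the trigger is assumed to be created with \ |
| "\-\-flags=PERM" so that the script does not need to set a new trigger. |
| .IP |
| .nf |
| $ cat /usr/sbin/log_down_nodes |
| #!/bin/bash |
| # $1 is the hostlist expression passed by Slurm, e.g. "tux[1\-4]" |
| # The log file path below is a placeholder; adjust for your site |
| scontrol show hostnames "$1" | while read \-r host; do |
|   echo "$(date) node $host is DOWN" >> /var/log/slurm_down_nodes.log |
| done |
| |
| $ strigger \-\-set \-\-node \-\-down \-\-flags=PERM \\ |
| \-\-program=/usr/sbin/log_down_nodes |
| .fi |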
| |
| .TP |
| Execute the program "/usr/sbin/slurm_suspend_node" whenever \ |
| any node in the cluster remains in the idle state for at least \ |
| 600 seconds. |
| .IP |
| .nf |
| $ strigger \-\-set \-\-node \-\-idle \-\-offset=600 \\ |
| \-\-program=/usr/sbin/slurm_suspend_node |
| .fi |
| |
| .TP |
| Execute the program "/home/joe/clean_up" when job 1234 is within \ |
| 10 minutes of reaching its time limit. |
| .IP |
| .nf |
| $ strigger \-\-set \-\-jobid=1234 \-\-time \-\-offset=\-600 \\ |
| \-\-program=/home/joe/clean_up |
| .fi |
| |
| .TP |
| Execute the program "/home/joe/node_died" when any node allocated to \ |
| job 1234 enters the DOWN state. |
| .IP |
| .nf |
| $ strigger \-\-set \-\-jobid=1234 \-\-down \\ |
| \-\-program=/home/joe/node_died |
| .fi |
| |
| .TP |
| Show all triggers associated with job 1235. |
| .IP |
| .nf |
| $ strigger \-\-get \-\-jobid=1235 |
| TRIG_ID RES_TYPE RES_ID TYPE OFFSET USER PROGRAM |
|     123      job   1235 time   \-600  joe /home/bob/clean_up |
|     125      job   1235 down      0  joe /home/bob/node_died |
| .fi |
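| |
| .TP |
| Print only the IDs of the triggers created by user joe. This is a sketch \ |
| that assumes the trigger ID is the first column of the output, as in the \ |
| listing above, and that awk is available. |
| .IP |
| .nf |
| $ strigger \-\-get \-\-noheader \-\-user=joe | awk '{print $1}' |
| .fi |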
| |
| .TP |
| Delete event trigger 125. |
| .IP |
| .nf |
| $ strigger \-\-clear \-\-id=125 |
| .fi |
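| |
| .TP |
| Delete all triggers created by user joe without reporting non\-fatal \ |
| errors, for instance when some of the triggers may already have been \ |
| purged. This is a sketch combining the \-\-user and \-\-quiet options \ |
| described under ARGUMENTS. |
| .IP |
| .nf |
| $ strigger \-\-clear \-\-quiet \-\-user=joe |
| .fi |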
| |
| .TP |
| Execute /home/joe/job_fini upon completion of job 1237. |
| .IP |
| .nf |
| $ strigger \-\-set \-\-jobid=1237 \-\-fini \-\-program=/home/joe/job_fini |
| .fi |
| |
| .SH "COPYING" |
| Copyright (C) 2007 The Regents of the University of California. |
| Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). |
| .br |
| Copyright (C) 2008\-2010 Lawrence Livermore National Security. |
| .br |
| Copyright (C) 2010\-2022 SchedMD LLC. |
| .LP |
| This file is part of Slurm, a resource management program. |
| For details, see <https://slurm.schedmd.com/>. |
| .LP |
| Slurm is free software; you can redistribute it and/or modify it under |
| the terms of the GNU General Public License as published by the Free |
| Software Foundation; either version 2 of the License, or (at your option) |
| any later version. |
| .LP |
| Slurm is distributed in the hope that it will be useful, but WITHOUT ANY |
| WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
| FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
| details. |
| |
| .SH "SEE ALSO" |
| \fBscontrol\fR(1), \fBsinfo\fR(1), \fBsqueue\fR(1) |