| .TH sh5util "1" "Slurm Commands" "February 2021" "Slurm Commands" |
| |
| .SH "NAME" |
| .LP |
| sh5util \- Tool for merging HDF5 files from the acct_gather_profile |
| plugin that gathers detailed data for jobs running under Slurm |
| |
| .SH "SYNOPSIS" |
| .LP |
| sh5util |
| |
| .SH "DESCRIPTION" |
| .LP |
| sh5util merges HDF5 files produced on each node for each step of a job into |
| one HDF5 file for the job. The resulting file can be viewed and manipulated |
| by common HDF5 tools such as HDF5View, h5dump, h5edit, or h5ls. |
| .LP |
| sh5util also has two extract modes. The first, writes a limited set of |
| data for specific nodes, steps, and data series in |
| "comma separated value" form to a file which can be imported into other |
| analysis tools such as spreadsheets. |
| .LP |
| The second, (Item\-Extract) extracts one data time from one time series for all |
| the samples on all the nodes from a jobs HDF5 profile. |
| .LP |
| \- Finds sample with maximum value of the item. |
| .LP |
| \- Write CSV file with min, ave, max, and item totals for each node for each |
| sample |
| |
| |
| .SH "OPTIONS" |
| .LP |
| |
| .TP |
| \fB\-E\fR, \fB\-\-extract\fR |
| |
| Extract data series from a merged job file. |
| .IP |
| .RS |
| .TP 10 |
| Extract mode options |
| .IP |
| |
| .TP |
| \fB\-i\fR, \fB\-\-input\fR=\fIpath\fR |
| merged file to extract from (default ./job_$jobid.h5) |
| .IP |
| |
| .TP |
| \fB\-N\fR, \fB\-\-node\fR=\fInodename\fR |
| Node name to extract (default is all) |
| .IP |
| |
| .TP |
| \fB\-l\fR, \fB\-\-level\fR=[Node:Totals | Node:TimeSeries] |
| Level to which series is attached. (default Node:Totals) |
| .IP |
| |
| .TP |
| \fB\-s\fR, \fB\-\-series\fR=[Energy | Filesystem | Network | Task | Task_#] |
| \fBTask\fR is all tasks, \fBTask_#\fR (# is a task id) (default is everything) |
| .RE |
| .IP |
| |
| .TP |
| \fB\-h\fR, \fB\-\-help\fR |
| Print this description of use. |
| .IP |
| |
| .TP |
| \fB\-I\fR, \fB\-\-item\-extract\fR |
| |
| Extract one data item from all samples of one data series from all nodes in a merged job file. |
| .IP |
| .RS |
| .TP 10 |
| Item\-Extract mode options |
| .IP |
| |
| .TP |
| \fB\-s\fR, \fB\-\-series\fR=[Energy | Filesystem | Network | Task]\fR |
| .IP |
| |
| .TP |
| \fB\-d\fR, \fB\-\-data\fR |
| Name of data item in series (See note below). |
| .RE |
| .IP |
| |
| .TP |
| \fB\-j\fR, \fB\-\-jobs\fR=<\fIjob\fR[.\fIstep\fR]> |
| Format is <\fIjob\fR[.\fIstep\fR]>. Merge this job/step |
| (or a comma\-separated list of job steps). This option is required. |
| Not specifying a step will result in all steps found to be processed. |
| .IP |
| |
| .TP |
| \fB\-L\fR, \fB\-\-list\fR |
| |
| Print the items of a series contained in a job file. |
| .IP |
| .RS |
| .TP 10 |
| List mode options |
| .IP |
| |
| .TP |
| \fB\-i\fR, \fB\-\-input\fR=\fIpath\fR |
| Merged file to extract from (default ./job_$jobid.h5) |
| .IP |
| |
| .TP |
| \fB\-s\fR, \fB\-\-series\fR=[Energy | Filesystem | Network | Task] |
| .RE |
| .IP |
| |
| .TP |
| \fB\-o\fR, \fB\-\-output\fR=<\fIpath\fR> |
| Path to a file into which to write. |
| Default for merge is ./job_$jobid.h5 |
| Default for extract is ./extract_$jobid.csv |
| .IP |
| |
| .TP |
| \fB\-p\fR, \fB\-\-profiledir\fR=<\fIdir\fR> |
| Directory location where node\-step files exist default is set in |
| acct_gather.conf. |
| .IP |
| |
| .TP |
| \fB\-S\fR, \fB\-\-savefiles\fR |
| Instead of removing node\-step files after merging them into the job file, |
| keep them around. |
| .IP |
| |
| .TP |
| \fB\-\-usage\fR |
| Display brief usage message. |
| .IP |
| |
| .TP |
| \fB\-\-user\fR=<\fIuser\fR> |
| User who profiled job. |
| (Handy for root user, defaults to user running this command.) |
| .IP |
| |
| .SH "Data Items per Series" |
| |
| .TP |
| \fBEnergy\fR |
| .RS |
| Power |
| .br |
| CPU_Frequency |
| .RE |
| .IP |
| |
| .TP |
| \fBFilesystem\fR |
| .RS |
| Reads |
| .br |
| Megabytes_Read |
| .br |
| Writes |
| .br |
| Megabytes_Write |
| .RE |
| .IP |
| |
| .TP |
| \fBNetwork\fR |
| .RS |
| Packets_In |
| .br |
| Megabytes_In |
| .br |
| Packets_Out |
| .br |
| Megabytes_Out |
| .RE |
| .IP |
| |
| .TP |
| \fBTask\fR |
| .RS |
| CPU_Frequency |
| .br |
| CPU_Time |
| .br |
| CPU_Utilization |
| .br |
| RSS |
| .br |
| VM_Size |
| .br |
| Pages |
| .br |
| Read_Megabytes |
| .br |
| Write_Megabytes |
| .RE |
| .IP |
| |
| .SH "PERFORMANCE" |
| .PP |
| Executing \fBsh5util\fR sends a remote procedure call to \fBslurmctld\fR. If |
| enough calls from \fBsh5util\fR or other Slurm client commands that send remote |
| procedure calls to the \fBslurmctld\fR daemon come in at once, it can result in |
| a degradation of performance of the \fBslurmctld\fR daemon, possibly resulting |
| in a denial of service. |
| .PP |
| Do not run \fBsh5util\fR or other Slurm client commands that send remote |
| procedure calls to \fBslurmctld\fR from loops in shell scripts or other |
| programs. Ensure that programs limit calls to \fBsh5util\fR to the minimum |
| necessary for the information you are trying to gather. |
| |
| .SH "EXAMPLES" |
| |
| .TP |
| Merge node\-step files (as part of a sbatch script): |
| .IP |
| .nf |
| $ sbatch \-n1 \-d$SLURM_JOB_ID \-\-wrap="sh5util \-\-savefiles \-j $SLURM_JOB_ID" |
| .fi |
| |
| .TP |
| Extract all task data from a node: |
| .IP |
| .nf |
| $ sh5util \-j 42 \-N snowflake01 \-\-level=Node:TimeSeries \-\-series=Tasks |
| .fi |
| |
| .TP |
| Extract all energy data: |
| .IP |
| .nf |
| $ sh5util \-j 42 \-\-series=Energy \-\-data=power |
| .fi |
| |
| .SH "COPYING" |
| Copyright (C) 2013 Bull. |
| .br |
| Copyright (C) 2013\-2022 SchedMD LLC. |
| Slurm is free software; you can redistribute it and/or modify it under |
| the terms of the GNU General Public License as published by the Free |
| Software Foundation; either version 2 of the License, or (at your option) |
| any later version. |
| .LP |
| Slurm is distributed in the hope that it will be useful, but WITHOUT ANY |
| WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS |
| FOR A PARTICULAR PURPOSE. See the GNU General Public License for more |
| details. |
| |
| .SH "SEE ALSO" |
| .LP |