| <!--#include virtual="header.txt"--> |
| <!-- Copyright (C) 2013 Bull S. A. S. |
| Bull, Rue Jean Jaures, B.P.68, 78340, Les Clayes-sous-Bois. --> |
| |
| <h1>Profiling Using HDF5 User Guide</h1> |
| |
| <h2 id="contents">Contents<a class="slurm_link" href="#contents"></a></h2> |
| <ul> |
| <li><a href="#Overview">Overview</a></li> |
| <li><a href="#Administration">Administration</a></li> |
| <li><a href="#Profiling">Profiling Jobs</a></li> |
| <li><a href="#HDF5">HDF5</a></li> |
| <li><a href="#DataSeries">Data Structure</a></li> |
| </ul> |
| |
| |
| <h2 id="Overview">Overview<a class="slurm_link" ref="#Overview"></a></h2> |
| <p>The acct_gather_profile/hdf5 plugin allows Slurm to coordinate collecting |
| data on jobs it runs on a cluster that is more detailed than is practical to |
| include in its database. The data comes from periodically sampling various |
| performance data either collected by Slurm, the operating system, or |
| component software. The plugin will record the data from each source |
| as a <b>Time Series</b> and also accumulate totals for each statistic for |
| the job.</p> |
| |
| <p>Time Series are energy data collected by an acct_gather_energy plugin, |
| I/O data from a network interface collected by an acct_gather_interconnect |
| plugin, I/O data from parallel file systems such as Lustre collected by an |
| acct_gather_filesystem plugin, and task performance data such as local disk I/O, |
| cpu consumption, and memory use from a jobacct_gather plugin. |
| Data from other sources may be added in the future.</p> |
| |
| <p>The data is collected into a file on a shared file system for each step on |
| each allocated node of a job and then merged into an HDF5 file. |
| Individual files on a shared file system was chosen because it is possible |
| that the data is voluminous so solutions that pass data to the Slurm control |
| daemon via RPC may not scale to very large clusters or jobs with |
| many allocated nodes.</p> |
| |
| <h2 id="Administration">Administration |
| <a class="slurm_link" href="#Administration"></a> |
| </h2> |
| |
| <h3 id="shared">Shared File System<a class="slurm_link" href="#shared"></a></h3> |
| <div style="margin-left: 20px;"> |
| <p>The HDF5 Profile Plugin requires a common shared file system on all |
| the compute nodes. While a job is running, the plugin writes a |
| file into this file system for each step of the job on each node. When |
| the job ends, the merge process is launched and the node-step files |
| are combined into one HDF5 file for the job.</p> |
| |
| <p>The root of the directory structure is declared in the <b>ProfileHDF5Dir</b> |
| option in the acct_gather.conf file. The directory will be created by |
| Slurm if it doesn't exist. Each user will have |
| their own directory created in the ProfileHDF5Dir which contains |
| the HDF5 files. All the directories and files are created by the |
| SlurmdUser which is usually root. The user specific directories, as well |
| as the files inside, are chowned to the user running the job so they |
| can access the files. Since user root is usually creating these |
| files/directories a root squashed file system will not work for |
| the ProfileHDF5Dir.</p> |
| |
| <p>Each user that creates a profile will have a subdirectory in the profile |
| directory that has read/write permission only for the user.</p> |
| </div> |
| <h3 id="config">Configuration parameters |
| <a class="slurm_link" href="#config"></a> |
| </h3> |
| |
| <p><div style="margin-left: 20px;"> |
| <p>The profile plugin is enabled in the |
| <a href="slurm.conf.html">slurm.conf</a> file and it is internally |
| configured in the |
| <a href="acct_gather.conf.html">acct_gather.conf</a> file.</p> |
| </div> |
| <div style="margin-left: 20px;"> |
| <br> |
| <h3 id="slurm_conf">slurm.conf parameters |
| <a class="slurm_link" href="#slurm_conf"></a> |
| </h3> |
| <div style="margin-left: 20px;"> |
| <dl> |
| <dt><b>AcctGatherProfileType</b>=acct_gather_profile/hdf5</dt> |
| <dd>Enables the HDF5 plugin.</dd> |
| <dt><b>JobAcctGatherFrequency</b>=<seconds></dt> |
| <dd>Sets the sampling frequency for data types.</dd> |
| </dl> |
| </div> |
| </div> |
| <div style="margin-left: 20px;"> |
| <br> |
| <h3 id="acct_gather">acct_gather.conf parameters |
| <a class="slurm_link" href="#acct_gather"></a> |
| </h3> |
| <div style="margin-left: 20px;"> |
| <p>These parameters are directly used by the HDF5 Profile Plugin.</p> |
| <dl> |
| <dt><b>ProfileHDF5Dir</b>=<path></dt> |
| <dd>This parameter is the path to the shared folder into which the |
| acct_gather_profile plugin will write detailed data as an HDF5 file. |
| The directory is assumed to be on a file system shared by the controller and |
| all compute nodes. This is a required parameter.</dd> |
| <dt><b>ProfileHDF5Default</b>=[options]</dt> |
| <dd>A comma-delimited list of data types to be collected for each job |
| submission. Use this option with caution. A node-step file will be created on |
| every node for every step of every job. They will not automatically be merged |
| into job files. (Even job files for large numbers of small jobs would fill the |
| file system.) This option is intended for test environments where you |
| might want to profile a series of jobs but do not want to have to |
| add the --profile option to the launch scripts. |
| The options are described below and in the man pages for acct_gather.conf, |
| srun, salloc and sbatch commands.</dd> |
| </dl> |
| </div> |
| </div> |
| |
| |
| <div style="margin-left: 20px;"> |
| <br> |
| <h3 id="time">Time Series Control Parameters |
| <a class="slurm_link" href="#time"></a> |
| </h3> |
| <div style="margin-left: 20px;"> |
| <p>Other plugins add time series data to the HDF5 collection. They typically |
| have a default polling frequency specified in slurm.conf in the |
| JobAcctGatherFrequency parameter. The polling frequency can be overridden |
| using the --acctg-freq |
| <a href="srun.html">srun</a> parameter. |
| They are both of the form task=sec,energy=sec,filesystem=sec,network=sec.<p> |
| |
| <p>The IPMI energy plugin also needs the EnergyIPMIFrequency value set |
| in the acct_gather.conf file. This sets the rate at which the plugin samples |
| the external sensors. This value should be the same as the energy=sec in |
| either JobAcctGatherFrequency or --acctg-freq.</p> |
| |
| <p>Note that the IPMI and profile sampling are not synchronous. |
| The profile sample simply takes the last available IPMI sample value. |
| If the profile energy sample is more frequent than the IPMI sample rate, |
| the IPMI value will be repeated. If the profile energy sample is greater |
| than the IPMI rate, IPMI values will be lost.</p> |
| |
| <p>Also note that smallest effective IPMI (EnergyIPMIFrequency) sample rate |
| for 2013 era Intel processors is 3 seconds.</p> |
| |
| </div> |
| </div> |
| <h2 id="Profiling">Profiling Jobs |
| <a class="slurm_link" href="#Profiling"></a> |
| </h2> |
| <h3 id="collection">Data Collection |
| <a class="slurm_link" href="#collection"></a> |
| </h3> |
| <p>The --profile option on salloc|sbatch|srun controls whether data is |
| collected and what type of data is collected. If --profile is not specified |
| no data collected unless the <B>ProfileHDF5Default</B> |
| option is used in acct_gather.conf. --profile on the command line overrides |
| any value specified in the configuration file.</p> |
| |
| <div style="margin-left: 20px;"> |
| <dl> |
| <dt><b>--profile</b>=<all|none|[energy[,|task[,|filesystem[,|network]]]]> |
| </dt> |
| <dd>Enables detailed data collection by the acct_gather_profile plugin. |
| Detailed data are typically time-series that are stored in a HDF5 file for |
| the job.</dd> |
| <dd> |
| <DL> |
| <dt><B>All</B></dt> |
| <DD>All data types are collected. (Cannot be combined with other values.) |
| </DD> |
| <dt><B>None</B></dt> |
| <DD>No data types are collected. This is the default. (Cannot be |
| combined with other values.) |
| </DD> |
| |
| <dt><B>Energy</B></dt> |
| <DD>Energy data is collected.</DD> |
| |
| <dt><B>Filesystem</B></dt> |
| <DD>Filesystem data is collected. Currently only |
| Lustre filesystem is supported.</DD> |
| |
| <dt><B>Network</B></dt> |
| <DD>Network (InfiniBand) data is collected.</DD> |
| |
| <dt><B>Task</B></dt> |
| <DD>Task (I/O, Memory, ...) data is collected.</DD> |
| </DL> |
| </dd> |
| </dl> |
| </div> |
| |
| <h3 id="consolidation">Data Consolidation |
| <a class="slurm_link" href="#consolidation"></a> |
| </h3> |
| <p>The node-step files are merged into one HDF5 file for the job using the |
| <a href="sh5util.html">sh5util</a>.</p> |
| |
| <p>If the job is started with sbatch, the command line may added to the normal |
| launch script, For example:</p> |
| <pre> |
| sbatch -n1 -d$SLURM_JOB_ID --wrap="sh5util -j $SLURM_JOB_ID" |
| </pre> |
| |
| <h3 id="extraction">Data Extraction |
| <a class="slurm_link" href="#extraction"></a> |
| </h3> |
| <p>The <a href="sh5util.html">sh5util</a> program can also be used to extract |
| specific data from the HDF5 file and write it in <i>comma separated value (csv)</i> |
| form for importation into other analysis tools such as spreadsheets.</p> |
| |
| <h2 id="HDF5">HDF5<a class="slurm_link" href="#HDF5"></a></h2> |
| <p>HDF5 is a well known structured data set that allows heterogeneous but |
| related data to be stored in one file. |
| (.i.e. sections for energy statistics, network I/O, Task data, etc.) |
| Its internal structure resembles a |
| file system with <b>groups</b> being similar to <i>directories</i> and |
| <b>data sets</b> being similar to <i>files</i>. It also allows <b>attributes</b> |
| to be attached to groups to store application defined properties.</p> |
| |
| <p>There are commodity programs, notably |
| <a href="http://www.hdfgroup.org/hdf-java-html/hdfview/index.html"> |
| HDFView</a>, for viewing and manipulating these files. |
| |
| <p>Below is a screen shot from HDFView expanding the job tree and showing the |
| attributes for a specific task.</p> |
| <br> |
| <img src="hdf5_task_attr.png" width="275" height="275" > |
| |
| |
| <h2 id="DataSeries">Data Structure |
| <a class="slurm_link" href="#DataSeries"></a> |
| </h2> |
| |
| <table> |
| <tr> |
| <td><img src="hdf5_job_outline.png" width="205" height="570"></td> |
| <td style="vertical-align: top;"> |
| <div style="margin-left: 5px;"> |
| <p>In the job file, there will be a group for each <b>step</b> of the job. |
| Within each step, there will be a group for nodes, and a group for tasks.</p> |
| </div> |
| <ul> |
| <li> |
| The <b>nodes</b> group will have a group for each node in the step allocation. |
| For each node group, there is a sub-group for Time Series and another |
| for Totals. |
| <ul> |
| <li> |
| The <b>Time Series</b> group |
| contains a group/dataset containing the time series for each collector. |
| </li> |
| <li> |
| The <b>Totals</b> group contains a group/dataset that has corresponding |
| Minimum, Average, Maximum, and Sum Total for each item in the time series. |
| </li> |
| </ul> |
| <li> |
| The <b>Tasks</b> group will only contain a subgroup for each task. |
| It primarily contains an attribute stating the node on which the task was |
| executed. This set of groups is essentially a cross reference table. |
| </li> |
| </ul> |
| </td></tr> |
| </table> |
| |
| <h3 id="energy">Energy Data<a class="slurm_link" href="#energy"></a></h3> |
| <p><b>AcctGatherEnergyType</b>=acct_gather_energy/ipmi<p> |
| is required in slurm.conf to collect energy data. |
| Appropriately set energy=freq in either JobAcctGatherFrequency in slurm.conf |
| or in --acctg-freq on the command line. |
| Also appropriately set EnergyIPMIFrequency in acct_gather.conf.</p> |
| <p>Each data sample in the Energy Time Series contains the following data items. |
| </p> |
| <DL> |
| <dt><B>Date Time</B></dt> |
| <DD>Time of day at which the data sample was taken. This can be used to |
| correlate activity with other sources such as logs.</DD> |
| <dt><B>Time</B></dt> |
| <DD>Elapsed time since the beginning of the step.</DD> |
| <dt><B>Power</B></dt> |
| <DD>Power consumption during the interval.</DD> |
| <dt><B>CPU Frequency</B></dt> |
| <DD>CPU Frequency at time of sample in kilohertz.</DD> |
| </DL> |
| |
| <h3 id="filesystem">Filesystem Data |
| <a class="slurm_link" href="#filesystem"></a> |
| </h3> |
| <p><b>AcctGatherFilesystemType</b>=acct_gather_filesystem/lustre<p> |
| is required in slurm.conf to collect task data. |
| Appropriately set Filesystem=freq in either JobAcctGatherFrequency in slurm.conf |
| or in --acctg-freq on the command line.</p> |
| |
| <p>Each data sample in the Filesystem Time Series contains the following data items. |
| </p> |
| <DL> |
| <dt><B>Date Time</B></dt> |
| <DD>Time of day at which the data sample was taken. This can be used to |
| correlate activity with other sources such as logs.</DD> |
| <dt><B>Time</B></dt> |
| <DD>Elapsed time since the beginning of the step.</DD> |
| <dt><B>Reads</B></dt> |
| <DD>Number of read operations.</DD> |
| <dt><B>Megabytes Read</B></dt> |
| <DD>Number of megabytes read.</DD> |
| <dt><B>Writes</B></dt> |
| <DD>Number of write operations.</DD> |
| <dt><B>Megabytes Write</B></dt> |
| <DD>Number of megabytes written.</DD> |
| </DL> |
| |
| <h3 id="network">Network (Infiniband Data) |
| <a class="slurm_link" href="#network"></a> |
| </h3> |
| <p><b>AcctGatherInterconnectType</b>=acct_gather_interconnect/ofed<p> |
| is required in slurm.conf to collect task data. |
| Appropriately set network=freq in either JobAcctGatherFrequency in slurm.conf |
| or in --acctg-freq on the command line.</p> |
| <p>Each data sample in the Network Time Series contains the following |
| data items.</p> |
| <DL> |
| <dt><B>Date Time</B></dt> |
| <DD>Time of day at which the data sample was taken. This can be used to |
| correlate activity with other sources such as logs.</DD> |
| <dt><B>Time</B></dt> |
| <DD>Elapsed time since the beginning of the step.</DD> |
| <dt><B>Packets In</B></dt> |
| <DD>Number of packets coming in.</DD> |
| <dt><B>Megabytes Read</B></dt> |
| <DD>Number of megabytes coming in through the interface.</DD> |
| <dt><B>Packets Out</B></dt> |
| <DD>Number of packets going out.</DD> |
| <dt><B>Megabytes Write</B></dt> |
| <DD>Number of megabytes going out through the interface.</DD> |
| </DL> |
| |
| <h3 id="task">Task Data<a class="slurm_link" href="#task"></a></h3> |
| <p><b>JobAcctGatherType</b>=jobacct_gather/linux<p> |
| is required in slurm.conf to collect task data. |
| Appropriately set task=freq in either JobAcctGatherFrequency in slurm.conf |
| or in --acctg-freq on the command line.</p> |
| <p>Each data sample in the Task Time Series contains the following data |
| items.</p> |
| <DL> |
| <dt><B>Date Time</B></dt> |
| <DD>Time of day at which the data sample was taken. This can be used to |
| correlate activity with other sources such as logs.</DD> |
| <dt><B>Time</B></dt> |
| <DD>Elapsed time since the beginning of the step.</DD> |
| <dt><B>CPU Frequency</B></dt> |
| <DD>CPU Frequency at time of sample.</DD> |
| <dt><B>CPU Time</B></dt> |
| <DD>Seconds of CPU time used during the sample.</DD> |
| <dt><B>CPU Utilization</B></dt> |
| <DD>CPU Utilization during the interval.</DD> |
| <dt><B>RSS</B></dt> |
| <DD>Value of RSS at time of sample.</DD> |
| <dt><B>VM Size</B></dt> |
| <DD>Value of VM Size at time of sample.</DD> |
| <dt><B>Pages</B></dt> |
| <DD>Pages used in sample.</DD> |
| <dt><B>Read Megabytes</B></dt> |
| <DD>Number of megabytes read from local disk.</DD> |
| <dt><B>Write Megabytes</B></dt> |
| <DD>Number of megabytes written to local disk.</DD> |
| </DL> |
| |
| |
| |
| <p style="text-align:center;">Last modified 17 October 2022</p> |
| |
| <!--#include virtual="footer.txt"--> |