| <!--#include virtual="header.txt"--> |
| |
| <h1><a name="top">SLURM Job Accounting Plugin API</a></h1> |
| |
| <h2> Overview</h2> |
| <p> This document describes SLURM job accounting plugins and the API that |
| defines them. It is intended as a resource to programmers wishing to write |
| their own SLURM job accounting plugins. This is version 1 of the API. |
| |
| |
| <p>SLURM job accounting plugins must conform to the |
| SLURM Plugin API with the following specifications: |
| |
| <p><span class="commandline">const char |
| plugin_name[]="<i>full text name</i>"</span><br> |
| <p style="margin-left:.2in"> |
| A free-formatted ASCII text string that identifies the plugin. |
| |
| <p><span class="commandline">const char |
| plugin_type[]="<i>major/minor</i>"</span><br> |
| <p style="margin-left:.2in"> |
| The major type must be "jobacct". |
| The minor type can be any suitable name |
| for the type of accounting package. The following minor types are currently provided: |
| <ul> |
| <li><b>aix</b>: Gathers information from the AIX /proc table and adds this |
| information to the standard rusage information also gathered for each job. |
| <li><b>linux</b>: Gathers information from the Linux /proc table and adds this |
| information to the standard rusage information also gathered for each job. |
| <li><b>none</b>: No information is gathered. |
| </ul> |
| The <b>sacct</b> program can be used to display gathered data from regular |
| accounting and from these plugins. |
| <p>The programmer is urged to study |
| <span class="commandline">src/plugins/jobacct/linux</span> and |
| <span class="commandline">src/plugins/jobacct/common</span> |
| for a sample implementation of a SLURM job accounting plugin. |
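| <p>As a minimal sketch of the two identification strings described above, a new plugin |
| might declare them as shown here. The minor type <span class="commandline">example</span> and the helper |
| <span class="commandline">example_is_jobacct_type()</span> are hypothetical and exist only for illustration; |
| they are not part of SLURM. |

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Free-form ASCII text that identifies the plugin. */
const char plugin_name[] = "Example job accounting plugin";

/* major/minor: the major type must be "jobacct". The minor type
 * "example" is made up for this sketch. */
const char plugin_type[] = "jobacct/example";

/* Hypothetical helper (not a SLURM function): returns 1 when a type
 * string carries the "jobacct" major type, 0 otherwise. */
int example_is_jobacct_type(const char *type)
{
    static const char major[] = "jobacct/";

    return type != NULL && strncmp(type, major, strlen(major)) == 0;
}
```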
| <p class="footer"><a href="#top">top</a> |
| |
| |
| <h2>API Functions</h2> |
| |
| <p>The job accounting API uses hooks in slurmctld, slurmd, and slurmstepd. |
| |
| <p>All of the following functions are required. Functions which are not |
| implemented must be stubbed. |
| |
| <h4>Functions called by all slurmstepd processes</h4> |
| |
| <p class="commandline">int jobacct_p_startpoll(int frequency) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_startpoll() is called at the start of the slurmstepd. |
| It starts a thread that polls for accounting information, which can then be |
| queried at any time until the process ends. |
| Put global initialization here. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">frequency</span> (input) poll frequency for polling |
| thread. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline">int jobacct_p_endpoll() |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_endpoll() is called when the process is finished to stop the |
| polling thread. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">none</span> |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline">void jobacct_p_suspend_poll() |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_suspend_poll() is called when the process is suspended. |
| This causes the polling thread to halt until the process is resumed. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">none</span> |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
| |
| <p class="commandline">void jobacct_p_resume_poll() |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_resume_poll() is called when the process is resumed. |
| This causes the polling thread to resume operation. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">none</span> |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
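| <p>The four polling functions above could be sketched with POSIX threads roughly as follows. |
| This is an illustration under stated assumptions, not the code of any shipped plugin: the |
| return-code values are local stand-ins, the frequency is assumed to be a number of seconds, |
| a non-positive frequency is treated as an error, and the counter |
| <span class="commandline">poll_count</span> stands in for real sampling work. |

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Stand-ins for the SLURM return codes so the sketch compiles alone. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static pthread_t poll_thread;
static bool poll_running = false;
static bool poll_suspended = false;
static int poll_frequency = 0;        /* assumed to be in seconds */
static unsigned long poll_count = 0;  /* placeholder for sampled data */

static void *poll_loop(void *arg)
{
    struct timespec deadline;

    (void) arg;
    pthread_mutex_lock(&poll_lock);
    while (poll_running) {
        if (!poll_suspended)
            poll_count++;   /* a real plugin would sample its tasks here */
        /* Wait one interval, or less if jobacct_p_endpoll() signals. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += poll_frequency;
        pthread_cond_timedwait(&poll_cond, &poll_lock, &deadline);
    }
    pthread_mutex_unlock(&poll_lock);
    return NULL;
}

int jobacct_p_startpoll(int frequency)
{
    int rc = SLURM_FAILURE;

    pthread_mutex_lock(&poll_lock);
    if (frequency > 0 && !poll_running) {
        poll_frequency = frequency;
        poll_suspended = false;
        poll_running = true;
        if (pthread_create(&poll_thread, NULL, poll_loop, NULL) == 0)
            rc = SLURM_SUCCESS;
        else
            poll_running = false;
    }
    pthread_mutex_unlock(&poll_lock);
    return rc;
}

int jobacct_p_endpoll(void)
{
    pthread_mutex_lock(&poll_lock);
    if (!poll_running) {
        pthread_mutex_unlock(&poll_lock);
        return SLURM_FAILURE;
    }
    poll_running = false;
    pthread_cond_signal(&poll_cond);    /* wake the thread so it can exit */
    pthread_mutex_unlock(&poll_lock);
    pthread_join(poll_thread, NULL);
    return SLURM_SUCCESS;
}

void jobacct_p_suspend_poll(void)
{
    pthread_mutex_lock(&poll_lock);
    poll_suspended = true;              /* thread keeps running but skips sampling */
    pthread_mutex_unlock(&poll_lock);
}

void jobacct_p_resume_poll(void)
{
    pthread_mutex_lock(&poll_lock);
    poll_suspended = false;
    pthread_mutex_unlock(&poll_lock);
}
```

| <p>In this sketch the suspended thread stays alive and simply skips its sampling step, which keeps |
| suspend and resume cheap; a plugin could equally block the thread on the condition variable. |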
| |
| |
| <p class="commandline">int jobacct_p_add_task(pid_t pid, uint16_t tid) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_add_task() is used to add a task to the poller. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline"> pid</span> (input) process id.<br> |
| <span class="commandline"> tid</span> (input) SLURM global task id. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline">jobacctinfo_t *jobacct_p_stat_task(pid_t pid) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_stat_task() is used to get the most recent information about a task. |
| The caller must free the information returned by this function. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline"> pid</span> (input) Process id |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">jobacctinfo structure pointer</span> on success, or |
| <span class="commandline">NULL</span> on failure. |
| |
| <p class="commandline">jobacctinfo_t *jobacct_p_remove_task(pid_t pid) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_remove_task() is used to remove a task from the poller. |
| The caller must free the information returned by this function. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline"> pid</span> (input) Process id |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">Pointer to removed jobacctinfo_t structure</span> |
| on success, or |
| <span class="commandline">NULL</span> on failure. |
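| <p>One way to picture the task bookkeeping behind the three functions above is a linked list |
| keyed by process id. The sketch below assumes a stand-in <span class="commandline">jobacctinfo_t</span> |
| with made-up fields (the real structure is defined in the SLURM source) and local |
| return-code values. It also omits the locking a real plugin would need around a list |
| shared with the polling thread. |

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

/* Stand-in structure; field names are illustrative only. */
typedef struct jobacctinfo {
    pid_t pid;
    uint16_t tid;
    uint32_t max_rss;           /* hypothetical sampled value */
    struct jobacctinfo *next;
} jobacctinfo_t;

static jobacctinfo_t *task_list = NULL;

int jobacct_p_add_task(pid_t pid, uint16_t tid)
{
    jobacctinfo_t *entry;

    if (pid <= 0)
        return SLURM_FAILURE;
    entry = calloc(1, sizeof(*entry));
    if (entry == NULL)
        return SLURM_FAILURE;
    entry->pid = pid;
    entry->tid = tid;
    entry->next = task_list;    /* push onto the head of the list */
    task_list = entry;
    return SLURM_SUCCESS;
}

/* Returns a private copy, so the caller can free it independently. */
jobacctinfo_t *jobacct_p_stat_task(pid_t pid)
{
    jobacctinfo_t *cur, *copy;

    for (cur = task_list; cur != NULL; cur = cur->next) {
        if (cur->pid == pid) {
            copy = malloc(sizeof(*copy));
            if (copy == NULL)
                return NULL;
            *copy = *cur;
            copy->next = NULL;
            return copy;
        }
    }
    return NULL;
}

/* Unlinks the entry and hands ownership to the caller. */
jobacctinfo_t *jobacct_p_remove_task(pid_t pid)
{
    jobacctinfo_t **link = &task_list;
    jobacctinfo_t *cur;

    while ((cur = *link) != NULL) {
        if (cur->pid == pid) {
            *link = cur->next;
            cur->next = NULL;
            return cur;
        }
        link = &cur->next;
    }
    return NULL;
}
```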
| |
| <p class="footer"><a href="#top">top</a> |
| |
| <h4>Functions called by the slurmctld process</h4> |
| |
| <p class="commandline">int jobacct_p_init_slurmctld(char *job_acct_log) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_init_slurmctld() is called at the start of the slurmctld. |
| It opens the logfile to be written to. |
| Put global initialization here. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_acct_log</span> (input) logfile name. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline">int jobacct_p_fini_slurmctld() |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_fini_slurmctld() is called at the end of the slurmctld. |
| It closes the logfile. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">none</span> |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
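| <p>A minimal sketch of this open and close pairing, using only standard C I/O, might look like |
| the following. The return-code values are local stand-ins, and the choices to open the file |
| in append mode and to report failure when the log is already open (or not open) are |
| assumptions of this example rather than documented SLURM behavior. |

```c
#include <assert.h>
#include <stdio.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

static FILE *log_fp = NULL;     /* logfile shared by the slurmctld hooks */

int jobacct_p_init_slurmctld(char *job_acct_log)
{
    if (job_acct_log == NULL || log_fp != NULL)
        return SLURM_FAILURE;
    log_fp = fopen(job_acct_log, "a");  /* append: keep earlier records */
    return (log_fp != NULL) ? SLURM_SUCCESS : SLURM_FAILURE;
}

int jobacct_p_fini_slurmctld(void)
{
    int rc;

    if (log_fp == NULL)
        return SLURM_FAILURE;
    rc = (fclose(log_fp) == 0) ? SLURM_SUCCESS : SLURM_FAILURE;
    log_fp = NULL;              /* never reuse the pointer after fclose() */
    return rc;
}
```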
| |
| <p class="commandline"> |
| int jobacct_p_job_start_slurmctld(struct job_record *job_ptr) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_job_start_slurmctld() is called at the allocation of a new job in |
| the slurmctld. It prints out the beginning information about the job. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_ptr</span> (input) information about the job in |
| slurmctld. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline"> |
| int jobacct_p_job_complete_slurmctld(struct job_record *job_ptr) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_job_complete_slurmctld() is called at the end of a job in |
| the slurmctld. It prints out the ending information about the job. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_ptr</span> (input) information about the job in |
| slurmctld. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline"> |
| int jobacct_p_step_start_slurmctld(struct step_record *step_ptr) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_step_start_slurmctld() is called at the allocation of a new step in |
| the slurmctld. It prints out the beginning information about the step. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">step_ptr</span> (input) information about the step in |
| slurmctld. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline"> |
| int jobacct_p_step_complete_slurmctld(struct step_record *step_ptr) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_step_complete_slurmctld() is called at the end of a step in |
| the slurmctld. It prints out the ending information about the step. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">step_ptr</span> (input) information about the step in |
| slurmctld. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline"> |
| int jobacct_p_suspend_slurmctld(struct job_record *job_ptr) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_suspend_slurmctld() is called when a job is suspended or resumed in |
| the slurmctld. It prints out information about the suspension of the job |
| to the logfile. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_ptr</span> (input) information about the job in |
| slurmctld. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
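| <p>The job-level hooks above share one pattern: take the record handed over by slurmctld and |
| emit one line describing the event. The sketch below illustrates that pattern only. The |
| two-field <span class="commandline">struct job_record</span>, the record layout, the helper |
| <span class="commandline">example_log_event()</span> and the in-memory buffer that replaces the logfile are all |
| hypothetical; the real structure and log format are defined by the SLURM source. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

/* Stand-in with two illustrative fields; the real struct job_record
 * lives inside slurmctld and is much larger. */
struct job_record {
    uint32_t job_id;
    uint32_t user_id;
};

/* Holds the most recent record; a real plugin would write to its logfile. */
static char last_record[128];

/* Hypothetical helper: formats one event line for a job. */
static int example_log_event(const char *event, struct job_record *job_ptr)
{
    int n;

    if (job_ptr == NULL)
        return SLURM_FAILURE;
    n = snprintf(last_record, sizeof(last_record), "%s job=%u uid=%u",
                 event, (unsigned) job_ptr->job_id,
                 (unsigned) job_ptr->user_id);
    if (n < 0 || (size_t) n >= sizeof(last_record))
        return SLURM_FAILURE;   /* formatting error or truncated line */
    return SLURM_SUCCESS;
}

int jobacct_p_job_start_slurmctld(struct job_record *job_ptr)
{
    return example_log_event("JOB_START", job_ptr);
}

int jobacct_p_job_complete_slurmctld(struct job_record *job_ptr)
{
    return example_log_event("JOB_COMPLETE", job_ptr);
}
```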
| |
| <p class="footer"><a href="#top">top</a> |
| |
| <h4>Functions common to all processes</h4> |
| |
| <p class="commandline"> |
| int jobacct_p_init_struct(jobacctinfo_t *jobacct, uint16_t tid) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_init_struct() is called to set the values of a jobacctinfo_t to |
| initial values. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input/output) structure to be altered.<br> |
| <span class="commandline">tid</span> |
| (input) id of the task; pass (uint16_t)NO_VAL if there is no specific task. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline">jobacctinfo_t *jobacct_p_alloc(uint16_t tid) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_alloc() is used to allocate and initialize a |
| new jobacctinfo structure.<br> |
| The caller must free the structure returned by this function. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">tid</span> |
| (input) id of the task; pass (uint16_t)NO_VAL if there is no specific task. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">jobacctinfo structure pointer</span> on success, or |
| <span class="commandline">NULL</span> on failure. |
| |
| <p class="commandline">void jobacct_p_free(jobacctinfo_t *jobacct) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_free() is used to free the allocation made by jobacct_p_alloc(). |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input) structure to be freed. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
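| <p>The relationship between these three functions can be sketched as follows: allocation |
| delegates to the initializer, and the free function releases what the allocator returned. |
| The <span class="commandline">jobacctinfo_t</span> fields, the return-code values and the |
| <span class="commandline">NO_VAL</span> definition here are local stand-ins assumed for the example. |

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)
#define NO_VAL (0xfffffffe)     /* stand-in; value assumed for this sketch */

/* Stand-in structure; field names are illustrative only. */
typedef struct jobacctinfo {
    uint16_t tid;               /* task id, or (uint16_t)NO_VAL */
    uint32_t max_rss;           /* hypothetical accounting value */
} jobacctinfo_t;

int jobacct_p_init_struct(jobacctinfo_t *jobacct, uint16_t tid)
{
    if (jobacct == NULL)
        return SLURM_FAILURE;
    memset(jobacct, 0, sizeof(*jobacct));   /* zero every counter */
    jobacct->tid = tid;
    return SLURM_SUCCESS;
}

jobacctinfo_t *jobacct_p_alloc(uint16_t tid)
{
    jobacctinfo_t *jobacct = malloc(sizeof(*jobacct));

    if (jobacct == NULL)
        return NULL;
    if (jobacct_p_init_struct(jobacct, tid) != SLURM_SUCCESS) {
        free(jobacct);
        return NULL;
    }
    return jobacct;
}

void jobacct_p_free(jobacctinfo_t *jobacct)
{
    free(jobacct);              /* free(NULL) is a no-op */
}
```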
| |
| <p class="commandline"> |
| int jobacct_p_setinfo(jobacctinfo_t *jobacct, |
| enum jobacct_data_type type, void *data) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_setinfo() is called to set the values of a jobacctinfo_t to |
| specific values based on inputs. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input/output) structure to be altered.<br> |
| <span class="commandline">type</span> |
| (input) enum of specific part of jobacct to alter.<br> |
| <span class="commandline">data</span> |
| (input) corresponding data to set jobacct part to. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
| |
| <p class="commandline"> |
| int jobacct_p_getinfo(jobacctinfo_t *jobacct, |
| enum jobacct_data_type type, void *data) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_getinfo() is called to get specific values from a jobacctinfo_t |
| based on inputs. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input) structure to be queried.<br> |
| <span class="commandline">type</span> |
| (input) enum of specific part of jobacct to get.<br> |
| <span class="commandline">data</span> |
| (output) corresponding data read from the jobacct part. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
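| <p>The setter and getter are typically a switch on the <span class="commandline">type</span> argument that |
| decides how to interpret the untyped <span class="commandline">data</span> pointer. The sketch below |
| assumes two illustrative enum members and a stand-in structure; the real |
| <span class="commandline">enum jobacct_data_type</span> and <span class="commandline">jobacctinfo_t</span> are defined in the SLURM |
| source and may differ. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)

/* Illustrative members only; not the real enum. */
enum jobacct_data_type {
    JOBACCT_DATA_MAX_RSS,
    JOBACCT_DATA_MAX_VSIZE
};

/* Stand-in structure; field names are illustrative only. */
typedef struct jobacctinfo {
    uint32_t max_rss;
    uint32_t max_vsize;
} jobacctinfo_t;

int jobacct_p_setinfo(jobacctinfo_t *jobacct,
                      enum jobacct_data_type type, void *data)
{
    uint32_t *value = data;     /* both sketch fields are uint32_t */

    if (jobacct == NULL || data == NULL)
        return SLURM_FAILURE;
    switch (type) {
    case JOBACCT_DATA_MAX_RSS:
        jobacct->max_rss = *value;
        break;
    case JOBACCT_DATA_MAX_VSIZE:
        jobacct->max_vsize = *value;
        break;
    default:
        return SLURM_FAILURE;   /* unknown part requested */
    }
    return SLURM_SUCCESS;
}

int jobacct_p_getinfo(jobacctinfo_t *jobacct,
                      enum jobacct_data_type type, void *data)
{
    uint32_t *value = data;

    if (jobacct == NULL || data == NULL)
        return SLURM_FAILURE;
    switch (type) {
    case JOBACCT_DATA_MAX_RSS:
        *value = jobacct->max_rss;
        break;
    case JOBACCT_DATA_MAX_VSIZE:
        *value = jobacct->max_vsize;
        break;
    default:
        return SLURM_FAILURE;
    }
    return SLURM_SUCCESS;
}
```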
| |
| <p class="commandline"> |
| void jobacct_p_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_aggregate() is called to combine two jobacctinfo structures, |
| aggregating their values and keeping the maximum values. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">dest</span> |
| (input/output) initial structure to be applied to.<br> |
| <span class="commandline">from</span> |
| (input) new info to apply to dest. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
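| <p>To make "aggregate and get max values" concrete, the following sketch sums a running total |
| and keeps the larger of two maximums, remembering which task produced it. The three fields |
| of the stand-in <span class="commandline">jobacctinfo_t</span> are assumptions chosen for the example. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in structure; field names are illustrative only. */
typedef struct jobacctinfo {
    uint32_t max_rss;       /* largest resident set size seen */
    uint16_t max_rss_task;  /* task that reached max_rss */
    uint32_t tot_rss;       /* running total across tasks */
} jobacctinfo_t;

void jobacct_p_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)
{
    if (dest == NULL || from == NULL)
        return;
    /* Keep the larger maximum along with the task that produced it. */
    if (from->max_rss > dest->max_rss) {
        dest->max_rss = from->max_rss;
        dest->max_rss_task = from->max_rss_task;
    }
    /* Totals simply accumulate. */
    dest->tot_rss += from->tot_rss;
}
```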
| |
| <p class="commandline"> |
| void jobacct_p_2_sacct(sacct_t *sacct, jobacctinfo_t *jobacct) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_2_sacct() is called to transfer information from data structure |
| jobacct to structure sacct. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">sacct</span> |
| (input/output) initial structure to be applied to.<br> |
| <span class="commandline">jobacct</span> |
| (input) jobacctinfo_t structure containing information to apply to sacct. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
| |
| <p class="commandline"> |
| void jobacct_p_pack(jobacctinfo_t *jobacct, Buf buffer) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_pack() packs a jobacctinfo_t into a buffer to send across the network. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input) structure to pack.<br> |
| <span class="commandline">buffer</span> |
| (input/output) buffer to pack structure into. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">none</span> |
| |
| <p class="commandline"> |
| int jobacct_p_unpack(jobacctinfo_t *jobacct, Buf buffer) |
| <p style="margin-left:.2in"><b>Description</b>: |
| jobacct_p_unpack() unpacks a jobacctinfo_t from a buffer received from |
| the network. |
| The caller must free the jobacctinfo_t filled in by this function. |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">jobacct</span> |
| (input/output) structure to fill.<br> |
| <span class="commandline">buffer</span> |
| (input) buffer to unpack structure from. |
| <p style="margin-left:.2in"><b>Returns</b>: |
| <span class="commandline">SLURM_SUCCESS</span> on success, or |
| <span class="commandline">SLURM_FAILURE</span> on failure. |
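| <p>Packing and unpacking must agree on field order and byte order, since the two ends may run on |
| different hosts. The round trip below is a sketch of that idea only: <span class="commandline">Buf</span> is |
| replaced by a hypothetical fixed-size stand-in, the structure has two made-up fields, values are |
| written big-endian by hand, and the unpack function returns a status code as its Returns entry |
| describes. SLURM's own buffer routines are not used here. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0
#define SLURM_FAILURE (-1)
#define PACKED_SIZE 6           /* 2 bytes of tid + 4 bytes of max_rss */

/* Stand-in structure; field names are illustrative only. */
typedef struct jobacctinfo {
    uint16_t tid;
    uint32_t max_rss;
} jobacctinfo_t;

/* Hypothetical stand-in for SLURM's Buf type. */
typedef struct example_buf {
    unsigned char data[64];
    size_t used;                /* bytes written by pack */
    size_t processed;           /* bytes consumed by unpack */
} *Buf;

void jobacct_p_pack(jobacctinfo_t *jobacct, Buf buffer)
{
    unsigned char *p;

    if (jobacct == NULL || buffer == NULL ||
        buffer->used + PACKED_SIZE > sizeof(buffer->data))
        return;                 /* nothing is written on bad input */
    p = buffer->data + buffer->used;
    p[0] = (unsigned char) (jobacct->tid >> 8);
    p[1] = (unsigned char) (jobacct->tid & 0xff);
    p[2] = (unsigned char) (jobacct->max_rss >> 24);
    p[3] = (unsigned char) ((jobacct->max_rss >> 16) & 0xff);
    p[4] = (unsigned char) ((jobacct->max_rss >> 8) & 0xff);
    p[5] = (unsigned char) (jobacct->max_rss & 0xff);
    buffer->used += PACKED_SIZE;
}

int jobacct_p_unpack(jobacctinfo_t *jobacct, Buf buffer)
{
    const unsigned char *p;

    if (jobacct == NULL || buffer == NULL ||
        buffer->processed + PACKED_SIZE > buffer->used)
        return SLURM_FAILURE;   /* not enough packed data left */
    p = buffer->data + buffer->processed;
    jobacct->tid = (uint16_t) ((p[0] << 8) | p[1]);
    jobacct->max_rss = ((uint32_t) p[2] << 24) | ((uint32_t) p[3] << 16) |
                       ((uint32_t) p[4] << 8) | (uint32_t) p[5];
    buffer->processed += PACKED_SIZE;
    return SLURM_SUCCESS;
}
```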
| |
| <p class="footer"><a href="#top">top</a> |
| |
| <h2>Parameters</h2> |
| <p>Rather than proliferate slurm.conf parameters for new or evolved |
| plugins, the job accounting API relies on three parameters: |
| <dl> |
| <dt><span class="commandline">JobAcctType</span> |
| <dd>Specifies which plugin should be used. |
| <dt><span class="commandline">JobAcctFrequency</span> |
| <dd>Tells the plugin how long to wait between polls. |
| <dt><span class="commandline">JobAcctLogFile</span> |
| <dd>Tells the plugin the name of the logfile to use. |
| </dl> |
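| <p>For illustration, the three parameters might appear together in slurm.conf as shown below. |
| The values are placeholders chosen for this example, not recommendations: the plugin name |
| follows the <i>major/minor</i> form described earlier, and the frequency and logfile path are assumed. |

```conf
# Hypothetical slurm.conf fragment; all values are placeholders.
JobAcctType=jobacct/linux
JobAcctFrequency=30
JobAcctLogFile=/var/log/slurm_jobacct.log
```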
| |
| <h2>Versioning</h2> |
| <p> This document describes version 1 of the SLURM Job Accounting API. Future |
| releases of SLURM may revise this API. A job accounting plugin conveys its |
| ability to implement a particular API version using the mechanism outlined |
| for SLURM plugins. |
| <p class="footer"><a href="#top">top</a> |
| |
| <p style="text-align:center;">Last modified 31 January 2007</p> |
| |
| <!--#include virtual="footer.txt"--> |