<!--#include virtual="header.txt"-->
<h1><a name="top">SLURM Job Accounting Plugin API</a></h1>
<h2> Overview</h2>
<p> This document describes SLURM job accounting plugins and the API that
defines them. It is intended as a resource to programmers wishing to write
their own SLURM job accounting plugins. This is version 1 of the API.
<p>SLURM job accounting plugins must conform to the
SLURM Plugin API with the following specifications:
<p><span class="commandline">const char
plugin_name[]="<i>full&nbsp;text&nbsp;name</i>"</span><br>
<p style="margin-left:.2in">
A free-formatted ASCII text string that identifies the plugin.
<p><span class="commandline">const char
plugin_type[]="<i>major/minor</i>"</span><br>
<p style="margin-left:.2in">
The major type must be &quot;jobacct&quot;.
The minor type can be any suitable name
for the type of accounting package. The following minor types are currently used:
<ul>
<li><b>aix</b>: Gathers information from AIX /proc table and adds this
information to the standard rusage information also gathered for each job.
<li><b>linux</b>: Gathers information from Linux /proc table and adds this
information to the standard rusage information also gathered for each job.
<li><b>none</b>: No information gathered.
</ul>
The <b>sacct</b> program can be used to display gathered data from regular
accounting and from these plugins.
<p>The programmer is urged to study
<span class="commandline">src/plugins/jobacct/linux</span> and
<span class="commandline">src/plugins/jobacct/common</span>
for a sample implementation of a SLURM job accounting plugin.
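<p>As a minimal sketch of the two identification symbols described above: the
plugin name and the minor type <span class="commandline">example</span> are
hypothetical, and <span class="commandline">example_has_jobacct_major()</span>
is an illustrative helper that is not part of SLURM.</p>

```c
#include <string.h>

/* Hypothetical identification strings for an illustrative plugin. */
const char plugin_name[] = "Job accounting EXAMPLE plugin";
const char plugin_type[] = "jobacct/example";

/* Illustrative helper (not a SLURM function): returns 1 if the type
 * string has the major type "jobacct" and a non-empty minor type. */
int example_has_jobacct_major(const char *type)
{
	static const char major[] = "jobacct/";
	size_t len = sizeof(major) - 1;

	return strncmp(type, major, len) == 0 && type[len] != '\0';
}
```

<p>Calling <span class="commandline">example_has_jobacct_major(plugin_type)</span>
on the strings above yields 1, while a type from another plugin family yields 0.</p>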
<p class="footer"><a href="#top">top</a>
<h2>API Functions</h2>
The job accounting API uses hooks in the slurmctld, slurmd, and slurmstepd.
<p>All of the following functions are required. Functions which are not
implemented must be stubbed.
<h4>Functions called by all slurmstepd processes</h4>
<p class="commandline">int jobacct_p_startpoll(int frequency)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_startpoll() is called at the start of the slurmstepd.
It starts a thread that polls for information, which can then be queried at
any time until the process ends.
Put global initialization here.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">frequency</span> (input) poll frequency for polling
thread.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">int jobacct_p_endpoll()
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_endpoll() is called when the process is finished to stop the
polling thread.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">none</span>
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">void jobacct_p_suspend_poll()
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_suspend_poll() is called when the process is suspended.
This causes the polling thread to halt until the process is resumed.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">none</span>
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
<p class="commandline">void jobacct_p_resume_poll()
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_resume_poll() is called when the process is resumed.
This causes the polling thread to resume operation.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">none</span>
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
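<p>The four polling functions above can be sketched with a POSIX thread
guarded by a mutex and a condition variable. This is only an illustration and
not the code of the shipped plugins. It assumes that
<span class="commandline">frequency</span> is given in seconds, defines local
stand-ins for <span class="commandline">SLURM_SUCCESS</span> and
<span class="commandline">SLURM_FAILURE</span>, uses the hypothetical counter
<span class="commandline">poll_samples</span> in place of real data gathering,
and chooses to treat a non-positive frequency as a request not to poll.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Stand-ins for the return codes normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

static pthread_t poll_thread;
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t poll_cond = PTHREAD_COND_INITIALIZER;
static int poll_running, poll_shutdown, poll_suspended, poll_frequency;
static unsigned long poll_samples;	/* hypothetical placeholder for data */

static void *poll_loop(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&poll_lock);
	while (!poll_shutdown) {
		struct timespec deadline;

		if (!poll_suspended)
			poll_samples++;	/* a real plugin would sample tasks here */
		/* Sleep for one interval, or until endpoll signals the thread.
		 * A spurious wakeup only causes an early extra sample. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += poll_frequency;
		pthread_cond_timedwait(&poll_cond, &poll_lock, &deadline);
	}
	pthread_mutex_unlock(&poll_lock);
	return NULL;
}

int jobacct_p_startpoll(int frequency)
{
	int rc = SLURM_SUCCESS;

	if (frequency <= 0)
		return SLURM_SUCCESS;	/* sketch choice: no polling thread */
	pthread_mutex_lock(&poll_lock);
	if (poll_running) {
		rc = SLURM_FAILURE;	/* already started */
	} else {
		poll_frequency = frequency;
		poll_shutdown = poll_suspended = 0;
		if (pthread_create(&poll_thread, NULL, poll_loop, NULL) == 0)
			poll_running = 1;
		else
			rc = SLURM_FAILURE;
	}
	pthread_mutex_unlock(&poll_lock);
	return rc;
}

int jobacct_p_endpoll(void)
{
	pthread_mutex_lock(&poll_lock);
	if (!poll_running) {
		pthread_mutex_unlock(&poll_lock);
		return SLURM_SUCCESS;	/* nothing to stop */
	}
	poll_shutdown = 1;
	pthread_cond_signal(&poll_cond);	/* wake the thread immediately */
	pthread_mutex_unlock(&poll_lock);

	if (pthread_join(poll_thread, NULL) != 0)
		return SLURM_FAILURE;
	pthread_mutex_lock(&poll_lock);
	poll_running = 0;
	pthread_mutex_unlock(&poll_lock);
	return SLURM_SUCCESS;
}

void jobacct_p_suspend_poll(void)
{
	pthread_mutex_lock(&poll_lock);
	poll_suspended = 1;	/* thread keeps waking but skips sampling */
	pthread_mutex_unlock(&poll_lock);
}

void jobacct_p_resume_poll(void)
{
	pthread_mutex_lock(&poll_lock);
	poll_suspended = 0;
	pthread_mutex_unlock(&poll_lock);
}
```

<p>In this sketch the thread stays alive while suspended and merely skips its
sampling step, which keeps suspend and resume cheap; a plugin could equally
block the thread on the condition variable until it is resumed.</p>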
<p class="commandline">int jobacct_p_add_task(pid_t pid, uint16_t tid)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_add_task() is used to add a task to the poller.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline"> pid</span> (input) Process id<br>
<span class="commandline"> tid</span> (input) SLURM global task id
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">jobacctinfo_t *jobacct_p_stat_task(pid_t pid)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_stat_task() is used to get the most recent information about a task.
The caller must free the information returned by this function.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline"> pid</span> (input) Process id
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">jobacctinfo structure pointer</span> on success, or
<span class="commandline">NULL</span> on failure.
<p class="commandline">jobacctinfo_t *jobacct_p_remove_task(pid_t pid)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_remove_task() is used to remove a task from the poller.
The caller must free the information returned by this function.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline"> pid</span> (input) Process id
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">Pointer to removed jobacctinfo_t structure</span>
on success, or
<span class="commandline">NULL</span> on failure.
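<p>The task functions above can be illustrated with a small linked list keyed
by process id. This is a sketch under stated assumptions: the
<span class="commandline">jobacctinfo_t</span> layout shown here (including
the <span class="commandline">max_rss</span> field) is a hypothetical
stand-in for the structure SLURM defines, the return codes are local
stand-ins, and no locking is shown even though a real plugin sharing this
list with a polling thread would need it.</p>

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Stand-ins for the return codes normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in; the real jobacctinfo_t is defined by SLURM. */
typedef struct jobacctinfo {
	pid_t pid;
	uint16_t tid;
	uint32_t max_rss;		/* invented field for illustration */
	struct jobacctinfo *next;
} jobacctinfo_t;

static jobacctinfo_t *task_list;	/* tasks known to the poller */

int jobacct_p_add_task(pid_t pid, uint16_t tid)
{
	jobacctinfo_t *rec;

	if (pid <= 0)
		return SLURM_FAILURE;
	rec = calloc(1, sizeof(*rec));
	if (!rec)
		return SLURM_FAILURE;
	rec->pid = pid;
	rec->tid = tid;
	rec->next = task_list;		/* push onto the head of the list */
	task_list = rec;
	return SLURM_SUCCESS;
}

/* Returns a heap copy of the record so the caller can free it safely. */
jobacctinfo_t *jobacct_p_stat_task(pid_t pid)
{
	jobacctinfo_t *rec, *copy;

	for (rec = task_list; rec; rec = rec->next) {
		if (rec->pid != pid)
			continue;
		copy = malloc(sizeof(*copy));
		if (copy) {
			*copy = *rec;
			copy->next = NULL;
		}
		return copy;
	}
	return NULL;
}

/* Unlinks the record and hands ownership of it to the caller. */
jobacctinfo_t *jobacct_p_remove_task(pid_t pid)
{
	jobacctinfo_t **link, *rec;

	for (link = &task_list; (rec = *link) != NULL; link = &rec->next) {
		if (rec->pid == pid) {
			*link = rec->next;
			rec->next = NULL;
			return rec;
		}
	}
	return NULL;
}
```

<p>Returning a copy from <span class="commandline">jobacct_p_stat_task()</span>
and the unlinked record from
<span class="commandline">jobacct_p_remove_task()</span> means the caller can
free either result without disturbing the list.</p>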
<p class="footer"><a href="#top">top</a>
<h4>Functions called by the slurmctld process</h4>
<p class="commandline">int jobacct_p_init_slurmctld(char *job_acct_log)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_init_slurmctld() is called at the start of the slurmctld.
It opens the logfile to be written to.
Put global initialization here.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">job_acct_log</span> (input) logfile name.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">int jobacct_p_fini_slurmctld()
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_fini_slurmctld() is called at the end of the slurmctld.
It closes the logfile.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">none</span>
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_job_start_slurmctld(struct job_record *job_ptr)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_job_start_slurmctld() is called at the allocation of a new job in
the slurmctld. It prints out beginning information about a job.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">job_ptr</span> (input) information about the job in
slurmctld.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_job_complete_slurmctld(struct job_record *job_ptr)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_job_complete_slurmctld() is called at the end of a job in
the slurmctld. It prints out ending information about a job.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">job_ptr</span> (input) information about the job in
slurmctld.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_step_start_slurmctld(struct step_record *step_ptr)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_step_start_slurmctld() is called at the allocation of a new step in
the slurmctld. It prints out beginning information about a step.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">step_ptr</span> (input) information about the step in
slurmctld.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_step_complete_slurmctld(struct step_record *step_ptr)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_step_complete_slurmctld() is called at the end of a step in
the slurmctld. It prints out ending information about a step.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">step_ptr</span> (input) information about the step in
slurmctld.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_suspend_slurmctld(struct job_record *job_ptr)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_suspend_slurmctld() is called when a job is suspended or resumed in
the slurmctld. It prints out information about the suspension of the job
to the logfile.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">job_ptr</span> (input) information about the job in
slurmctld.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
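<p>The slurmctld hooks above amount to opening a logfile, appending one record
per event, and closing the file. The sketch below illustrates that life cycle
for the job start and job complete hooks only. Its assumptions:
<span class="commandline">struct job_record</span> is reduced to a
hypothetical stand-in holding just a job id, the log line format and the
helper <span class="commandline">log_job_event()</span> are invented for
illustration, and the return codes are local stand-ins.</p>

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the return codes normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in; the real struct job_record belongs to slurmctld. */
struct job_record {
	uint32_t job_id;
};

static FILE *log_fp;	/* logfile shared by all hooks */

int jobacct_p_init_slurmctld(char *job_acct_log)
{
	if (!job_acct_log || log_fp)
		return SLURM_FAILURE;	/* no name given, or already open */
	log_fp = fopen(job_acct_log, "a");
	return log_fp ? SLURM_SUCCESS : SLURM_FAILURE;
}

int jobacct_p_fini_slurmctld(void)
{
	int rc = SLURM_SUCCESS;

	if (log_fp && fclose(log_fp) != 0)
		rc = SLURM_FAILURE;
	log_fp = NULL;
	return rc;
}

/* Illustrative helper: append "<job id> <event>" and flush the file. */
static int log_job_event(const char *event, struct job_record *job_ptr)
{
	if (!log_fp || !job_ptr)
		return SLURM_FAILURE;
	if (fprintf(log_fp, "%u %s\n", (unsigned) job_ptr->job_id, event) < 0)
		return SLURM_FAILURE;
	return fflush(log_fp) == 0 ? SLURM_SUCCESS : SLURM_FAILURE;
}

int jobacct_p_job_start_slurmctld(struct job_record *job_ptr)
{
	return log_job_event("JOB_START", job_ptr);
}

int jobacct_p_job_complete_slurmctld(struct job_record *job_ptr)
{
	return log_job_event("JOB_COMPLETE", job_ptr);
}
```

<p>The step and suspend hooks would follow the same pattern with their own
record types and event names.</p>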
<p class="footer"><a href="#top">top</a>
<h4>Functions common to all processes</h4>
<p class="commandline">
int jobacct_p_init_struct(jobacctinfo_t *jobacct, uint16_t tid)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_init_struct() is called to set the values of a jobacctinfo_t to
initial values.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input/output) structure to be altered.<br>
<span class="commandline">tid</span>
(input) id of the task; send in (uint16_t)NO_VAL if there is no specific task.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">jobacctinfo_t *jobacct_p_alloc(uint16_t tid)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_alloc() is used to allocate and initialize a
new jobacctinfo structure.<br>
The caller must free the structure returned by this function.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">tid</span>
(input) id of the task; send in (uint16_t)NO_VAL if there is no specific task.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">jobacctinfo structure pointer</span> on success, or
<span class="commandline">NULL</span> on failure.
<p class="commandline">void jobacct_p_free(jobacctinfo_t *jobacct)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_free() is used to free the allocation made by jobacct_p_alloc().
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input) structure to be freed.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
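<p>The relationship between
<span class="commandline">jobacct_p_init_struct()</span>,
<span class="commandline">jobacct_p_alloc()</span> and
<span class="commandline">jobacct_p_free()</span> can be sketched as follows.
The structure fields are hypothetical, and the definitions of
<span class="commandline">NO_VAL</span> and the return codes are local
stand-ins for those in the SLURM headers.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for definitions normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1
#define NO_VAL 0xfffffffe

/* Hypothetical stand-in; the real jobacctinfo_t is defined by SLURM. */
typedef struct jobacctinfo {
	uint16_t tid;		/* task id, or (uint16_t)NO_VAL */
	uint32_t max_rss;	/* invented field */
	uint32_t tot_cpu;	/* invented field */
} jobacctinfo_t;

/* Reset every field of an existing structure to its initial value. */
int jobacct_p_init_struct(jobacctinfo_t *jobacct, uint16_t tid)
{
	if (!jobacct)
		return SLURM_FAILURE;
	jobacct->tid = tid;
	jobacct->max_rss = 0;
	jobacct->tot_cpu = 0;
	return SLURM_SUCCESS;
}

/* Allocate a structure and initialize it; returns NULL if out of memory. */
jobacctinfo_t *jobacct_p_alloc(uint16_t tid)
{
	jobacctinfo_t *jobacct = malloc(sizeof(*jobacct));

	if (jobacct)
		jobacct_p_init_struct(jobacct, tid);
	return jobacct;
}

/* Release a structure obtained from jobacct_p_alloc(). */
void jobacct_p_free(jobacctinfo_t *jobacct)
{
	free(jobacct);
}
```

<p>A caller with no particular task in mind would write
<span class="commandline">jobacct_p_alloc((uint16_t) NO_VAL)</span> and later
pass the result to <span class="commandline">jobacct_p_free()</span>.</p>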
<p class="commandline">
int jobacct_p_setinfo(jobacctinfo_t *jobacct,
enum jobacct_data_type type, void *data)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_setinfo() is called to set the values of a jobacctinfo_t to
specific values based on inputs.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input/output) structure to be altered.<br>
<span class="commandline">type</span>
(input) enum of specific part of jobacct to alter.<br>
<span class="commandline">data</span>
(input) corresponding data to set jobacct part to.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
<p class="commandline">
int jobacct_p_getinfo(jobacctinfo_t *jobacct,
enum jobacct_data_type type, void *data)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_getinfo() is called to get specific values from a jobacctinfo_t
based on inputs.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input) structure to be queried.<br>
<span class="commandline">type</span>
(input) enum of specific part of jobacct to get.<br>
<span class="commandline">data</span>
(output) corresponding data from the jobacct part.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
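<p>The typed accessors <span class="commandline">jobacct_p_setinfo()</span>
and <span class="commandline">jobacct_p_getinfo()</span> can be pictured as a
switch on the <span class="commandline">type</span> argument that decides how
the untyped <span class="commandline">data</span> pointer is interpreted. In
this sketch the enumerators
<span class="commandline">EXAMPLE_DATA_MAX_RSS</span> and
<span class="commandline">EXAMPLE_DATA_TOT_CPU</span>, like the structure
fields, are hypothetical; the real enumerators are defined by SLURM.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the return codes normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in; the real jobacctinfo_t is defined by SLURM. */
typedef struct jobacctinfo {
	uint32_t max_rss;
	uint32_t tot_cpu;
} jobacctinfo_t;

/* Hypothetical selectors; both expect data to point at a uint32_t. */
enum jobacct_data_type {
	EXAMPLE_DATA_MAX_RSS,
	EXAMPLE_DATA_TOT_CPU
};

int jobacct_p_setinfo(jobacctinfo_t *jobacct,
		      enum jobacct_data_type type, void *data)
{
	uint32_t *value = data;

	if (!jobacct || !data)
		return SLURM_FAILURE;
	switch (type) {
	case EXAMPLE_DATA_MAX_RSS:
		jobacct->max_rss = *value;
		break;
	case EXAMPLE_DATA_TOT_CPU:
		jobacct->tot_cpu = *value;
		break;
	default:
		return SLURM_FAILURE;	/* unknown selector */
	}
	return SLURM_SUCCESS;
}

int jobacct_p_getinfo(jobacctinfo_t *jobacct,
		      enum jobacct_data_type type, void *data)
{
	uint32_t *value = data;

	if (!jobacct || !data)
		return SLURM_FAILURE;
	switch (type) {
	case EXAMPLE_DATA_MAX_RSS:
		*value = jobacct->max_rss;
		break;
	case EXAMPLE_DATA_TOT_CPU:
		*value = jobacct->tot_cpu;
		break;
	default:
		return SLURM_FAILURE;	/* unknown selector */
	}
	return SLURM_SUCCESS;
}
```

<p>Because <span class="commandline">data</span> is untyped, the caller and
the plugin must agree on the pointed-to type for every selector.</p>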
<p class="commandline">
void jobacct_p_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_aggregate() is called to combine two jobacctinfo structures,
aggregating the values of from into dest and keeping the maximum values.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">dest</span>
(input/output) initial structure to be applied to.<br>
<span class="commandline">from</span>
(input) new info to apply to dest.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
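<p>Aggregation mixes two kinds of fields: peaks, where the larger value wins,
and totals, which are summed. A minimal sketch of that idea, using a
hypothetical structure whose field names are invented for illustration:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real jobacctinfo_t is defined by SLURM. */
typedef struct jobacctinfo {
	uint32_t max_rss;	/* peak value: the larger one is kept */
	uint16_t max_rss_task;	/* task id that reached the peak */
	uint32_t tot_cpu;	/* running total: values are summed */
} jobacctinfo_t;

void jobacct_p_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)
{
	if (!dest || !from)
		return;
	if (from->max_rss > dest->max_rss) {
		/* Remember both the new peak and which task produced it. */
		dest->max_rss = from->max_rss;
		dest->max_rss_task = from->max_rss_task;
	}
	dest->tot_cpu += from->tot_cpu;
}
```

<p>Only <span class="commandline">dest</span> is modified;
<span class="commandline">from</span> is left untouched so the same record can
be folded into more than one summary.</p>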
<p class="commandline">
void jobacct_p_2_sacct(sacct_t *sacct, jobacctinfo_t *jobacct)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_2_sacct() is called to transfer information from data structure
jobacct to structure sacct.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">sacct</span>
(input/output) initial structure to be applied to.<br>
<span class="commandline">jobacct</span>
(input) jobacctinfo_t structure containing information to apply to sacct.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
<p class="commandline">
void jobacct_p_pack(jobacctinfo_t *jobacct, Buf buffer)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_pack() packs a jobacctinfo_t into a buffer to send across the network.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input) structure to pack.<br>
<span class="commandline">buffer</span>
(input/output) buffer to pack structure into.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">none</span>
<p class="commandline">
int jobacct_p_unpack(jobacctinfo_t *jobacct, Buf buffer)
<p style="margin-left:.2in"><b>Description</b>:
jobacct_p_unpack() unpacks a jobacctinfo_t from a buffer received from
the network.
The caller must free the jobacctinfo_t returned by this function.
<p style="margin-left:.2in"><b>Arguments</b>:
<span class="commandline">jobacct</span>
(input/output) structure to fill.<br>
<span class="commandline">buffer</span>
(input) buffer to unpack structure from.
<p style="margin-left:.2in"><b>Returns</b>:
<span class="commandline">SLURM_SUCCESS</span> on success, or
<span class="commandline">SLURM_FAILURE</span> on failure.
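<p>Packing and unpacking must agree on field order and byte order. The sketch
below writes each field in network byte order into a simple fixed-size buffer
and reads the fields back in the same order. Its assumptions:
<span class="commandline">struct example_buf</span> is a hypothetical
stand-in for SLURM's <span class="commandline">Buf</span> type, the structure
fields are invented, the unpack function fills a caller-supplied structure
rather than allocating one, and it returns
<span class="commandline">int</span> so that the status codes listed under
Returns can be reported.</p>

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the return codes normally supplied by the SLURM headers. */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in; the real jobacctinfo_t is defined by SLURM. */
typedef struct jobacctinfo {
	uint32_t max_rss;
	uint32_t tot_cpu;
} jobacctinfo_t;

/* Hypothetical fixed-size stand-in for SLURM's Buf type. */
struct example_buf {
	unsigned char data[64];
	size_t used;	/* bytes written so far */
	size_t read;	/* bytes consumed so far */
};
typedef struct example_buf *Buf;

/* Append one 32-bit value in network byte order. */
static void put32(Buf buffer, uint32_t value)
{
	uint32_t net = htonl(value);

	if (buffer->used + sizeof(net) > sizeof(buffer->data))
		return;	/* sketch only: a full buffer drops the value */
	memcpy(buffer->data + buffer->used, &net, sizeof(net));
	buffer->used += sizeof(net);
}

/* Read one 32-bit value, converting back to host byte order. */
static int get32(Buf buffer, uint32_t *value)
{
	uint32_t net;

	if (buffer->read + sizeof(net) > buffer->used)
		return SLURM_FAILURE;	/* truncated message */
	memcpy(&net, buffer->data + buffer->read, sizeof(net));
	buffer->read += sizeof(net);
	*value = ntohl(net);
	return SLURM_SUCCESS;
}

void jobacct_p_pack(jobacctinfo_t *jobacct, Buf buffer)
{
	put32(buffer, jobacct->max_rss);
	put32(buffer, jobacct->tot_cpu);
}

int jobacct_p_unpack(jobacctinfo_t *jobacct, Buf buffer)
{
	uint32_t max_rss, tot_cpu;

	/* Read into temporaries so a short buffer leaves jobacct intact. */
	if (get32(buffer, &max_rss) != SLURM_SUCCESS ||
	    get32(buffer, &tot_cpu) != SLURM_SUCCESS)
		return SLURM_FAILURE;
	jobacct->max_rss = max_rss;
	jobacct->tot_cpu = tot_cpu;
	return SLURM_SUCCESS;
}
```

<p>Converting every field with <span class="commandline">htonl()</span> and
<span class="commandline">ntohl()</span> keeps the packed form independent of
the byte order of the sending and receiving nodes.</p>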
<p class="footer"><a href="#top">top</a>
<h2>Parameters</h2>
<p>Rather than proliferate slurm.conf parameters for new or evolved
plugins, the job accounting API relies on three parameters:
<dl>
<dt><span class="commandline">JobAcctType</span>
<dd>Specifies which plugin should be used.
<dt><span class="commandline">JobAcctFrequency</span>
<dd>Lets the plugin know how long to wait between polls.
<dt><span class="commandline">JobAcctLogFile</span>
<dd>Lets the plugin know the name of the logfile to use.
</dl>
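<p>As an illustration, the three parameters above might appear together in
slurm.conf as shown below. The values are assumptions made for this example:
the logfile path is hypothetical, and the frequency is assumed to be expressed
in seconds.</p>

```
JobAcctType=jobacct/linux
JobAcctFrequency=30
JobAcctLogFile=/var/log/slurm_jobacct.log
```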
<h2>Versioning</h2>
<p> This document describes version 1 of the SLURM Job Accounting API. Future
releases of SLURM may revise this API. A job accounting plugin conveys its
ability to implement a particular API version using the mechanism outlined
for SLURM plugins.
<p class="footer"><a href="#top">top</a>
<p style="text-align:center;">Last modified 31 January 2007</p>
<!--#include virtual="footer.txt"-->