| <!--#include virtual="header.txt"--> |
| |
| <h1><a name="top">SLURM Process Tracking Plugin API</a></h1> |
| |
| <h2> Overview</h2> |
| <p> This document describes SLURM process tracking plugins and the API |
| that defines them. |
| It is intended as a resource to programmers wishing to write their |
| own SLURM process tracking plugins. |
| This is version 0 of the API.</p> |
| |
| <p>SLURM process tracking plugins are SLURM plugins that implement |
| the SLURM process tracking API described herein. |
| They must conform to the SLURM Plugin API with the following |
| specifications:</p> |
| |
| <p><span class="commandline">const char plugin_type[]</span><br> |
| The major type must be "proctrack". |
| The minor type can be any recognizable abbreviation for the type |
| of proctrack. We recommend, for example:</p> |
| <ul> |
| <li><b>aix</b>—Perform process tracking on an AIX platform. |
| NOTE: This requires a kernel extension that records |
| every process creation and termination.</li> |
| <li><b>linuxproc</b>—Perform process tracking based upon a scan |
| of the Linux process table and use the parent process ID to determine |
| what processes are members of a SLURM job. NOTE: This mechanism is |
| not entirely reliable for process tracking.</li> |
| <li><b>pgid</b>—Use process group ID to determine |
| what processes are members of a SLURM job. NOTE: This mechanism is |
| not entirely reliable for process tracking.</li> |
| <li><b>rms</b>—Use a Quadrics RMS kernel patch to |
| establish what processes are members of a SLURM job. |
| NOTE: This requires a kernel patch that records |
| every process creation and termination.</li> |
| <li><b>sgi_job</b>—Use <a href="http://oss.sgi.com/projects/pagg/"> |
| SGI's Process Aggregates (PAGG) kernel module</a>. |
| NOTE: This kernel module records every process creation |
| and termination.</li> |
| </ul> |
| |
| <p>The <span class="commandline">plugin_name</span> and |
| <span class="commandline">plugin_version</span> symbols required |
| by the SLURM Plugin API require no specialization for process tracking. |
| Note carefully, however, the versioning discussion below.</p> |
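<p>As an illustration of the three symbols discussed above, a hypothetical
plugin with the minor type <i>example</i> might declare them as sketched below.
The "major/minor" form of the type string and the
<span class="commandline">uint32_t</span> type of
<span class="commandline">plugin_version</span> are assumptions taken from the
general SLURM Plugin API rather than from this document. The name and version
values are placeholders.</p>

```c
#include <stdint.h>

/* Hypothetical plugin: only the "proctrack" major type is fixed by this
 * document. The minor type "example", the name string and the version
 * number are placeholders chosen for illustration. */
const char plugin_name[] = "Example process tracking plugin";
const char plugin_type[] = "proctrack/example";
const uint32_t plugin_version = 1;
```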
| |
| <p>The programmer is urged to study |
| <span class="commandline">src/plugins/proctrack/pgid/proctrack_pgid.c</span> |
| for an example implementation of a SLURM proctrack plugin.</p> |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <h2>Data Objects</h2> |
| <p> The implementation must support a container ID of type uint32_t. |
| This container ID is maintained by the plugin directly in the slurmd |
| job structure using the field named <i>cont_id</i>.</p> |
| |
| <p>The implementation must maintain (though not necessarily directly export) an |
| enumerated <b>errno</b> so that SLURM can discover, as far as is practical, |
| the reason for any failed API call. |
| These values must not be used as return values in integer-valued functions |
| in the API. |
| The proper error return value from integer-valued functions is SLURM_ERROR. |
| The implementation should endeavor to provide useful and pertinent information |
| by whatever means is practical. |
| Successful API calls are not required to reset errno to a known value.</p> |
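<p>The error convention above can be sketched as follows. This is a minimal
illustration, not code from SLURM: it assumes the C library
<span class="commandline">errno</span> is used as the error variable, defines
local stand-ins for <span class="commandline">SLURM_SUCCESS</span> and
<span class="commandline">SLURM_ERROR</span>, and uses a hypothetical helper
named <span class="commandline">check_container_id()</span>.</p>

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for the values normally supplied by SLURM's headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical helper: reject a container ID of zero the way an
 * integer-valued API function might. */
int check_container_id(uint32_t id)
{
    if (id == 0) {
        errno = EINVAL;        /* the reason for failure goes in errno */
        return SLURM_ERROR;    /* the return value is only SLURM_ERROR */
    }
    return SLURM_SUCCESS;      /* errno is deliberately left untouched */
}
```

<p>A caller checks only for <span class="commandline">SLURM_ERROR</span> and
then inspects the error variable. Because a successful call does not reset it,
the value is meaningful only after a failure.</p>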
| |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <h2>API Functions</h2> |
| <p>The following functions must be defined by every plugin. Functions that are |
| not implemented should be stubbed.</p> |
| |
| <p class="commandline">int slurm_container_create (slurmd_job_t *job);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Create a container. |
| The container should remain valid until |
| <span class="commandline">slurm_container_destroy()</span> is called. |
| This function must put the container ID directly in the job structure's |
| variable <i>cont_id</i>.</p> |
| <p style="margin-left:.2in"><b>Argument</b>: |
| <span class="commandline"> job</span> (input/output) |
| Pointer to a slurmd job structure.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure, |
| the plugin should return SLURM_ERROR and set the errno to an appropriate value |
| to indicate the reason for failure.</p> |
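<p>A minimal sketch of this function for a process-group-based plugin is shown
below. It assumes the job structure carries a process group ID in a field named
<i>pgid</i>. The structure defined here is a reduced stand-in for the real
<span class="commandline">slurmd_job_t</span>, and the status macros are local
stand-ins for the values in SLURM's headers.</p>

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Reduced stand-in: the real slurmd job structure has many more fields.
 * The pgid field is an assumption made for this sketch. */
typedef struct {
    pid_t    pgid;     /* process group of the job's tasks */
    uint32_t cont_id;  /* container ID, filled in by the plugin */
} slurmd_job_t;

int slurm_container_create(slurmd_job_t *job)
{
    if (job == NULL || job->pgid <= 0) {
        errno = EINVAL;
        return SLURM_ERROR;
    }
    /* The process group ID doubles as the container ID. */
    job->cont_id = (uint32_t) job->pgid;
    return SLURM_SUCCESS;
}
```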
| |
| <p class="commandline">int slurm_container_add (slurmd_job_t *job, pid_t pid);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Add a specific process ID |
| to a given job's container.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>:<br> |
| <span class="commandline"> job</span> (input) |
| Pointer to a slurmd job structure.<br> |
| <span class="commandline"> pid</span> (input) |
| The ID of the process to add to this job's container.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure, |
| the plugin should return SLURM_ERROR and set the errno to an appropriate value |
| to indicate the reason for failure.</p> |
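<p>For a plugin that tracks membership itself, adding a process could look
roughly like the sketch below. The fixed-size table, its capacity
<span class="commandline">MAX_TRACKED</span>, and the reduced
<span class="commandline">slurmd_job_t</span> are all hypothetical. A plugin
backed by a kernel mechanism would instead hand the process ID to that
mechanism.</p>

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)
#define MAX_TRACKED   64          /* hypothetical capacity */

/* Reduced stand-in for the real slurmd job structure. */
typedef struct {
    uint32_t cont_id;
} slurmd_job_t;

/* Hypothetical in-memory record of which process is in which container. */
static struct {
    uint32_t cont_id;
    pid_t    pid;
} tracked[MAX_TRACKED];
static size_t tracked_cnt = 0;

int slurm_container_add(slurmd_job_t *job, pid_t pid)
{
    if (job == NULL || job->cont_id == 0 || pid <= 0) {
        errno = EINVAL;           /* no container yet, or bad process ID */
        return SLURM_ERROR;
    }
    if (tracked_cnt == MAX_TRACKED) {
        errno = ENOSPC;           /* table is full */
        return SLURM_ERROR;
    }
    tracked[tracked_cnt].cont_id = job->cont_id;
    tracked[tracked_cnt].pid = pid;
    tracked_cnt++;
    return SLURM_SUCCESS;
}
```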
| |
| <p class="commandline">int slurm_container_signal (uint32_t id, int signal);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Signal all processes in a given |
| job's container.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>:<br> |
| <span class="commandline"> id</span> (input) |
| Job container's ID.<br> |
| <span class="commandline"> signal</span> (input) |
| Signal to be sent to processes. Note that a signal of zero |
| just tests for the existence of processes in a given job container.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if the signal |
| was sent. |
| If the signal cannot be sent, the function should return SLURM_ERROR and set |
| errno to an appropriate value to indicate the reason for failure.</p> |
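<p>Where the container ID is a process group ID, signalling can be sketched
with the POSIX <span class="commandline">killpg()</span> call, as below. The
mapping of container ID to process group and the guard against signalling the
caller's own group are assumptions of this sketch, and the status macros are
local stand-ins.</p>

```c
#define _XOPEN_SOURCE 700         /* exposes killpg() under strict -std modes */
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

int slurm_container_signal(uint32_t id, int signal)
{
    pid_t pgid = (pid_t) id;

    /* Refuse an empty container ID, and refuse to signal the caller's
     * own process group. */
    if (id == 0 || pgid == getpgrp()) {
        errno = ESRCH;
        return SLURM_ERROR;
    }
    /* A signal of zero only tests whether the group still has members. */
    if (killpg(pgid, signal) < 0)
        return SLURM_ERROR;       /* killpg() has already set errno */
    return SLURM_SUCCESS;
}
```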
| |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <p class="commandline">int slurm_container_destroy (uint32_t id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Destroy or otherwise |
| invalidate a job container. |
| This does not imply the container is empty, just that it is no longer |
| needed.</p> |
| <p style="margin-left:.2in"><b>Argument</b>: |
| <span class="commandline"> id</span> (input) |
| Job container's ID.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. On failure, |
| the plugin should return SLURM_ERROR and set the errno to an appropriate value |
| to indicate the reason for failure.</p> |
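<p>The distinction between invalidating a container and emptying it can be
sketched as follows. The table of live container IDs and the helper
<span class="commandline">remember_container()</span> are hypothetical. They
stand in for whatever bookkeeping a real plugin performs when a container is
created.</p>

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS  0
#define SLURM_ERROR    (-1)
#define MAX_CONTAINERS 16         /* hypothetical capacity */

static uint32_t live_ids[MAX_CONTAINERS];   /* 0 marks a free slot */

/* Hypothetical helper: record a container ID as live. */
int remember_container(uint32_t id)
{
    size_t i;

    if (id == 0) {
        errno = EINVAL;
        return SLURM_ERROR;
    }
    for (i = 0; i < MAX_CONTAINERS; i++) {
        if (live_ids[i] == 0) {
            live_ids[i] = id;
            return SLURM_SUCCESS;
        }
    }
    errno = ENOSPC;
    return SLURM_ERROR;
}

int slurm_container_destroy(uint32_t id)
{
    size_t i;

    for (i = 0; id != 0 && i < MAX_CONTAINERS; i++) {
        if (live_ids[i] == id) {
            /* Invalidate the ID only; member processes are not signalled. */
            live_ids[i] = 0;
            return SLURM_SUCCESS;
        }
    }
    errno = ESRCH;                /* unknown or already destroyed */
    return SLURM_ERROR;
}
```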
| |
| <p class="commandline">uint32_t slurm_container_find (pid_t pid);</p> |
| <p style="margin-left:.2in"><b>Description</b>: |
| Given a process ID, return its job container ID.</p> |
| <p style="margin-left:.2in"><b>Argument</b>: |
| <span class="commandline"> pid</span> (input) |
| A process ID.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: The ID of the job container |
| that holds this process, or zero if none is found.</p> |
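<p>Under the same assumption that a container ID is a process group ID, the
lookup reduces to the POSIX <span class="commandline">getpgid()</span> call.
The sketch below is illustrative only. Other plugin types would consult their
own kernel interface or bookkeeping instead.</p>

```c
#define _XOPEN_SOURCE 700         /* exposes getpgid() under strict -std modes */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

uint32_t slurm_container_find(pid_t pid)
{
    pid_t pgid = getpgid(pid);

    /* getpgid() returns -1 on failure; report that as "no container". */
    if (pgid == (pid_t) -1)
        return 0;
    return (uint32_t) pgid;
}
```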
| |
| <h2>Versioning</h2> |
| <p> This document describes version 0 of the SLURM Process Tracking API. |
| Future releases of SLURM may revise this API. A process tracking plugin |
| conveys its ability to implement a particular API version using the |
| mechanism outlined for SLURM plugins.</p> |
| |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <p style="text-align:center;">Last modified 6 June 2006</p> |
| |
| <!--#include virtual="footer.txt"--> |