| <!--#include virtual="header.txt"--> |
| |
| <h1><a name="top">SLURM Task Plugin Programmer Guide</a></h1> |
| |
| <h2> Overview</h2> |
| <p> This document describes SLURM task management plugins and the API |
| that defines them. It is intended as a resource for programmers who wish |
| to write their own SLURM task management plugins. This is version 1 of the API.</p> |
| |
| <p>SLURM task management plugins are SLURM plugins that implement the |
| SLURM task management API described herein. They are typically |
| used to control task affinity (i.e., binding tasks to processors). |
| They must conform to the SLURM Plugin API with the following |
| specifications:</p> |
| <p><span class="commandline">const char plugin_type[]</span><br> |
| The major type must be "task". The minor type can be any recognizable |
| abbreviation for the type of task management. We recommend, for example:</p> |
| <ul> |
| <li><b>affinity</b>: A plugin that binds tasks to processors. |
| The actual mechanism used for task binding depends upon the available |
| infrastructure, as determined by the "configure" program when SLURM is built, |
| and upon the value of <b>TaskPluginParam</b> as defined in <b>slurm.conf</b> |
| (the SLURM configuration file).</li> |
| <li><b>none</b>: A plugin that implements the API without providing any |
| services. This is the default behavior and provides no task binding.</li> |
| </ul> |
| |
| <p>The <span class="commandline">plugin_name</span> and |
| <span class="commandline">plugin_version</span> |
| symbols required by the SLURM Plugin API require no specialization |
| for task support. |
| Note carefully, however, the versioning discussion below.</p> |
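| <p>As a minimal sketch of the identification symbols discussed above, a plugin |
| source file might begin as follows. The "major/minor" form of the type string |
| follows the SLURM Plugin API; the name text, the version value, and the use of a |
| 32-bit unsigned type for <span class="commandline">plugin_version</span> are |
| illustrative assumptions rather than requirements stated in this guide.</p> |

```c
#include <stdint.h>

/* Human-readable description; the exact text is an illustrative assumption. */
const char plugin_name[] = "Example task plugin (no binding)";

/* Major type "task", a slash, then the minor type ("none" here). */
const char plugin_type[] = "task/none";

/* Plugin version number; the value 100 is an illustrative assumption. */
const uint32_t plugin_version = 100;
```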
| |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <h2>Data Objects</h2> |
| <p>The implementation must maintain (though not necessarily directly export) an |
| enumerated <span class="commandline">errno</span> so that SLURM can discover, |
| as far as is practical, the reason for any failed API call. |
| These values must not be used as return values in integer-valued functions |
| in the API. The proper error return value from integer-valued functions is |
| SLURM_ERROR.</p> |
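| <p>The convention above can be sketched as follows. This is only an |
| illustration: it assumes the standard C <span class="commandline">errno</span> |
| is used to carry the reason, defines local stand-ins for SLURM_SUCCESS and |
| SLURM_ERROR so that it compiles on its own, and uses a hypothetical helper, |
| <span class="commandline">check_cpu_index()</span>, that is not part of the API.</p> |

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the constants from SLURM's headers, defined here only so
 * that the sketch compiles on its own. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  (-1)

/* Hypothetical helper (not part of the API): validate a CPU index.
 * On failure it records the reason in errno and returns SLURM_ERROR,
 * never the errno value itself. */
static int check_cpu_index(uint32_t cpu, uint32_t cpu_count)
{
	if (cpu >= cpu_count) {
		errno = EINVAL;      /* the reason for the failure */
		return SLURM_ERROR;  /* the generic failure indicator */
	}
	return SLURM_SUCCESS;
}
```

| <p>A caller would test the return value against SLURM_SUCCESS first and |
| consult the error number only after a failure.</p> |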
| <p class="footer"><a href="#top">top</a></p> |
| |
| <h2>API Functions</h2> |
| <p>The following functions must be defined. A function that a plugin does not |
| implement must still be present as a stub.</p> |
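| <p>A stub might look like the sketch below. The opaque type declarations and |
| the SLURM_SUCCESS definition are local stand-ins so that the example compiles |
| without the SLURM headers, and returning SLURM_SUCCESS from a stub is an |
| assumption about what a do-nothing plugin would want to report.</p> |

```c
#include <stdint.h>

#define SLURM_SUCCESS 0   /* stand-in for the value from SLURM's headers */

/* Opaque stand-ins; the real definitions live in the SLURM source tree. */
typedef struct batch_job_launch_msg batch_job_launch_msg_t;
typedef struct slurmd_job slurmd_job_t;

/* Stub: this plugin does nothing when a batch job is about to launch. */
int task_slurmd_batch_request(uint32_t job_id, batch_job_launch_msg_t *req)
{
	(void) job_id;   /* unused */
	(void) req;      /* unused */
	return SLURM_SUCCESS;
}

/* Stub: nothing to do before the UID is changed. */
int task_pre_setuid(slurmd_job_t *job)
{
	(void) job;      /* unused */
	return SLURM_SUCCESS;
}

/* Stub: nothing to clean up after termination. */
int task_post_term(slurmd_job_t *job)
{
	(void) job;      /* unused */
	return SLURM_SUCCESS;
}
```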
| |
| <p class="commandline">int task_slurmd_batch_request (uint32_t job_id, |
| batch_job_launch_msg_t *req);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Prepare to launch a batch job. |
| Establish node, socket, and core resource availability for it. |
| Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>:<br> |
| <span class="commandline">job_id</span> (input) |
| ID of the job to be started.<br> |
| <span class="commandline">req</span> (input/output) |
| Batch job launch request specification. |
| See <b>src/common/slurm_protocol_defs.h</b> for the |
| data structure definition.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_slurmd_launch_request (uint32_t job_id, |
| launch_tasks_request_msg_t *req, uint32_t node_id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Prepare to launch a job. |
| Establish node, socket, and core resource availability for it. |
| Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>:<br> |
| <span class="commandline">job_id</span> (input) |
| ID of the job to be started.<br> |
| <span class="commandline">req</span> (input/output) |
| Task launch request specification including node, socket, and |
| core specifications. |
| See <b>src/common/slurm_protocol_defs.h</b> for the |
| data structure definition.<br> |
| <span class="commandline">node_id</span> (input) |
| ID of the node on which resources are being acquired (zero origin).</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_slurmd_reserve_resources (uint32_t job_id, |
| launch_tasks_request_msg_t *req, uint32_t node_id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Reserve resources for |
| the initiation of a job. Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>:<br> |
| <span class="commandline">job_id</span> (input) |
| ID of the job being started.<br> |
| <span class="commandline">req</span> (input) |
| Task launch request specification including node, socket, and |
| core specifications. |
| See <b>src/common/slurm_protocol_defs.h</b> for the |
| data structure definition.<br> |
| <span class="commandline">node_id</span> (input) |
| ID of the node on which the resources are being acquired |
| (zero origin).</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_slurmd_suspend_job (uint32_t job_id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Temporarily release resources |
| previously reserved for a job. |
| Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_id</span> (input) |
| ID of the job which is being suspended.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_slurmd_resume_job (uint32_t job_id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Reclaim resources which |
| were previously released using the task_slurmd_suspend_job function. |
| Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_id</span> (input) |
| ID of the job which is being resumed.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_slurmd_release_resources (uint32_t job_id);</p> |
| <p style="margin-left:.2in"><b>Description</b>: Release resources previously |
| reserved for a job. Executed by the <b>slurmd</b> daemon as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job_id</span> (input) |
| ID of the job which has completed.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
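| <p>One way a plugin could keep track of jobs across the reserve, suspend, |
| resume, and release calls is sketched below. Everything except the three |
| function signatures is hypothetical: the fixed-size table, the |
| <span class="commandline">track_job()</span> helper that a reserve function |
| might call, the choice of error numbers, and the local stand-ins for |
| SLURM_SUCCESS and SLURM_ERROR. The sketch is not thread safe.</p> |

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0     /* stand-ins for values from SLURM's headers */
#define SLURM_ERROR  (-1)
#define MAX_JOBS 64         /* hypothetical table size */

typedef struct {
	bool     in_use;     /* slot holds a tracked job */
	bool     suspended;  /* job is currently suspended */
	uint32_t job_id;
} job_slot_t;

static job_slot_t slots[MAX_JOBS];

/* Return the slot tracking job_id, or NULL if the job is unknown. */
static job_slot_t *find_slot(uint32_t job_id)
{
	for (size_t i = 0; i < MAX_JOBS; i++) {
		if (slots[i].in_use && slots[i].job_id == job_id)
			return &slots[i];
	}
	return NULL;
}

/* Hypothetical helper that a reserve function could call. */
static int track_job(uint32_t job_id)
{
	if (find_slot(job_id))
		return SLURM_SUCCESS;          /* already tracked */
	for (size_t i = 0; i < MAX_JOBS; i++) {
		if (!slots[i].in_use) {
			slots[i] = (job_slot_t){ true, false, job_id };
			return SLURM_SUCCESS;
		}
	}
	errno = ENOMEM;                        /* table is full */
	return SLURM_ERROR;
}

/* Shared by suspend and resume: flip the flag on a known job. */
static int set_suspended(uint32_t job_id, bool suspended)
{
	job_slot_t *slot = find_slot(job_id);
	if (!slot) {
		errno = ESRCH;                 /* job unknown to this plugin */
		return SLURM_ERROR;
	}
	slot->suspended = suspended;
	return SLURM_SUCCESS;
}

int task_slurmd_suspend_job(uint32_t job_id)
{
	return set_suspended(job_id, true);
}

int task_slurmd_resume_job(uint32_t job_id)
{
	return set_suspended(job_id, false);
}

int task_slurmd_release_resources(uint32_t job_id)
{
	job_slot_t *slot = find_slot(job_id);
	if (!slot) {
		errno = ESRCH;
		return SLURM_ERROR;
	}
	slot->in_use = false;                  /* free the slot for reuse */
	return SLURM_SUCCESS;
}
```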
| |
| <p class="commandline">int task_pre_setuid (slurmd_job_t *job);</p> |
| <p style="margin-left:.2in"><b>Description</b>: task_pre_setuid() is called |
| before the UID is changed to that of the user launching the job. |
| Executed by the <b>slurmstepd</b> program as user root.</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job</span> (input) |
| pointer to the job to be initiated. |
| See <b>src/slurmd/slurmstepd/slurmstepd_job.h</b> for the |
| data structure definition.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <p class="commandline">int task_pre_launch (slurmd_job_t *job);</p> |
| <p style="margin-left:.2in"><b>Description</b>: task_pre_launch() is called |
| prior to the exec of the application task. |
| Executed by the <b>slurmstepd</b> program as the job's owner. |
| It is followed by the <b>TaskProlog</b> program (as configured in <b>slurm.conf</b>) |
| and then by the <b>--task-prolog</b> program (from the <b>srun</b> command line).</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job</span> (input) |
| pointer to the job to be initiated. |
| See <b>src/slurmd/slurmstepd/slurmstepd_job.h</b> for the |
| data structure definition.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
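| <p>Since task_pre_launch() runs just before the exec of the task, it is a |
| natural place to apply processor binding. The sketch below assumes Linux with |
| glibc and uses <span class="commandline">sched_getaffinity()</span> and |
| <span class="commandline">sched_setaffinity()</span>. The |
| <span class="commandline">demo_job_t</span> type and its |
| <span class="commandline">local_task_id</span> field are hypothetical stand-ins |
| for <span class="commandline">slurmd_job_t</span>, and the round-robin placement |
| is an illustrative policy, not the one used by the affinity plugin.</p> |

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdint.h>

#define SLURM_SUCCESS 0     /* stand-ins for values from SLURM's headers */
#define SLURM_ERROR  (-1)

/* Hypothetical stand-in for slurmd_job_t; the real structure is defined in
 * src/slurmd/slurmstepd/slurmstepd_job.h and has different fields. */
typedef struct {
	uint32_t local_task_id;   /* index of this task on the node */
} demo_job_t;

/* Bind the calling process to one of the CPUs it is currently allowed to
 * use, chosen round-robin by local task index. */
int demo_task_pre_launch(demo_job_t *job)
{
	cpu_set_t allowed, target;
	int ncpus, want, cpu;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return SLURM_ERROR;            /* errno set by the call */

	ncpus = CPU_COUNT(&allowed);
	if (ncpus <= 0) {
		errno = EINVAL;
		return SLURM_ERROR;
	}

	/* Select the (local_task_id mod ncpus)-th allowed CPU. */
	want = (int) (job->local_task_id % (uint32_t) ncpus);
	CPU_ZERO(&target);
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		if (want-- == 0) {
			CPU_SET(cpu, &target);
			break;
		}
	}

	if (sched_setaffinity(0, sizeof(target), &target) != 0)
		return SLURM_ERROR;            /* errno set by the call */
	return SLURM_SUCCESS;
}
```

| <p>Because the binding is applied to the calling process, it is inherited |
| by the program that is subsequently exec'd.</p> |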
| |
| <a name="get_errno"><p class="commandline">int task_post_term (slurmd_job_t *job);</p></a> |
| <p style="margin-left:.2in"><b>Description</b>: task_post_term() is called |
| after termination of the job step. |
| Executed by the <b>slurmstepd</b> program as the job's owner. |
| It is preceded by the <b>--task-epilog</b> program (from the <b>srun</b> command line) |
| and then the <b>TaskEpilog</b> program (as configured in <b>slurm.conf</b>).</p> |
| <p style="margin-left:.2in"><b>Arguments</b>: |
| <span class="commandline">job</span> (input) |
| pointer to the job which has terminated. |
| See <b>src/slurmd/slurmstepd/slurmstepd_job.h</b> for the |
| data structure definition.</p> |
| <p style="margin-left:.2in"><b>Returns</b>: SLURM_SUCCESS if successful. |
| On failure, the plugin should return SLURM_ERROR and set the errno to an |
| appropriate value to indicate the reason for failure.</p> |
| |
| <h2>Versioning</h2> |
| <p> This document describes version 1 of the SLURM Task Plugin API. |
| Future releases of SLURM may revise this API.</p> |
| <p class="footer"><a href="#top">top</a></p> |
| |
| <p style="text-align:center;">Last modified 19 February 2009</p> |
| |
| <!--#include virtual="footer.txt"--> |