| <!--#include virtual="header.txt"--> |
| |
| <h1>Licenses Guide</h1> |
| |
| <h2 id="overview">Licenses Overview |
| <a class="slurm_link" href="#overview"></a> |
| </h2> |
| <p>Slurm can help with software license management by assigning available |
| licenses to jobs at scheduling time. If the licenses are not available, jobs |
| are kept pending until licenses become available. Licenses in Slurm are |
| essentially shared resources, meaning configured resources that are not tied to |
| a specific host but are associated with the entire cluster.</p> |
| |
| <p>Licenses in Slurm can be configured in two ways:</p> |
| <ul> |
| <li><b>Local Licenses:</b> |
| Local licenses are local to the cluster using the |
| <i>slurm.conf</i> in which they are configured. |
| </li> |
| <li><b>Remote Licenses:</b> |
| Remote licenses are served by the database and are configured using the |
| <i>sacctmgr</i> command. Remote licenses are dynamic: when a change is made |
| with <i>sacctmgr</i>, the <i>slurmdbd</i> updates all clusters to which the |
| licenses are assigned. |
| </li> |
| </ul> |
| |
| <h2 id="local_licenses">Local Licenses |
| <a class="slurm_link" href="#local_licenses"></a> |
| </h2> |
| <p>Local licenses are defined in the slurm.conf using the <i>Licenses</i> |
| option.</p> |
| |
| <p>slurm.conf:</p> |
| <pre> |
| Licenses=fluent:30,ansys:100 |
| </pre> |
| |
| <p>Configured licenses can be viewed using the <i>scontrol</i> command.</p> |
| <pre> |
| $ scontrol show lic |
| LicenseName=ansys |
| Total=100 Used=0 Free=100 Remote=no |
| LicenseName=fluent |
| Total=30 Used=0 Free=30 Remote=no |
| </pre> |
| |
| <p>Requesting licenses is done by using the -L, or --licenses, submission |
| option.</p> |
| <pre> |
| $ sbatch -L ansys:2 script.sh |
| Submitted batch job 5212 |
| |
| $ scontrol show lic |
| LicenseName=ansys |
| Total=100 Used=2 Free=98 Remote=no |
| LicenseName=fluent |
| Total=30 Used=0 Free=30 Remote=no |
| </pre> |
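<p>The same request can be placed inside the batch script, since <i>sbatch</i>
reads options from <i>#SBATCH</i> directive lines. The contents of
<i>script.sh</i> are not shown in this guide, so the following is only a
minimal sketch of what such a script might look like; the job body is an
assumption made for illustration.</p>

```shell
#!/bin/bash
#SBATCH --licenses=ansys:2
# Illustrative job body (an assumption, not taken from this guide).
# The #SBATCH line above is a plain comment to bash, so running the script
# outside of Slurm simply skips the license request.
job_id=${SLURM_JOB_ID:-none}
message="job ${job_id} requested 2 ansys licenses"
echo "${message}"
```

<p>With the directive in the script, the job can be submitted with a plain
<i>sbatch script.sh</i>.</p>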
| |
| <p>Licenses may also be requested using the <i>--tres-per-task</i> option for |
| job submission. If this approach is used, the license must also be defined in |
| the <b>AccountingStorageTRES</b> option of the slurm.conf.</p> |
| |
| <p>slurm.conf:</p> |
| <pre> |
| Licenses=fluent:30 |
| AccountingStorageTRES=license/fluent |
| </pre> |
| |
| <p>Requesting licenses with the <i>--tres-per-task</i> submission option.</p> |
| <pre> |
| $ sbatch --tres-per-task=license/fluent:4 script.sh |
| Submitted batch job 6482 |
| |
| $ scontrol show lic |
| LicenseName=fluent |
| Total=30 Used=4 Free=26 Reserved=0 Remote=no |
| </pre> |
| |
| |
| <h2 id="remote_licenses">Remote Licenses |
| <a class="slurm_link" href="#remote_licenses"></a> |
| </h2> |
| |
| <p>Remote licenses <b>do not</b> offer any kind of integration with third party |
| license managers by themselves. The use of the "Server" and "ServerType" |
| parameters when creating these licenses is just for informative purposes and |
| does not imply any kind of automatic license management with those servers. |
| It is the responsibility of the system administrator to implement any |
| integration needed with these systems. For example, that would include ensuring |
| that only users requesting remote licenses through Slurm can check out licenses |
| from the license server, or making sure that Slurm's license count stays |
| synchronized with license server's |
| (see <a href="#dynamic_licenses">Dynamic Licenses</a>).</p> |
| |
| <h3 id="use_case">Use Case<a class="slurm_link" href="#use_case"></a></h3> |
| <p>A site has two license servers: one serves 100 Nastran licenses provided by |
| FlexNet and the other serves 50 Matlab licenses from Reprise License |
| Manager. The site has two clusters named "fluid" and "pdf" dedicated to running |
| simulation jobs using both products. The managers want to split the number of |
| Nastran licenses equally between clusters, but assign 70% of the Matlab |
| licenses to cluster "pdf" and the remaining 30% to cluster "fluid".</p> |
| |
| <h3 id="configuring">Configuring Slurm for the use case |
| <a class="slurm_link" href="#configuring"></a> |
| </h3> |
| <p>Here we assume that both clusters have been configured correctly in the |
| <i>slurmdbd</i> using the <i>sacctmgr</i> command.</p> |
| <pre> |
| $ sacctmgr show clusters format=cluster,controlhost |
| Cluster ControlHost |
| ---------- --------------- |
| fluid 143.11.1.3 |
| pdf 144.12.3.2 |
| </pre> |
| |
| <p>The licenses are added using the <i>sacctmgr</i> command, specifying the |
| total count of licenses and the percentage that should be allocated |
| to each cluster. This can be done either in one step or through a |
| multi-step process.</p> |
| |
| <p>One step:</p> |
| <pre> |
| $ sacctmgr add resource name=nastran cluster=fluid,pdf \ |
| count=100 allowed=50 server=flex_host servertype=flexlm type=license |
| Adding Resource(s) |
| nastran@flex_host |
| Cluster - fluid 50 |
| Cluster - pdf 50 |
| Settings |
| Name = nastran |
| Server = flex_host |
| Description = nastran |
| ServerType = flexlm |
| Count = 100 |
| Type = License |
| </pre> |
| |
| <p>Multi-step:</p> |
| <pre> |
| $ sacctmgr add resource name=matlab count=50 server=rlm_host \ |
| servertype=rlm type=license |
| Adding Resource(s) |
| matlab@rlm_host |
| Settings |
| Name = matlab |
| Server = rlm_host |
| Description = matlab |
| ServerType = rlm |
| Count = 50 |
| Type = License |
| |
| $ sacctmgr add resource name=matlab server=rlm_host \ |
| cluster=pdf allowed=70 |
| Adding Resource(s) |
| matlab@rlm_host |
| Cluster - pdf 70 |
| Settings |
| Name = matlab |
| Server = rlm_host |
| Count = 50 |
| LastConsumed = 0 |
| Flags = (null) |
| Type = License |
| |
| $ sacctmgr add resource name=matlab server=rlm_host \ |
| cluster=fluid allowed=30 |
| Adding Resource(s) |
| matlab@rlm_host |
| Cluster - fluid 30 |
| Settings |
| Name = matlab |
| Server = rlm_host |
| Count = 50 |
| LastConsumed = 0 |
| Flags = (null) |
| Type = License |
| </pre> |
| |
| |
| <p>The <i>sacctmgr</i> command will now display the grand total |
| of licenses.</p> |
| <pre> |
| $ sacctmgr show resource |
| Name Server Type Count LastConsumed Allocated ServerType Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- -------------------- |
| nastran flex_host License 100 0 100 flexlm |
| matlab rlm_host License 50 0 100 rlm |
| $ sacctmgr show resource withclusters |
| Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- |
| nastran flex_host License 100 0 100 flexlm fluid 50 |
| nastran flex_host License 100 0 100 flexlm pdf 50 |
| matlab rlm_host License 50 0 100 rlm fluid 30 |
| matlab rlm_host License 50 0 100 rlm pdf 70 |
| </pre> |
| |
| <p>The configured licenses are now visible on both clusters using the |
| <i>scontrol</i> command.</p> |
| <pre> |
| # On cluster "pdf": |
| $ scontrol show lic |
| LicenseName=matlab@rlm_host |
| Total=35 Used=0 Free=35 Reserved=0 Remote=yes |
| LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44 |
| LicenseName=nastran@flex_host |
| Total=50 Used=0 Free=50 Reserved=0 Remote=yes |
| LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44 |
| |
| # On cluster "fluid": |
| $ scontrol show lic |
| LicenseName=matlab@rlm_host |
| Total=15 Used=0 Free=15 Reserved=0 Remote=yes |
| LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44 |
| LicenseName=nastran@flex_host |
| Total=50 Used=0 Free=50 Reserved=0 Remote=yes |
| LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44 |
| </pre> |
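<p>The per-cluster totals shown above follow from the configured percentages:
70% and 30% of the 50 Matlab licenses give 35 and 15. The arithmetic can be
sketched as follows. <i>cluster_total</i> is a hypothetical helper, not a Slurm
command, and the truncating integer division is an assumption, since this
guide does not state how fractional results are rounded.</p>

```shell
#!/bin/bash
# Hypothetical helper, not part of Slurm: convert a percentage allocation
# into a per-cluster license total.
# Assumption: fractional results are truncated by integer division.
cluster_total() {
    local count=$1 allowed_pct=$2
    echo $(( count * allowed_pct / 100 ))
}

cluster_total 50 70    # matlab on "pdf"
cluster_total 50 30    # matlab on "fluid"
cluster_total 100 50   # nastran on either cluster
```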
| |
| <p>When requesting remote licenses at job submission, both the license name |
| and the server must be specified.</p> |
| <pre> |
| $ sbatch -L nastran@flex_host script.sh |
| Submitted batch job 5172 |
| </pre> |
| |
| |
| <p>License percentages and counts can be modified as shown below:</p> |
| <pre> |
| $ sacctmgr modify resource name=matlab server=rlm_host set \ |
| count=200 |
| Modified server resource ... |
| matlab@rlm_host |
| Cluster - fluid - matlab@rlm_host |
| Cluster - pdf - matlab@rlm_host |
| |
| $ sacctmgr modify resource name=matlab server=rlm_host \ |
| cluster=pdf set allowed=60 |
| Modified server resource ... |
| Cluster - pdf - matlab@rlm_host |
| |
| $ sacctmgr show resource withclusters |
| Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- |
| nastran flex_host License 100 0 100 flexlm fluid 50 |
| nastran flex_host License 100 0 100 flexlm pdf 50 |
| matlab rlm_host License 200 0 90 rlm fluid 30 |
| matlab rlm_host License 200 0 90 rlm pdf 60 |
| </pre> |
| |
| <p>Licenses can be deleted either from a single cluster or altogether, as shown:</p> |
| <pre> |
| $ sacctmgr delete resource where name=matlab server=rlm_host cluster=fluid |
| Deleting resource(s)... |
| Cluster - fluid - matlab@rlm_host |
| |
| $ sacctmgr delete resource where name=nastran server=flex_host |
| Deleting resource(s)... |
| nastran@flex_host |
| Cluster - fluid - nastran@flex_host |
| Cluster - pdf - nastran@flex_host |
| |
| $ sacctmgr show resource withclusters |
| Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- |
| matlab rlm_host License 200 0 60 rlm pdf 60 |
| </pre> |
| |
| <p>Starting with Slurm 23.02, a new <i>Absolute</i> flag is available that |
| indicates the license allowed values for each cluster are to be treated as |
| absolute license counts rather than percentages.</p> |
| |
| <p>The following brief examples show license management using this flag:</p> |
| <pre> |
| $ sacctmgr -i add resource name=deluxe cluster=fluid,pdf count=150 allowed=70 \ |
| server=flex_host servertype=flexlm flags=absolute |
| Adding Resource(s) |
| deluxe@flex_host |
| Cluster - fluid 70 |
| Cluster - pdf 70 |
| Settings |
| Name = deluxe |
| Server = flex_host |
| Description = deluxe |
| ServerType = flexlm |
| Count = 150 |
| Flags = Absolute |
| Type = Unknown |
| |
| $ sacctmgr show resource withclusters |
| Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- |
| deluxe flex_host License 150 0 140 flexlm fluid 70 Absolute |
| deluxe flex_host License 150 0 140 flexlm pdf 70 Absolute |
| |
| $ sacctmgr -i update resource deluxe set allowed=25 where cluster=fluid |
| Modified server resource ... |
| Cluster - fluid - deluxe@flex_host |
| |
| $ sacctmgr show resource withclusters |
| Name Server Type Count LastConsumed Allocated ServerType Cluster Allowed Flags |
| ---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- |
| deluxe flex_host License 150 0 95 flexlm fluid 25 Absolute |
| deluxe flex_host License 150 0 95 flexlm pdf 70 Absolute |
| </pre> |
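<p>In the listings above, the Allocated column is the sum of the per-cluster
Allowed values: a percentage total for ordinary resources and a license total
for resources with the Absolute flag. A site script may want to check a
planned split before calling <i>sacctmgr</i>. The sketch below assumes the
limit is 100 for percentage-based resources and the resource Count for
Absolute ones; <i>allocation_ok</i> is a hypothetical helper, not a Slurm
command.</p>

```shell
#!/bin/bash
# Hypothetical pre-check, not part of Slurm: succeed if the summed Allowed
# values stay within the limit.
# Usage: allocation_ok LIMIT ALLOWED...
#   LIMIT is 100 for percentage-based resources (assumption) or the
#   resource Count for resources with the Absolute flag.
allocation_ok() {
    local limit=$1 total=0 allowed
    shift
    for allowed in "$@"; do
        total=$(( total + allowed ))
    done
    echo "allocated=${total} limit=${limit}"
    # The exit status of the function is the result of this comparison.
    (( total <= limit ))
}

allocation_ok 100 30 60    # matlab percentages after the modification
allocation_ok 150 25 70    # deluxe absolute counts after the update
```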
| |
| <p>This can also be established as the default for all newly created licenses |
| by adding <i>AllResourcesAbsolute=yes</i> to <i>slurmdbd.conf</i> (and restarting |
| SlurmDBD to make the change take effect).</p> |
| |
| <h2 id="dynamic_licenses">Dynamic Licenses |
| <a class="slurm_link" href="#dynamic_licenses"></a> |
| </h2> |
| <p>Starting with Slurm 23.02, the <i>LastConsumed</i> field for remote licenses |
| is designed to be periodically updated with the active use count from a license |
| server. An example script for FlexLM's lmstat command is provided below. |
| Similar scripts can be constructed for other license management |
| stacks.</p> |
| |
| <pre> |
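| #!/bin/bash |
| # Report the number of licenses currently in use on the license server |
| # to Slurm by updating the LastConsumed field of the remote license. |
| |
| set -euxo pipefail |
| |
| LMSTAT=/opt/foobar/bin/lmstat |
| LICENSE=foobar |
| |
| # Extract the in-use count from the "Users of ${LICENSE}" line of the lmstat output. |
| consumed=$("${LMSTAT}" | grep "Users of ${LICENSE}" | sed "s/.*Total of \([0-9]\+\) licenses in use)/\1/") |
| |
| # Push the count to the database without prompting for confirmation (-i). |
| sacctmgr -i update resource "${LICENSE}" set lastconsumed="${consumed}" |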
| </pre> |
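<p>The sed expression in the script above matches only the plural wording
"licenses in use". The sketch below shows one way to also accept a singular
"license in use", exercised against sample text rather than a live license
server. The sample lines are an assumption about the lmstat output format, and
<i>parse_in_use</i> is a hypothetical helper name.</p>

```shell
#!/bin/bash
# Hypothetical helper: read lmstat-style output on stdin and print the
# in-use count for the given feature. Accepts "license" or "licenses".
parse_in_use() {
    local feature=$1
    sed -n -E "s/^Users of ${feature}:.*Total of ([0-9]+) licenses? in use\)[[:space:]]*\$/\1/p"
}

# Sample lines (assumed format), standing in for real lmstat output.
sample='Users of foobar:  (Total of 100 licenses issued;  Total of 30 licenses in use)'
single='Users of foobar:  (Total of 100 licenses issued;  Total of 1 license in use)'

consumed=$(printf '%s\n' "${sample}" | parse_in_use foobar)
echo "consumed=${consumed}"
printf '%s\n' "${single}" | parse_in_use foobar
```

<p>Because <i>sed -n</i> prints nothing when no line matches, a calling script
should treat an empty result as an error instead of passing it to
<i>sacctmgr</i>.</p>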
| |
| <p>When the LastConsumed value is changed through sacctmgr, an update is |
| automatically pushed to the Slurm controllers. They use this value |
| to calculate a <i>LastDeficit</i> value, which indicates how many |
| licenses have "gone missing" from the cluster's perspective and |
| need to be set aside temporarily.</p> |
| |
| <p>For example, 100 "foobar" licenses are available in total, and access to |
| 80 of them is allocated to the "blackhole" cluster:</p> |
| <pre> |
| $ sacctmgr add resource foobar count=100 flags=absolute cluster=blackhole allowed=80 |
| Adding Resource(s) |
| foobar@slurmdb |
| Cluster - blackhole 80 |
| Settings |
| Name = foobar |
| Server = slurmdb |
| Description = foobar |
| Count = 100 |
| Flags = Absolute |
| Type = Unknown |
| Would you like to commit changes? (You have 30 seconds to decide) |
| (N/y): y |
| $ scontrol show license |
| LicenseName=foobar@slurmdb |
| Total=80 Used=0 Free=80 Reserved=0 Remote=yes |
| LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T16:36:55 |
| </pre> |
| |
| <p>Now, our cron job comes in and updates the LastConsumed value to 30, while |
| the cluster has yet to allocate any licenses to jobs:</p> |
| |
| <pre> |
| $ sacctmgr -i update resource foobar set lastconsumed=30 |
| Modified server resource ... |
| foobar@slurmdb |
| Cluster - blackhole - foobar@slurmdb |
| $ scontrol show license |
| LicenseName=foobar@slurmdb |
| Total=80 Used=0 Free=70 Reserved=0 Remote=yes |
| LastConsumed=30 LastDeficit=10 LastUpdate=2023-02-28T16:39:27 |
| </pre> |
| |
| <p>Note that the cluster has now calculated a deficit of 10 licenses and |
| will only schedule up to 70 licenses for the time being. |
| The cluster knows that up to 20 licenses (the 100 in total minus the 80 |
| assigned to it) are reserved for other clusters or external use. However, |
| since LastConsumed was set to 30 while the cluster itself has none in use, an |
| additional 10 licenses have "gone rogue" and their usage cannot |
| be accounted for. The cluster must therefore not assign those to any pending |
| jobs, as such a job would likely fail to acquire the desired licenses.</p> |
| |
| <p>If a further update (likely driven through cron) now reduces the |
| LastConsumed count to 20, the deficit is considered to have disappeared, |
| and the cluster will make all 80 assigned licenses available again:</p> |
| |
| <pre> |
| $ sacctmgr -i update resource foobar set lastconsumed=20 |
| Modified server resource ... |
| foobar@slurmdb |
| Cluster - blackhole - foobar@slurmdb |
| $ scontrol show license |
| LicenseName=foobar@slurmdb |
| Total=80 Used=0 Free=80 Reserved=0 Remote=yes |
| LastConsumed=20 LastDeficit=0 LastUpdate=2023-02-28T16:44:26 |
| </pre> |
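<p>The Free and LastDeficit values in the two updates above can be reproduced
with simple arithmetic. The sketch below is consistent with this walkthrough
but is not taken from the Slurm source, so treat the formula as an assumption;
<i>license_state</i> is a hypothetical helper name.</p>

```shell
#!/bin/bash
# Hypothetical helper, not part of Slurm: reproduce the Free and LastDeficit
# values from the walkthrough.
# Usage: license_state COUNT TOTAL USED LAST_CONSUMED
#   COUNT          licenses configured for the resource
#   TOTAL          licenses assigned to this cluster
#   USED           licenses this cluster has allocated to jobs
#   LAST_CONSUMED  in-use count reported by the license server
license_state() {
    local count=$1 total=$2 used=$3 last_consumed=$4
    # Licenses that other clusters or external users may legitimately hold.
    local external=$(( count - total ))
    # Anything consumed beyond local and external use is unaccounted for.
    local deficit=$(( last_consumed - used - external ))
    (( deficit < 0 )) && deficit=0
    local free=$(( total - used - deficit ))
    echo "Free=${free} LastDeficit=${deficit}"
}

license_state 100 80 0 30    # first update: LastConsumed=30
license_state 100 80 0 20    # second update: LastConsumed=20
```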
| |
| <p style="text-align:center;">Last modified 09 May 2024</p> |
| |
| <!--#include virtual="footer.txt"--> |