blob: 7ca663c0f4126f412807769a59246985bd99c34f [file] [log] [blame]
Synchronization of the Persistent Reservation Information via the DLM
=====================================================================
Introduction
------------
In an H.A. setup where multiple servers share data it is required that
the persistent reservation state is kept consistent across the cluster.
One possible approach is to use the DLM to keep the PR state synchronized
across nodes. Since the DLM can associate data with each DLM lock object,
DLM lock objects can be used to store PR data. The data that is associated
with a DLM lock object is called the Lock Value Block or LVB. The code in
scst_dlm.c uses the DLM to keep PR data synchronized across all nodes in
a cluster.
Software Components
-------------------
The following software components are needed by the code in scst_dlm.c:
* The DLM kernel driver (dlm.ko). This driver is only built if CONFIG_DLM
has been set.
* The DLM control daemon (dlm_controld.pcmk). This daemon passes cluster
node IDs and IP addresses to the DLM kernel driver via the configfs
interface of the DLM kernel driver.
* Corosync to manage cluster membership of the cluster nodes and to assign
a node ID to each cluster node.
* A facility to start the DLM control daemon, e.g. Pacemaker.
On most Linux distributions the software packages that contain this software
have the names kernel, dlm, corosync and pacemaker.
NOTE! You might need to apply a DLM bugfix patch, see scst-devel mailing list
thread https://sourceforge.net/p/scst/mailman/scst-devel/thread/CADHfD59FK6seaammL8b9LM3U3tw5HvYp3kPTk_r1OYkPR7bPhg@mail.gmail.com/#msg34761854
for more details.
DLM Configuration
-----------------
The DLM kernel module supports the TCP and SCTP communication protocols. An
advantage of SCTP for H.A. purposes is that it supports multihoming. One of
these protocols can be selected via the -r <proto> option of dlm_controld.
That option can be set via the "args" argument of the Pacemaker dlm_controld
resource. For more information, see also:
* The dlm_controld(8) man page.
* In the "Pacemaker 1.1, Clusters from Scratch" guide, the section "Configure
the Cluster for the DLM".
* The dlm_controld resource agent: /usr/lib/ocf/resource.d/pacemaker/controld
Here is an example of how to set up a cluster with two nodes and how to
configure and start the DLM control daemon:
1. If a network switch is present between the two nodes, enable IPv4 multicast
on that switch.
2. Copy /etc/corosync/corosync.conf.example into /etc/corosync/corosync.conf
and edit that file.
3. If a file /etc/default/corosync exists, enable Corosync in that file.
4. Start Corosync:
systemctl start corosync || /etc/init.d/corosync start
5. Check that all configured Corosync rings have two members:
corosync-cfgtool -s && { corosync-cmapctl | grep members; }
6. Start pcsd:
systemctl start pcsd || /etc/init.d/pcsd start
7. Set up cluster authentication:
pcs cluster auth centos7-vm centos7b-vm
8. Start Pacemaker:
systemctl start pacemaker || /etc/init.d/pacemaker start
9. If the cluster has only two nodes, disable the Pacemaker quorum policy and
disable STONITH:
crm_attribute -t crm_config -n no-quorum-policy -v ignore
crm_attribute -t crm_config -n stonith-enabled -v false
10. Check the cluster status:
pcs status
11. Create a Pacemaker resource for dlm_controld:
pcs resource delete dlm
pcs resource create dlm ocf:pacemaker:controld \
args="-q0 -f0" allow_stonith_disabled=true \
op monitor timeout=60 \
--clone interleave=true
12. Check the Pacemaker status:
pcs status
Startup and Shutdown
--------------------
The startup sequence is as follows:
* Load the DLM kernel module. If not loaded explicitly, "modprobe scst" will
load the DLM kernel module implicitly.
* Load and configure SCST with all target ports disabled.
* Enable cluster mode for all SCST devices that can be accessed through more
than one cluster node:
for x in /sys/kernel/scst_tgt/handlers/*/*/; do
echo 1 >$x/cluster_mode &
done
wait
* Start Corosync and Pacemaker.
* Wait until Pacemaker has reached the idle state:
pacemaker_dc_status() {
local dc
dc="$(crmadmin -D 2>/dev/null | sed 's/Designated Controller is: //')"
[ -n "$dc" ] &&
crmadmin -S "$dc" 2>/dev/null |
sed 's/^Status of crmd@[^[:blank:]]*:[[:blank:]]\([^[:blank:]]*\).*/\1/'
}
timeout=300
for ((i=0;i<timeout;i++)); do
if [ "$(pacemaker_dc_status)" = "S_IDLE" ]; then
echo "Pacemaker reached idle state after $i s"
break
fi
sleep 1
done
if [ "$i" = "$timeout" ]; then
echo "Pacemaker did not reach the IDLE state in $i s"
fi
* Enable SCST target ports.
* If no DLM resource has been configured in Pacemaker, start dlm_controld.pcmk
explicitly.
The proper shutdown order is as follows:
* Tell SCST to stop accepting SCSI commands and wait until all initiators have
logged out:
for x in $(find /sys/kernel/scst_tgt/targets/ -name enabled); do
echo 0 > $x &
done
wait
while ls -Ad /sys/kernel/scst_tgt/targets/*/*/sessions/* 2>&1 |
grep -vE '/sys/kernel/scst_tgt/targets/(copy_manager|scst_local)/'; do
sleep 1
done
* Tell SCST to release the DLM lockspaces:
while grep -q '^1$' /sys/kernel/scst_tgt/devices/*/cluster_mode 2>/dev/null
do
for x in /sys/kernel/scst_tgt/devices/*/cluster_mode; do
{ [ -e "$x" ] && echo 0 > "$x"; } &
done
wait
sleep 1
done
* Stop Pacemaker and Corosync
* Unload the SCST kernel modules
* Unload the DLM kernel driver
Lockspace names
---------------
The names of the DLM lockspaces used by SCST follow the following pattern:
scst-<t10_dev_id> where t10_dev_id is the T10 device ID of the SCST device
associated with this lockspace.
Notes
-----
Since the lockspace name depends on the t10_dev_id it is not allowed to
change the t10_dev_id if cluster mode has been enabled.
Testing
-------
Two examples of test suites for the cluster PR support code are:
* The SCSI conformance tests in the libiscsi project.
* The Windows Cluster Validation Tests
(https://technet.microsoft.com/en-us/library/Cc726064.aspx).
To do
-----
* Ensure that PREEMPT AND ABORT affects all cluster nodes instead of
only the cluster node that received this command.
See also
--------
* Bart Van Assche, Using the DLM as a distributed in-memory database, Linux
Plumbers North America, Seattle, August 20, 2015
(https://linuxplumbersconf.org/2015/ocw//system/presentations/2691/original/Using%20the%20DLM%20as%20a%20Distributed%20In-Memory%20Database.pdf).
* Andrew Beekhof, Pacemaker Configuration Explained, 2015
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/).
* Andrew Beekhof, Clusters from Scratch, 2015
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/index.html).