Program: rhcs_fence
Author: Madison Kelly (digimer@alteeve.com)
Alteeve's Niche! - https://alteeve.com/w/
Date: 2013-03-13
Version: 0.2.6
License: GPL v2.0+
-=] Description:
This script is designed to be used as DRBD's 'fence-peer' handler. It ties the
fence call that DRBD makes when 'disk { fencing resource-and-stonith }' is set
into Red Hat Cluster Service's 'fenced' daemon. This allows you to configure
fencing once in your cluster and use it for both the cluster and DRBD.
This program is based heavily on Lon Hohberger's <lhh[a]redhat.com>
"obliterate-peer.sh" script. It began as a replacement fence handler designed
to add some features and intelligence to his script, but became a new program
in order to switch to Perl.
-=] Supported Environments:
This script supports two-node DRBD setups only, but the nodes themselves may be
part of a larger cluster. This script should only be used when
'fencing resource-and-stonith' is set. The 'on <host> { }' name in the DRBD
configuration *must* match the '<clusternode name="...">' name in the cluster
configuration.
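The name-matching requirement above can be checked before going into production. The following is a minimal sketch in Python, not part of rhcs_fence itself. The function names, the sample host names, and the sample configuration text are all hypothetical, and the regular expression assumes each 'on <host> {' opens on a single line.

```python
import re
import xml.etree.ElementTree as ET


def drbd_hosts(drbd_conf_text):
    """Return the host names used in 'on <host> {' sections of a DRBD config."""
    return set(re.findall(r"^\s*on\s+(\S+)\s*\{", drbd_conf_text, re.MULTILINE))


def cluster_nodes(cluster_conf_xml):
    """Return the 'name' attribute of every <clusternode> element."""
    root = ET.fromstring(cluster_conf_xml)
    return {node.get("name") for node in root.iter("clusternode")}


def unmatched_hosts(drbd_conf_text, cluster_conf_xml):
    """Return DRBD host names that have no matching <clusternode name="...">."""
    return drbd_hosts(drbd_conf_text) - cluster_nodes(cluster_conf_xml)


# Hypothetical sample inputs, for illustration only.
SAMPLE_DRBD = """
resource r0 {
    on an-node01.example.com { address 10.0.0.1:7788; }
    on an-node02 { address 10.0.0.2:7788; }
}
"""

SAMPLE_CLUSTER = """
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="an-node01.example.com" nodeid="1"/>
    <clusternode name="an-node02.example.com" nodeid="2"/>
  </clusternodes>
</cluster>
"""

# 'an-node02' is reported because the cluster knows it by a different name.
print(sorted(unmatched_hosts(SAMPLE_DRBD, SAMPLE_CLUSTER)))
```

An empty result means every DRBD host name has a cluster node of the same name. Any name that is returned would need to be corrected in one of the two configurations.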
-=] Limitations:
This handler insists on seeing the local disk as 'UpToDate' before it will
proceed with a fence. Thus, if you have a simultaneous and complete cluster
crash followed by the recovery of only one node, the recovered node will be in
a 'Consistent' state, which will abort the fence call. As such, this scenario
will require human intervention to recover from.
-=] Notes:
This program takes certain steps to avoid dual-fencing:
- First, Timing:
This program will use the cluster's 'Node ID' as a base value for a delay prior
to fencing. If a node has 'Node ID: 1' (as seen with 'cman_tool status'), there
will be no delay and the fence will occur immediately. All other nodes will
sleep for ((node_id x 2) + 5) seconds, up to a maximum of 30 seconds.
It is possible to override this behaviour by setting 'local_delay' in the head
configuration section of the script. If this is a non-0 value, the script will
pause for the defined number of seconds, ignoring the behaviour described
above.
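The delay rule described above can be sketched as follows. This is an illustration in Python, not the script's own Perl code. The function name 'fence_delay' and the constant 'MAX_DELAY' are hypothetical, and the sketch assumes the 30-second ceiling applies only to the computed delay, not to an explicit 'local_delay'.

```python
MAX_DELAY = 30  # seconds; the ceiling on the computed delay described above


def fence_delay(node_id, local_delay=0):
    """Return how many seconds to sleep before fencing the peer."""
    if local_delay != 0:
        # An explicit override replaces the node-ID-based calculation.
        return local_delay
    if node_id == 1:
        # Node ID 1 fences immediately.
        return 0
    # All other nodes wait ((node_id x 2) + 5) seconds, capped at MAX_DELAY.
    return min(node_id * 2 + 5, MAX_DELAY)


for node_id in (1, 2, 3, 13):
    print(node_id, fence_delay(node_id))
```

In a two-node setup with node IDs 1 and 2, node 1 fences at once while node 2 waits 9 seconds, which gives node 1 time to win the race if both nodes try to fence each other.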
- Second, 'UpToDate' check:
When a fence call is made, this program checks the disk state of the referenced
resource's minor number to ensure it is 'UpToDate' as reported by '/proc/drbd'.
The fence call will abort if the disk state is 'Consistent' (or anything else).
This helps prevent accidentally fencing the original surviving node when the
cluster communication is up, but the storage network is not, avoiding a
fence-loop.
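The 'UpToDate' check can be sketched as below. This is an illustration in Python and not the script's own parser. The function names are hypothetical, and the sample text is a made-up approximation of DRBD 8.x '/proc/drbd' output, where each minor's status line carries 'ds:<local>/<peer>'.

```python
import re

# Matches a status line such as " 0: cs:... ro:... ds:UpToDate/DUnknown ...".
STATUS_LINE = re.compile(r"^\s*(\d+):\s.*?\bds:(\w+)/(\w+)")


def local_disk_state(proc_drbd_text, minor):
    """Return the local disk state for the given minor, or None if not found."""
    for line in proc_drbd_text.splitlines():
        match = STATUS_LINE.match(line)
        if match and int(match.group(1)) == minor:
            return match.group(2)
    return None


def may_fence(proc_drbd_text, minor):
    """Allow the fence only when the local disk is 'UpToDate'."""
    return local_disk_state(proc_drbd_text, minor) == "UpToDate"


# Hypothetical sample resembling /proc/drbd, for illustration only.
SAMPLE = """\
version: 8.3.15 (api:88/proto:86-97)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 1: cs:WFConnection ro:Secondary/Unknown ds:Consistent/DUnknown C r-----
"""

print(may_fence(SAMPLE, 0))  # minor 0 is UpToDate, so the fence may proceed
print(may_fence(SAMPLE, 1))  # minor 1 is Consistent, so the fence is aborted
```

A minor that does not appear in the text is treated the same as any state other than 'UpToDate': the fence is not allowed.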
-=] Failure Modes:
This program follows the "Fail Early, Fail Often" ethos. It will *only* fence
the peer if several conditions are met. Please test the functionality of this
script before going into production!
Exit codes:
1 - Fence failed, see syslog
7 - Fence succeeded
255 - End of script hit, likely a program error
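When testing the handler, the exit codes listed above can be interpreted by a small wrapper. The following is a minimal sketch in Python. The names 'EXIT_MEANINGS', 'describe_exit' and 'peer_was_fenced' are hypothetical, and any code not listed above is simply reported as undocumented.

```python
# Exit codes as listed in this document.
EXIT_MEANINGS = {
    1: "Fence failed, see syslog",
    7: "Fence succeeded",
    255: "End of script hit, likely a program error",
}


def describe_exit(code):
    """Return the documented meaning of an exit code."""
    return EXIT_MEANINGS.get(code, "Undocumented exit code")


def peer_was_fenced(code):
    """Only exit code 7 indicates that the peer was fenced."""
    return code == 7


for code in (1, 7, 255):
    print(code, describe_exit(code), peer_was_fenced(code))
```

Treating everything except 7 as a failure keeps the wrapper in line with the "Fail Early, Fail Often" approach described in this section.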
If you run into any trouble, please enable 'debug' mode by setting the internal
'debug' value to '1'. If you need help, please send the syslog output from both
nodes, captured with debug enabled, to the Linux Cluster mailing list or the
DRBD Users mailing list.
-=] Getting Help:
By email: digimer@alteeve.com / https://alteeve.com
IRC: #linux-cluster and #drbd on freenode.net
Mailing list: https://www.redhat.com/mailman/listinfo/linux-cluster
http://lists.linbit.com/mailman/listinfo/drbd-user