| Program: rhcs_fence |
| Author: Madison Kelly (digimer@alteeve.com) |
| Alteeve's Niche! - https://alteeve.com/w/ |
| Date: 2013-03-13 |
| Version: 0.2.6 |
| License: GPL v2.0+ |
| |
| |
| -=] Description: |
| |
| This script is designed to be used as DRBD's 'fence-peer' handler. It ties |
| DRBD's fence call, using 'disk { fencing resource-and-stonith }' into Red Hat |
| Cluster Service's fenced daemon. This allows you to configure fencing once in |
| your cluster and use it for both the cluster and DRBD. |
| |
| This program was based heavily on Lon Hohberger <lhh[a]redhat.com> |
| "obliterate-peer.sh" script. This was created as a replacement fence handler |
| designed to add some features and intelligence to his script, but became a new |
| program in order to switch to perl. |
| |
| |
| -=] Supported Environments |
| |
| This script supports two-node DRBD setups only, but the nodes themselves may be |
| part of a larger cluster. This script should be used when |
| 'fencing resource-and-stonith' is set only. The 'on <host> { }' name *must* |
| match the '<clusternode name="..."> as well. |
| |
| |
| -=] Limitations |
| |
| As this handler insists on seeing the local disk as 'UpToDate' before it will |
| proceed with a fence. Thus, if you have a simultaneous and complete cluster |
| crash followed by the recovery of only one node, the recovered node will be in |
| a 'Consistent' state, which will abort the fence call. As such, this scenario |
| will recover human intervention to recover from. |
| |
| |
| -=] Notes: |
| |
| This program takes certain steps to avoid dual-fencing; |
| |
| - First, Timing: |
| |
| This program will use the cluster's 'Node ID' as a base value for a delay prior |
| to fencing. If a node has 'Node ID: 1' (as seen with 'cman_tool status'), there |
| will be no delay and the fence will occur immediately. All other nodes will |
| sleep for ((node_id x 2) + 5) seconds, up to a maximum of 30 seconds. |
| |
| It is possible to override this behaviour by setting 'local_delay' in the head |
| configuration section of the script. If this is a non-0 value, the script will |
| pause for the defined number of seconds, ignoring the behaviour described |
| above. |
| |
| - Second, 'UpToDate' check; |
| |
| When a fence call is made, this program checks the referenced resource minor |
| number's disk state to ensure it is 'UpToDate' as resported by '/proc/drbd'. |
| The fence call will abort if the disk state is 'Consistent' (or anything else). |
| This helps prevent accidentally fencing the original survivng node when the |
| cluster communication is up, but the storage network is not, avoiding a |
| fence-loop. |
| |
| |
| -=] Failure Modes: |
| |
| This program follows the "Fail Early, Fail Often" ethos. It will *only* fence |
| the peer if several conditions are met. Please test the functionality of this |
| script before going into production! |
| |
| Exit codes; |
| 1 - Fence failed, see syslog |
| 7 - Fence succeeded |
| 255 - End of script hit, likely a program error |
| |
| If you run into any trouble, please enable 'debug' mode by setting the internal |
| 'debug' value to '1'. If you need help, please send the output of the syslog |
| of both nodes with debug enabled to the Linux Cluster mailing list or DRBD |
| Users mailing list. |
| |
| |
| -=] Getting Help: |
| |
| By email: digimer@alteeve.com / https://alteeve.com |
| IRC: #linux-cluster and #drbd on freenode.net |
| Mailing list: https://www.redhat.com/mailman/listinfo/linux-cluster |
| http://lists.linbit.com/mailman/listinfo/drbd-user |