# vim: set foldenable foldmethod=indent sw=4 ts=8 :
# Copyright 2013 Linbit HA Solutions GmbH
# Lars Ellenberg @ linbit.com
TODO:
someone convert this into proper ascii doc please ;-)
... and draw some pictures ...
How crm-fence-peer.sh, pacemaker, and the OCF Linbit DRBD resource agent
are supposed to work together.
The two node cluster is the trickier one, because it has no real quorum.
Relative Timeouts
--dc-timeout > dead-time (respectively stonith-timeout)
if stonith is enabled, --timeout >= --dc-timeout
if there is no stonith, the timeout may be small.
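As a sketch, these could be passed on the handler command line in
drbd.conf; the path is the usual install location, and the values
(in seconds) are illustrative only:

    handlers {
        # --dc-timeout larger than the cluster's dead-time/stonith-timeout
        # (assumed 60s here); --timeout >= --dc-timeout, since stonith
        # is assumed to be enabled
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh --dc-timeout 90 --timeout 90";
    }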
Pacemaker operations timeouts
monitor and promote action timeout > max(dc_timeout, timeout)
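In crm shell syntax this could look as follows; resource names and
values are made up, but the monitor and promote timeouts (120s) exceed
the 90s used for the handler above:

    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="29s" role="Master" timeout="120s" \
        op monitor interval="31s" role="Slave" timeout="120s" \
        op promote interval="0" timeout="120s"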
Node reboot, possibly because of crash or stonith due to communication loss
no peer reachable [no delay]
CRM may decide to elect itself DC, shoot the peer,
and start services.
If DRBD peer disk state is known Outdated or worse, DRBD will
switch itself to UpToDate, allowing it to be promoted,
without further fencing actions.
If the DRBD peer disk state is DUnknown, DRBD will only be Consistent.
If CRM decides to promote this instance, the fence-peer callback
runs, finds the peer "unreachable", finds itself Consistent only,
does NOT set any constraint, and DRBD refuses to be promoted.
CRM will now try to promote this instance in an endless loop.
Avoid this by adding
param adjust_master_score="0 10 1000 10000"
to the DRBD resource definition.
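A sketch in crm shell syntax; the parameter and its default meaning are
real, the resource names are made up:

    # with the first weight lowered to 0, a node that has access to
    # Consistent-only data does not bid for the Master role at all,
    # so CRM does not try to promote it in the first place
    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" adjust_master_score="0 10 1000 10000"
    ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="1" clone-max="2" notify="true"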
no replication link
CRM can see both nodes. [delay: crmadmin -S $peer]
If currently both nodes are Secondary Consistent, CRM will decide to
promote one instance. The fence-peer callback will find the other node
still reachable after timeout, and set the constraint.
If there is already one Primary, and this is a node rejoining the
cluster, there should already be a constraint preventing this node
from being promoted.
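The constraint in question is a -INFINITY location rule on the Master
role. A sketch of what crm-fence-peer.sh generates, in crm shell
notation (the id is derived from the resource names; node and resource
names here are made up):

    location drbd-fence-by-handler-r0-ms_drbd_r0 ms_drbd_r0 \
        rule $role="Master" -inf: #uname ne node-a

With this rule in place, only node-a may run the Master role, i.e. be
promoted.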
Only Replication link breaks during normal operation
Single Primary [delay: crmadmin -S $peer]
fence-peer callback finds the DC,
crmadmin -S confirms the peer is still "reachable",
and sets the constraint.
Dual Primary
both fence-peer callbacks find the DC,
both see node_state "reachable",
optionally delay for the --network-hickup timeout,
and if DRBD is still disconnected,
both try to set the constraint.
Only one succeeds.
The loser should probably commit suicide,
to reduce the overall recovery time:
--suicide-on-failure-if-primary
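That option is passed on the handler command line as well; a sketch,
assuming the usual install path:

    handlers {
        # if this node is Primary and fails to place the constraint
        # (e.g. because the peer won the race), escalate by killing
        # this node, so the peer can take over quickly
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh --suicide-on-failure-if-primary";
    }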
Node crash
surviving node is Secondary [no delay]
If not DC, triggers DC election, elects itself.
Is DC now.
If stonith enabled, shoots the peer.
Promotes this node.
During promotion, fence-peer callback
finds a DC, and a node_state "unreachable",
so sets the constraint "immediately".
surviving node is Primary (DC) [delay up to timeout]
If stonith enabled, shoots the peer.
fence-peer callback finds DC, after some
time sees node_state "unreachable",
or times out while node_state is still "reachable".
Either way still sets the constraint.
surviving node is Primary (not DC) [delay up to max(dc_timeout, timeout)]
fence-peer callback loops trying to contact the DC.
Eventually this node is elected DC.
If stonith enabled, shoots the peer.
The fence-peer callback either times out while no DC is available,
and thus fails. Make sure you choose a suitable --dc-timeout.
Or it finds the other node "unreachable",
and sets the constraint.
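The checks described in these scenarios map to ordinary pacemaker CLI
queries; a rough sketch of the idea (the actual script reads the CIB
XML instead of shelling out like this):

    crmadmin -D            # which node is the current DC?
    crmadmin -S "$peer"    # is the peer's crmd still "reachable"?
    cibadmin -Q -o status  # inspect the peer's <node_state .../> entry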
Total communication loss
To the single surviving node, this looks like a node crash, so see above.
The difference is the potential for data divergence.
If DRBD was configured for "fencing resource-and-stonith",
IO on any Primary is frozen while the fence-peer callback runs.
If stonith is enabled, timeouts should be selected so that
we are shot while waiting for the DC to confirm node_state
"unreachable" of the peer; thus, combined with freezing IO,
no harmful data divergence can happen at this time.
If there is no stonith enabled, data divergence is unavoidable.
==> Multi-Primary *requires*
both node-level fencing (stonith)
AND DRBD resource-level fencing.
Again: Multi-Primary REQUIRES stonith enabled and working.
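Putting it together, a DRBD 8.4-style configuration sketch (resource
name made up; device, disk and net details omitted):

    resource r0 {
        disk {
            # freeze IO on the Primary while the fence-peer handler runs
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            # remove the fencing constraint again, once the resync
            # to this node has completed
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }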