# vim: set foldenable foldmethod=indent sw=4 ts=8 :
# Copyright 2013 Linbit HA Solutions GmbH
# Lars Ellenberg @ linbit.com
TODO:
someone convert this into proper ascii doc please ;-)
... and draw some pictures ...
How crm-fence-peer.sh, pacemaker, and the OCF Linbit DRBD resource agent
are supposed to work together.
The two node cluster is the trickier one, because it has no real quorum.
Relative Timeouts
--dc-timeout > dead-time (respectively stonith-timeout)
if stonith is enabled, --timeout >= --dc-timeout
if there is no stonith, the timeout may be small.
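As a sketch, these could be passed on the handler command line in
drbd.conf; the path is the usual install location, and the values
(in seconds) are illustrative only:

    handlers {
        # --dc-timeout larger than the cluster's dead-time/stonith-timeout
        # (assumed 60s here); --timeout >= --dc-timeout, since stonith
        # is assumed to be enabled
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh --dc-timeout 90 --timeout 90";
    }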
Pacemaker operations timeouts
monitor and promote action timeout > max(dc_timeout, timeout)
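In crm shell syntax this could look as follows; resource names and
values are made up, but the monitor and promote timeouts (120s) exceed
the 90s used for the handler above:

    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="29s" role="Master" timeout="120s" \
        op monitor interval="31s" role="Slave" timeout="120s" \
        op promote interval="0" timeout="120s"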
Node reboot, possibly because of crash or stonith due to communication loss
no peer reachable [no delay]
CRM may decide to elect itself DC, shoot the peer,
and start services.
If DRBD peer disk state is known Outdated or worse, DRBD will
switch itself to UpToDate, allowing it to be promoted,
without further fencing actions.
If the DRBD peer disk state is DUnknown, DRBD will only be Consistent.
If CRM decides to promote this instance, the fence-peer callback
runs, finds the peer "unreachable", finds itself Consistent only,
does NOT set any constraint, and DRBD refuses to be promoted.
CRM will now try to promote this instance in an endless loop.
Avoid this by adding
param adjust_master_score="0 10 1000 10000"
to the DRBD resource definition.
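A sketch in crm shell syntax; the parameter and its default meaning are
real, the resource names are made up:

    # with the first weight lowered to 0, a node that has access to
    # Consistent-only data does not bid for the Master role at all,
    # so CRM does not try to promote it in the first place
    primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" adjust_master_score="0 10 1000 10000"
    ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="1" clone-max="2" notify="true"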
no replication link
CRM can see both nodes. [delay: crmadmin -S $peer]
If currently both nodes are Secondary Consistent, CRM will decide to
promote one instance. The fence-peer callback will find the other node
still reachable after timeout, and set the constraint.
If there is already one Primary, and this is a node rejoining the
cluster, there should already be a constraint preventing this node
from being promoted.
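The constraint in question is a -INFINITY location rule on the Master
role. A sketch of what crm-fence-peer.sh generates, in crm shell
notation (the id is derived from the resource names; node and resource
names here are made up):

    location drbd-fence-by-handler-r0-ms_drbd_r0 ms_drbd_r0 \
        rule $role="Master" -inf: #uname ne node-a

With this rule in place, only node-a may run the Master role, i.e. be
promoted.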
Only Replication link breaks during normal operation
Single Primary [delay: crmadmin -S $peer]
fence-peer callback finds the DC,
crmadmin -S confirms the peer is still "reachable",
and sets the constraint.
Dual Primary
both fence-peer callbacks find the DC,
both see node_state "reachable",
optionally delay for the --network-hickup timeout,
and if DRBD is still disconnected,
both try to set the constraint.
Only one succeeds.
The loser should probably commit suicide,
to reduce the overall recovery time:
--suicide-on-failure-if-primary
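That option is passed on the handler command line as well; a sketch,
assuming the usual install path:

    handlers {
        # if this node is Primary and fails to place the constraint
        # (e.g. because the peer won the race), escalate by killing
        # this node, so the peer can take over quickly
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh --suicide-on-failure-if-primary";
    }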
Node crash
surviving node is Secondary [no delay]
If not DC, triggers DC election, elects itself.
Is DC now.
If stonith enabled, shoots the peer.
Promotes this node.
During promotion, fence-peer callback
finds a DC, and a node_state "unreachable",
so sets the constraint "immediately".
surviving node is Primary (DC) [delay up to timeout]
If stonith enabled, shoots the peer.
fence-peer callback finds DC, after some
time sees node_state "unreachable",
or times out while node_state is still "reachable".
Either way still sets the constraint.
surviving node is Primary (not DC) [delay up to max(dc_timeout, timeout)]
fence-peer callback loops trying to contact the DC.
Eventually this node is elected DC.
If stonith enabled, shoots the peer.
The fence-peer callback either times out while no DC is available,
and thus fails. Make sure you choose a suitable --dc-timeout.
Or it finds the other node "unreachable",
and sets the constraint.
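The checks described in these scenarios map to ordinary pacemaker CLI
queries; a rough sketch of the idea (the actual script reads the CIB
XML instead of shelling out like this):

    crmadmin -D            # which node is the current DC?
    crmadmin -S "$peer"    # is the peer's crmd still "reachable"?
    cibadmin -Q -o status  # inspect the peer's <node_state .../> entry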
Total communication loss
To the single surviving node, this looks like a node crash, so see above.
The difference is the potential for data divergence.
If DRBD was configured for "fencing resource-and-stonith",
IO on any Primary is frozen while the fence-peer callback runs.
If stonith is enabled, timeouts should be selected so that
we are shot while waiting for the DC to confirm node_state
"unreachable" of the peer; thus, combined with freezing IO,
no harmful data divergence can happen at this time.
If there is no stonith enabled, data divergence is unavoidable.
==> Multi-Primary *requires*
both node-level fencing (stonith)
AND DRBD resource-level fencing.
Again: Multi-Primary REQUIRES stonith enabled and working.
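Putting it together, a DRBD 8.4-style configuration sketch (resource
name made up; device, disk and net details omitted):

    resource r0 {
        disk {
            # freeze IO on the Primary while the fence-peer handler runs
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            # remove the fencing constraint again, once the resync
            # to this node has completed
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }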