pacemaker/doc/msg-schema.txt - actifio - Git at Google

 Background
 ##################
 First of all, go look at the diagram (comms.gif), read this, then
 look at the diagram again.

 Next, some terminology...
 Here I will use CRM to refer to the light blue section.  That is,
 the entire collection of processes/daemons/modules on a node that,
 as a whole, manage resources in the cluster.  CRMd refers to one
 of the dark blue boxes.  It is the "master subsystem" if you like.
 Its role is to co-ordinate the actions of all the other pieces of
 the puzzle, including those on other nodes.

 Key points from the diagram:
 - All communications with the CRM are done with Heartbeat messages
   routed through the CRMd.  These messages contain a text
   representation of an XML document, the schema of which is outlined
   at the end of this document.
 - All communications internal to the CRM (ie. between its subsystems)
   is performed with IPC messages.  Again all messages are routed via
   the CRMd and contain the same XML documents as Heartbeat messages.
 - All admin clients (eventually) end up sending Heartbeat messages
   and are thus subject to existing HA client security is available.
 - The RPC layer allows the cluster to be controled from non-member
   hosts (subject to RPC security which is available for free).
 - The option of synchronous or asynchronous RPC calls will be provided.
   This will probably be in the form of a flag sent as part of the
   function call.

 Advantages:
 - The only source of "requests" is the CRMd which means it *never* has
   to forward on "request" messages for any of it sub-systems.  This is
   useful for the security of the system (see security.txt).
 - Potentially, most CRM<-->CRM communications can be replaced with RPC
   calls.
 - We are able to re-use existing security mechanisms (IPC, HA, RPC,
   unix_auth via RPC) to protect the system.

 Message scenarios:
 ##################
 There are really only 3 messaging scenarios in this system (exluding
 broadcast vs. point-to-point).  Again this is nice as it keeps down the
 number of "special cases".

 1) Sub-system <--(IPC)--> CRM <--(IPC)--> Sub-system
 2) Sub-system <--(IPC)--> Local CRM <--(Heartbeat)--> Remote CRM <--(IPC)--> Sub-system
 3) Admin Client <--(Heartbeat Broadcast)--> Remote CRM <--(IPC)--> Sub-system

 Message examples:
 ##################
 1.1) the DC telling the local LRM to start a resource
 1.2) the LRM asking the CIB about a resource

 2.1) the DC telling a remote LRM to start a resource
 2.2) the DC asking (all) the CIB(s) to provide their view of the world

 3.1) an admin request to add/remove/modify a resource
 3.2) an admin request to force a failover of a resource or a
 recomputation of the resource dependencies.

 Message Notes:
 ##################
 Messages may be sent to the CRM from local sub-systems via IPC or from
 other HA clients via Heartbeat.  It is then the responsibility of the
 CRM to unpack the message and pass it on to the correct sub-system.  If
 the destination sub-system is the DC and the DC is not running on the
 current node, the message is discarded without error.

 Where the DC receives a message from another node, it will also keep
 track of the sending host and the reference number so that it can direct
 the replies appropriately.  The exception to this is where the message
 is from the DC.

 Messages to the DC are *always* sent as broadcast messages and the DC
 *must always* acknowledge the message with either the results of the
 message or a "thankyou" message.  The reason for this is that the DC may
 change or a DC election may be in progress.  The implication of this is
 that the sender should always set a timer and resend dc_messages if they
 have not been acknowledged.  The DC will be able to detect duplicates by
 examining the destination sub-system and the reference number and we
 will rely on HA to ensure the delivery of DC responses.

 All messages are full crm_messages.  I toyed with only sending the *_request
 or *_response piece of the message to and/or from the relevant sub-systems,
 but it just got messy.  This way, the routing role of the CRM is much easier.
 And easier equals lower complexity, which means less bugs, which is good for
 everyone.

 Schema Notes:
 ###################

 Key Attributes
 ===============

 reference:	provides the ability to track which request a responce
 		is in relation to and where the local CRM should send it.
 *_filter:	allow the operation to be limited to a particular type,
 		id and/or priority
 timeout:	allows the receiver to know how long the sender is
 		expecting the task to take so we can act and report back
 		accordingly.


 Attribute values
 =================
 Where the list ends with |... , the complete list of possibilities will be
 fleshed out at a later date.

 Message Schema:
 ###################

 <!ELEMENT crm_message (options, data?)>
 <!ATTLIST crm_message
           version          #CDATA                       '1'
 	  message_type	   (none|request|response)	'none'
           sys_from         (none|crmd|cib|lrm|admin)	'none'
           sys_to           (none|crmd|cib|lrm|admin)	'none'
 	  host_from	   #CDATA
 	  host_to	   #CDATA
 	  reference	   #CDATA
           timestamp        #CDATA                       '0'>


 <!ELEMENT options>
 <!ATTLIST options
 	  operation		#CDATA
 	  result?		(ok|failed|...)		'ok'
           verbose?		(true|false)		'false'
           timeout?		#CDATA			'0'
 	  filter_priority?	#CDATA	<!-- might be useful later -->
           filter_type?		#CDATA
 	  filter_id?		#CDATA>

 <!-- data is one of ping_item, cib_fragment, lrm_status -->

 <!ELEMENT ping_item>
 <!ATTLIST ping_item
 	  crm_subsystem    (none|crmd|dc|cib|lrm)		     'none'
 	  ping_status	   (error|timeout|stopped|running|sick)	     'timeout'>

 <!ELEMENT lrm_status (resource_info)*>
 <!ATTLIST lrm_status>

 <!ELEMENT resource_info>
 <!ATTLIST resource_info
 	  res_id		#CDATA
           last_op		(noop|start|stop|restart)	     'noop'
           last_op_result        (fail|pass|unknown|...)		     'unknown'
 	  status		#CDATA>


 <!-- always describe which part of the cib are being returned -->
 <!ELEMENT cib_fragment (cib, obj_failed?)>
 <!ATTLIST cib_fragment
           cib_section  (none|all|nodes|resources|constraints|status) 'none'>

 <!ELEMENT obj_failed (failed_update)*>
 <!ATTLIST obj_failed>

 <!ELEMENT failed_update>
 <!ATTLIST failed_update
 	  id			#CDATA
           object_type		(none|node|resource|constraint|state) 'none'
           operation		(none|add|update|delete|replace)      'none'
 	  reason?		(unknown|...)			      'unknown'>
	Background
	##################
	First of all, go look at the diagram (comms.gif), read this, then
	look at the diagram again.

	Next, some terminology...
	Here I will use CRM to refer to the light blue section. That is,
	the entire collection of processes/daemons/modules on a node that,
	as a whole, manage resources in the cluster. CRMd refers to one
	of the dark blue boxes. It is the "master subsystem" if you like.
	Its role is to co-ordinate the actions of all the other pieces of
	the puzzle, including those on other nodes.

	Key points from the diagram:
	- All communications with the CRM are done with Heartbeat messages
	routed through the CRMd. These messages contain a text
	representation of an XML document, the schema of which is outlined
	at the end of this document.
	- All communications internal to the CRM (ie. between its subsystems)
	is performed with IPC messages. Again all messages are routed via
	the CRMd and contain the same XML documents as Heartbeat messages.
	- All admin clients (eventually) end up sending Heartbeat messages
	and are thus subject to existing HA client security is available.
	- The RPC layer allows the cluster to be controled from non-member
	hosts (subject to RPC security which is available for free).
	- The option of synchronous or asynchronous RPC calls will be provided.
	This will probably be in the form of a flag sent as part of the
	function call.

	Advantages:
	- The only source of "requests" is the CRMd which means it never has
	to forward on "request" messages for any of it sub-systems. This is
	useful for the security of the system (see security.txt).
	- Potentially, most CRM<-->CRM communications can be replaced with RPC
	calls.
	- We are able to re-use existing security mechanisms (IPC, HA, RPC,
	unix_auth via RPC) to protect the system.

	Message scenarios:
	##################
	There are really only 3 messaging scenarios in this system (exluding
	broadcast vs. point-to-point). Again this is nice as it keeps down the
	number of "special cases".

	1) Sub-system <--(IPC)--> CRM <--(IPC)--> Sub-system
	2) Sub-system <--(IPC)--> Local CRM <--(Heartbeat)--> Remote CRM <--(IPC)--> Sub-system
	3) Admin Client <--(Heartbeat Broadcast)--> Remote CRM <--(IPC)--> Sub-system

	Message examples:
	##################
	1.1) the DC telling the local LRM to start a resource
	1.2) the LRM asking the CIB about a resource

	2.1) the DC telling a remote LRM to start a resource
	2.2) the DC asking (all) the CIB(s) to provide their view of the world

	3.1) an admin request to add/remove/modify a resource
	3.2) an admin request to force a failover of a resource or a
	recomputation of the resource dependencies.

	Message Notes:
	##################
	Messages may be sent to the CRM from local sub-systems via IPC or from
	other HA clients via Heartbeat. It is then the responsibility of the
	CRM to unpack the message and pass it on to the correct sub-system. If
	the destination sub-system is the DC and the DC is not running on the
	current node, the message is discarded without error.

	Where the DC receives a message from another node, it will also keep
	track of the sending host and the reference number so that it can direct
	the replies appropriately. The exception to this is where the message
	is from the DC.

	Messages to the DC are always sent as broadcast messages and the DC
	must always acknowledge the message with either the results of the
	message or a "thankyou" message. The reason for this is that the DC may
	change or a DC election may be in progress. The implication of this is
	that the sender should always set a timer and resend dc_messages if they
	have not been acknowledged. The DC will be able to detect duplicates by
	examining the destination sub-system and the reference number and we
	will rely on HA to ensure the delivery of DC responses.

	All messages are full crm_messages. I toyed with only sending the *_request
	or *_response piece of the message to and/or from the relevant sub-systems,
	but it just got messy. This way, the routing role of the CRM is much easier.
	And easier equals lower complexity, which means less bugs, which is good for
	everyone.

	Schema Notes:
	###################

	Key Attributes
	===============

	reference: provides the ability to track which request a responce
	is in relation to and where the local CRM should send it.
	*_filter: allow the operation to be limited to a particular type,
	id and/or priority
	timeout: allows the receiver to know how long the sender is
	expecting the task to take so we can act and report back
	accordingly.


	Attribute values
	=================
	Where the list ends with \|... , the complete list of possibilities will be
	fleshed out at a later date.

	Message Schema:
	###################

	<!ELEMENT crm_message (options, data?)>
	<!ATTLIST crm_message
	version #CDATA '1'
	message_type (none\|request\|response) 'none'
	sys_from (none\|crmd\|cib\|lrm\|admin) 'none'
	sys_to (none\|crmd\|cib\|lrm\|admin) 'none'
	host_from #CDATA
	host_to #CDATA
	reference #CDATA
	timestamp #CDATA '0'>


	<!ELEMENT options>
	<!ATTLIST options
	operation #CDATA
	result? (ok\|failed\|...) 'ok'
	verbose? (true\|false) 'false'
	timeout? #CDATA '0'
	filter_priority? #CDATA <!-- might be useful later -->
	filter_type? #CDATA
	filter_id? #CDATA>

	<!-- data is one of ping_item, cib_fragment, lrm_status -->

	<!ELEMENT ping_item>
	<!ATTLIST ping_item
	crm_subsystem (none\|crmd\|dc\|cib\|lrm) 'none'
	ping_status (error\|timeout\|stopped\|running\|sick) 'timeout'>

	<!ELEMENT lrm_status (resource_info)*>
	<!ATTLIST lrm_status>

	<!ELEMENT resource_info>
	<!ATTLIST resource_info
	res_id #CDATA
	last_op (noop\|start\|stop\|restart) 'noop'
	last_op_result (fail\|pass\|unknown\|...) 'unknown'
	status #CDATA>


	<!-- always describe which part of the cib are being returned -->
	<!ELEMENT cib_fragment (cib, obj_failed?)>
	<!ATTLIST cib_fragment
	cib_section (none\|all\|nodes\|resources\|constraints\|status) 'none'>

	<!ELEMENT obj_failed (failed_update)*>
	<!ATTLIST obj_failed>

	<!ELEMENT failed_update>
	<!ATTLIST failed_update
	id #CDATA
	object_type (none\|node\|resource\|constraint\|state) 'none'
	operation (none\|add\|update\|delete\|replace) 'none'
	reason? (unknown\|...) 'unknown'>