Background
##################
First of all, go look at the diagram (comms.gif), read this, then
look at the diagram again.
Next, some terminology...
Here I will use CRM to refer to the light blue section. That is,
the entire collection of processes/daemons/modules on a node that,
as a whole, manage resources in the cluster. CRMd refers to one
of the dark blue boxes. It is the "master subsystem" if you like.
Its role is to co-ordinate the actions of all the other pieces of
the puzzle, including those on other nodes.
Key points from the diagram:
- All communications with the CRM are done with Heartbeat messages
routed through the CRMd. These messages contain a text
representation of an XML document, the schema of which is outlined
at the end of this document.
- All communications internal to the CRM (i.e. between its subsystems)
are performed with IPC messages. Again, all messages are routed via
the CRMd and contain the same XML documents as Heartbeat messages.
- All admin clients (eventually) end up sending Heartbeat messages
and are thus subject to the existing HA client security mechanisms.
- The RPC layer allows the cluster to be controlled from non-member
hosts (subject to RPC security which is available for free).
- The option of synchronous or asynchronous RPC calls will be provided.
This will probably be in the form of a flag sent as part of the
function call.
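To make the message format concrete, here is a rough sketch of the kind
of XML document that might be carried (as text) inside a Heartbeat
message. The host name, reference value and operation name are invented
for illustration; the real vocabulary is outlined by the schema at the
end of this document:
  <crm_message version="1" message_type="request"
               sys_from="admin" sys_to="crmd"
               host_from="node1" reference="admin-node1-1" timestamp="0">
    <!-- host_to omitted: admin requests travel as broadcasts -->
    <options operation="ping"/>
  </crm_message>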
Advantages:
- The only source of "requests" is the CRMd which means it *never* has
to forward on "request" messages for any of its sub-systems. This is
useful for the security of the system (see security.txt).
- Potentially, most CRM<-->CRM communications can be replaced with RPC
calls.
- We are able to re-use existing security mechanisms (IPC, HA, RPC,
unix_auth via RPC) to protect the system.
Message scenarios:
##################
There are really only 3 messaging scenarios in this system (excluding
broadcast vs. point-to-point). Again this is nice as it keeps down the
number of "special cases".
1) Sub-system <--(IPC)--> CRM <--(IPC)--> Sub-system
2) Sub-system <--(IPC)--> Local CRM <--(Heartbeat)--> Remote CRM <--(IPC)--> Sub-system
3) Admin Client <--(Heartbeat Broadcast)--> Remote CRM <--(IPC)--> Sub-system
Message examples:
##################
1.1) the DC telling the local LRM to start a resource
1.2) the LRM asking the CIB about a resource
2.1) the DC telling a remote LRM to start a resource
2.2) the DC asking (all) the CIB(s) to provide their view of the world
3.1) an admin request to add/remove/modify a resource
3.2) an admin request to force a failover of a resource or a
recomputation of the resource dependencies.
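As an illustration of 2.1, the sketch below shows roughly what the DC
(the CRMd on node1) might send to the LRM on node2. The operation name,
resource id, timeout and reference values are invented for illustration:
  <crm_message version="1" message_type="request"
               sys_from="crmd" sys_to="lrm"
               host_from="node1" host_to="node2"
               reference="node1-lrm-42" timestamp="0">
    <options operation="start" filter_id="webserver" timeout="30"/>
  </crm_message>
The LRM's reply would travel back the same way (IPC to its local CRM,
Heartbeat to node1, IPC to the DC) carrying the same reference value.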
Message Notes:
##################
Messages may be sent to the CRM from local sub-systems via IPC or from
other HA clients via Heartbeat. It is then the responsibility of the
CRM to unpack the message and pass it on to the correct sub-system. If
the destination sub-system is the DC and the DC is not running on the
current node, the message is discarded without error.
Where the DC receives a message from another node, it will also keep
track of the sending host and the reference number so that it can direct
the replies appropriately. The exception to this is where the message
is from the DC.
Messages to the DC are *always* sent as broadcast messages and the DC
*must always* acknowledge the message with either the results of the
message or a "thankyou" message. The reason for this is that the DC may
change or a DC election may be in progress. The implication of this is
that the sender should always set a timer and resend dc_messages if they
have not been acknowledged. The DC will be able to detect duplicates by
examining the destination sub-system and the reference number and we
will rely on HA to ensure the delivery of DC responses.
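For instance, a request to the DC to recompute the resource dependencies
might be broadcast as sketched below, and the DC would acknowledge it
with a response carrying the same reference so the sender can cancel its
resend timer. The operation name, hosts and reference are invented, and
"crmd" is used as a placeholder for however the DC ends up being
addressed in sys_to:
  <!-- broadcast request: no host_to, since the sender may not know
       which node currently holds the DC role -->
  <crm_message version="1" message_type="request"
               sys_from="admin" sys_to="crmd"
               host_from="node3" reference="admin-node3-7" timestamp="0">
    <options operation="recompute"/>
  </crm_message>
  <!-- acknowledgement from the DC, matched by reference -->
  <crm_message version="1" message_type="response"
               sys_from="crmd" sys_to="admin"
               host_from="node1" host_to="node3"
               reference="admin-node3-7" timestamp="0">
    <options operation="recompute" result="ok"/>
  </crm_message>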
All messages are full crm_messages. I toyed with only sending the *_request
or *_response piece of the message to and/or from the relevant sub-systems,
but it just got messy. This way, the routing role of the CRM is much easier.
And easier equals lower complexity, which means fewer bugs, which is good for
everyone.
Schema Notes:
###################
Key Attributes
===============
reference: identifies which request a response relates to
and where the local CRM should send it.
*_filter: allow the operation to be limited to a particular type,
id and/or priority
timeout: allows the receiver to know how long the sender is
expecting the task to take, so it can act and report back
accordingly.
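Putting these together, a request from the DC asking the CIB for the
status of one particular resource might look something like the sketch
below. The operation name and ids are invented; the point is only how
reference, timeout and the filter_* attributes would be used:
  <crm_message version="1" message_type="request"
               sys_from="crmd" sys_to="cib"
               host_from="node1" reference="node1-cib-13" timestamp="0">
    <options operation="query" timeout="10"
             filter_type="resource" filter_id="webserver"/>
  </crm_message>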
Attribute values
=================
Where the list ends with |... , the complete list of possibilities will be
fleshed out at a later date.
Message Schema:
###################
<!ELEMENT crm_message (options, data?)>
<!ATTLIST crm_message
version #CDATA '1'
message_type (none|request|response) 'none'
sys_from (none|crmd|cib|lrm|admin) 'none'
sys_to (none|crmd|cib|lrm|admin) 'none'
host_from #CDATA
host_to #CDATA
reference #CDATA
timestamp #CDATA '0'>
<!ELEMENT options>
<!ATTLIST options
operation #CDATA
result? (ok|failed|...) 'ok'
verbose? (true|false) 'false'
timeout? #CDATA '0'
filter_priority? #CDATA <!-- might be useful later -->
filter_type? #CDATA
filter_id? #CDATA>
<!-- data is one of ping_item, cib_fragment, lrm_status -->
<!ELEMENT ping_item>
<!ATTLIST ping_item
crm_subsystem (none|crmd|dc|cib|lrm) 'none'
ping_status (error|timeout|stopped|running|sick) 'timeout'>
<!ELEMENT lrm_status (resource_info)*>
<!ATTLIST lrm_status>
<!ELEMENT resource_info>
<!ATTLIST resource_info
res_id #CDATA
last_op (noop|start|stop|restart) 'noop'
last_op_result (fail|pass|unknown|...) 'unknown'
status #CDATA>
<!-- always describes which part of the cib is being returned -->
<!ELEMENT cib_fragment (cib, obj_failed?)>
<!ATTLIST cib_fragment
cib_section (none|all|nodes|resources|constraints|status) 'none'>
<!ELEMENT obj_failed (failed_update)*>
<!ATTLIST obj_failed>
<!ELEMENT failed_update>
<!ATTLIST failed_update
id #CDATA
object_type (none|node|resource|constraint|state) 'none'
operation (none|add|update|delete|replace) 'none'
reason? (unknown|...) 'unknown'>
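Finally, a sketch of a complete response instance showing how the pieces
nest inside crm_message. All values are invented for illustration, and
the contents of the cib element itself are defined by the CIB's own
schema rather than here:
  <crm_message version="1" message_type="response"
               sys_from="cib" sys_to="admin"
               host_from="node1" host_to="node3"
               reference="admin-node3-9" timestamp="0">
    <options operation="update" result="failed"/>
    <cib_fragment cib_section="resources">
      <cib/> <!-- contents omitted -->
      <obj_failed>
        <failed_update id="webserver" object_type="resource"
                       operation="add" reason="unknown"/>
      </obj_failed>
    </cib_fragment>
  </crm_message>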