| Background |
| ################## |
| First of all, go look at the diagram (comms.gif), read this, then |
| look at the diagram again. |
| |
| Next, some terminology... |
| Here I will use CRM to refer to the light blue section. That is, |
| the entire collection of processes/daemons/modules on a node that, |
| as a whole, manage resources in the cluster. CRMd refers to one |
| of the dark blue boxes. It is the "master subsystem" if you like. |
| Its role is to co-ordinate the actions of all the other pieces of |
| the puzzle, including those on other nodes. |
| |
| Key points from the diagram: |
| - All communications with the CRM are done with Heartbeat messages |
| routed through the CRMd. These messages contain a text |
| representation of an XML document, the schema of which is outlined |
| at the end of this document. |
| - All communications internal to the CRM (i.e. between its subsystems) |
| are performed with IPC messages. Again, all messages are routed via |
| the CRMd and contain the same XML documents as Heartbeat messages. |
| - All admin clients (eventually) end up sending Heartbeat messages |
| and are thus subject to whatever HA client security is available. |
| - The RPC layer allows the cluster to be controlled from non-member |
| hosts (subject to RPC security which is available for free). |
| - The option of synchronous or asynchronous RPC calls will be provided. |
| This will probably be in the form of a flag sent as part of the |
| function call. |
| |
| Advantages: |
| - The only source of "requests" is the CRMd, which means it *never* has |
| to forward on "request" messages for any of its sub-systems. This is |
| useful for the security of the system (see security.txt). |
| - Potentially, most CRM<-->CRM communications can be replaced with RPC |
| calls. |
| - We are able to re-use existing security mechanisms (IPC, HA, RPC, |
| unix_auth via RPC) to protect the system. |
| |
| Message scenarios: |
| ################## |
| There are really only 3 messaging scenarios in this system (excluding |
| broadcast vs. point-to-point). Again this is nice as it keeps down the |
| number of "special cases". |
| |
| 1) Sub-system <--(IPC)--> CRM <--(IPC)--> Sub-system |
| 2) Sub-system <--(IPC)--> Local CRM <--(Heartbeat)--> Remote CRM <--(IPC)--> Sub-system |
| 3) Admin Client <--(Heartbeat Broadcast)--> Remote CRM <--(IPC)--> Sub-system |
| |
| Message examples: |
| ################## |
| 1.1) the DC telling the local LRM to start a resource |
| 1.2) the LRM asking the CIB about a resource |
| |
| 2.1) the DC telling a remote LRM to start a resource |
| 2.2) the DC asking (all) the CIB(s) to provide their view of the world |
| |
| 3.1) an admin request to add/remove/modify a resource |
| 3.2) an admin request to force a failover of a resource or a |
| recomputation of the resource dependencies. |
| |
| Message Notes: |
| ################## |
| Messages may be sent to the CRM from local sub-systems via IPC or from |
| other HA clients via Heartbeat. It is then the responsibility of the |
| CRM to unpack the message and pass it on to the correct sub-system. If |
| the destination sub-system is the DC and the DC is not running on the |
| current node, the message is discarded without error. |
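
As a rough illustration of the routing rule above, here is a minimal
sketch in Python. It is not the CRM's actual code: the function name
route(), the "dc" destination value and the returned decision strings
are all assumptions made for this example, and raising an error for an
unknown sub-system is my own choice rather than anything stated here.

```python
from typing import Optional

# Sub-system names taken from the sys_to attribute in the schema.
# "dc" is an assumed extra value for messages addressed to the DC.
KNOWN_SUBSYSTEMS = {"crmd", "cib", "lrm", "admin"}


def route(sys_to: str, host_to: Optional[str], local_host: str,
          dc_is_local: bool) -> str:
    """Return a routing decision for one unpacked crm_message."""
    if sys_to == "dc":
        # Per the notes: drop silently when the DC is not on this node.
        return "deliver:dc" if dc_is_local else "discard"
    if sys_to not in KNOWN_SUBSYSTEMS:
        # Assumption: the notes do not say what happens in this case.
        raise ValueError("unknown sub-system: " + sys_to)
    if host_to and host_to != local_host:
        # Scenario 2: hand the message to Heartbeat for the remote CRM.
        return "forward:" + host_to
    # Scenario 1: local delivery over IPC.
    return "deliver:" + sys_to
```

For example, route("dc", None, "node1", dc_is_local=False) gives
"discard", matching the "discarded without error" rule.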
| |
| Where the DC receives a message from another node, it will also keep |
| track of the sending host and the reference number so that it can direct |
| the replies appropriately. The exception to this is where the message |
| is from the DC. |
| |
| Messages to the DC are *always* sent as broadcast messages and the DC |
| *must always* acknowledge the message with either the results of the |
| message or a "thankyou" message. The reason for this is that the DC may |
| change or a DC election may be in progress. The implication of this is |
| that the sender should always set a timer and resend dc_messages if they |
| have not been acknowledged. The DC will be able to detect duplicates by |
| examining the destination sub-system and the reference number and we |
| will rely on HA to ensure the delivery of DC responses. |
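
The resend-and-deduplicate idea above could look something like the
following Python sketch. Everything named here (DcInbox, handle,
send_until_acked, max_attempts) is hypothetical. Caching the earlier
reply and returning it for a duplicate is also an assumption; the notes
only say that duplicates can be detected from the destination sub-system
and the reference number.

```python
class DcInbox:
    """DC side: acknowledge every message, process each one only once."""

    def __init__(self):
        # (destination sub-system, reference) -> reply already sent
        self._seen = {}

    def handle(self, sys_to, reference, process):
        key = (sys_to, reference)
        if key in self._seen:
            # Duplicate caused by a resend: repeat the earlier reply.
            return self._seen[key]
        reply = process()
        if reply is None:
            # No results to report, so send the plain acknowledgement.
            reply = "thankyou"
        self._seen[key] = reply
        return reply


def send_until_acked(send, max_attempts=3):
    """Sender side: send() returns None when the timer expires unacked."""
    for attempt in range(1, max_attempts + 1):
        ack = send()
        if ack is not None:
            return ack, attempt
    raise TimeoutError("no acknowledgement from the DC")
```

A real sender would use an actual timer rather than a blocking call;
the loop only shows the order of events.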
| |
| All messages are full crm_messages. I toyed with only sending the *_request |
| or *_response piece of the message to and/or from the relevant sub-systems, |
| but it just got messy. This way, the routing role of the CRM is much easier. |
| And easier equals lower complexity, which means fewer bugs, which is good for |
| everyone. |
| |
| Schema Notes: |
| ################### |
| |
| Key Attributes |
| =============== |
| |
| reference: provides the ability to track which request a response |
| is in relation to and where the local CRM should send it. |
| *_filter: allow the operation to be limited to a particular type, |
| id and/or priority |
| timeout: allows the receiver to know how long the sender is |
| expecting the task to take so we can act and report back |
| accordingly. |
| |
| |
| Attribute values |
| ================= |
| Where the list ends with |... , the complete list of possibilities will be |
| fleshed out at a later date. |
| |
| Message Schema: |
| ################### |
| |
| <!ELEMENT crm_message (options, data?)> |
| <!ATTLIST crm_message |
| version #CDATA '1' |
| message_type (none|request|response) 'none' |
| sys_from (none|crmd|cib|lrm|admin) 'none' |
| sys_to (none|crmd|cib|lrm|admin) 'none' |
| host_from #CDATA |
| host_to #CDATA |
| reference #CDATA |
| timestamp #CDATA '0'> |
| |
| |
| <!ELEMENT options> |
| <!ATTLIST options |
| operation #CDATA |
| result? (ok|failed|...) 'ok' |
| verbose? (true|false) 'false' |
| timeout? #CDATA '0' |
| filter_priority? #CDATA <!-- might be useful later --> |
| filter_type? #CDATA |
| filter_id? #CDATA> |
| |
| <!-- data is one of ping_item, cib_fragment, lrm_status --> |
| |
| <!ELEMENT ping_item> |
| <!ATTLIST ping_item |
| crm_subsystem (none|crmd|dc|cib|lrm) 'none' |
| ping_status (error|timeout|stopped|running|sick) 'timeout'> |
| |
| <!ELEMENT lrm_status (resource_info)*> |
| <!ATTLIST lrm_status> |
| |
| <!ELEMENT resource_info> |
| <!ATTLIST resource_info |
| res_id #CDATA |
| last_op (noop|start|stop|restart) 'noop' |
| last_op_result (fail|pass|unknown|...) 'unknown' |
| status #CDATA> |
| |
| |
| <!-- always describe which part of the cib is being returned --> |
| <!ELEMENT cib_fragment (cib, obj_failed?)> |
| <!ATTLIST cib_fragment |
| cib_section (none|all|nodes|resources|constraints|status) 'none'> |
| |
| <!ELEMENT obj_failed (failed_update)*> |
| <!ATTLIST obj_failed> |
| |
| <!ELEMENT failed_update> |
| <!ATTLIST failed_update |
| id #CDATA |
| object_type (none|node|resource|constraint|state) 'none' |
| operation (none|add|update|delete|replace) 'none' |
| reason? (unknown|...) 'unknown'> |
| |
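To make the schema a little more concrete, here is a small Python sketch
that builds a request message as text using the standard library's
xml.etree.ElementTree. The helper name build_request and the values in
the usage note (host names, the reference string, the "start" operation)
are placeholders invented for illustration; the schema leaves the set of
operations open.

```python
import xml.etree.ElementTree as ET


def build_request(sys_from, sys_to, host_from, host_to, reference,
                  operation, timeout="0"):
    """Return the text form of a crm_message request."""
    msg = ET.Element("crm_message", {
        "version": "1",
        "message_type": "request",
        "sys_from": sys_from,
        "sys_to": sys_to,
        "host_from": host_from,
        "host_to": host_to,
        "reference": reference,
        "timestamp": "0",
    })
    # options is the one mandatory child; data is optional and omitted.
    ET.SubElement(msg, "options", {
        "operation": operation,
        "timeout": timeout,
    })
    return ET.tostring(msg, encoding="unicode")
```

Calling build_request("crmd", "lrm", "node1", "node2", "ref-42", "start")
returns a single crm_message element with one options child, which is
the text that would travel inside a Heartbeat or IPC message.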