fuel-dev team mailing list archive

New RA for Galera

 

Hello guys!
I would like to start a discussion on a new resource agent (RA) for Galera/Pacemaker.

Main features:
* Support cluster bootstrap
* Support rebooting any node in the cluster
* Support rebooting the whole cluster
* To determine which node has the latest DB version, we should use the Galera GTID (Global Transaction ID)
* The node with the latest GTID becomes the Galera PC (primary component) in case of reelection
* The administrator can manually set a node as PC

GTID:
* get the GTID from mysqld --wsrep-recover or from the SQL query SHOW STATUS LIKE 'wsrep_local_state_uuid'
* store the GTID as a crm_attribute for the node (crm_attribute --node $HOSTNAME --lifetime $LIFETIME --name gtid --update $GTID)
* on every monitor/stop/start action, update the GTID for the given node
* the GTID can have 3 formats:
 - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
 - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - non-initialized cluster; the standard value is 00000000-0000-0000-0000-000000000000:-1
 - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to INF, which forces the RA to create a new cluster with the master on the given node
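
To make the three formats concrete, here is a minimal Python sketch of how a GTID string could be parsed and how the crm_attribute call could be assembled. This is not the code of the RA itself: the names parse_gtid, is_initialized, is_forced and crm_attribute_args are hypothetical, and representing INF as math.inf is my assumption. Only the crm_attribute options already shown above are used.

```python
import math
import re
from typing import List, NamedTuple

# cluster-id is a UUID (8-4-4-4-12 hex digits); commit-id is an integer or the literal INF.
GTID_RE = re.compile(
    r"^([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}):(-?\d+|INF)$"
)


class Gtid(NamedTuple):
    cluster_id: str
    commit_id: float  # int for normal values, math.inf for INF (assumption)


def parse_gtid(text: str) -> Gtid:
    """Split 'cluster-id:commit-id' into its two parts."""
    match = GTID_RE.match(text.strip())
    if match is None:
        raise ValueError("not a valid GTID: %r" % text)
    cluster_id, commit = match.groups()
    commit_id = math.inf if commit == "INF" else int(commit)
    return Gtid(cluster_id.lower(), commit_id)


def is_initialized(gtid: Gtid) -> bool:
    """A commit-id of -1 marks a non-initialized node."""
    return gtid.commit_id >= 0


def is_forced(gtid: Gtid) -> bool:
    """INF means the administrator forced this node to become PC."""
    return gtid.commit_id == math.inf


def crm_attribute_args(hostname: str, lifetime: str, gtid: str) -> List[str]:
    """Argument list for the crm_attribute call quoted in the list above."""
    return ["crm_attribute", "--node", hostname, "--lifetime", lifetime,
            "--name", "gtid", "--update", gtid]
```

The argument list can be handed to subprocess.run() on a node where Pacemaker is installed; the parsing helpers have no external dependencies.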

Check if reelection of PC is needed:
* (the node is located in a partition with quorum OR only 1 node is configured in the cluster) AND the galera resource is not running on any node
* GTID is manually set to INF on given node
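
The check above can be written as a single predicate. This is a rough Python sketch; I am assuming that either of the two conditions is enough to trigger a reelection, and the function and parameter names are hypothetical. How the inputs are obtained from Pacemaker is left out.

```python
def reelection_needed(has_quorum: bool,
                      configured_nodes: int,
                      running_nodes: int,
                      local_gtid_is_inf: bool) -> bool:
    """Return True if a new primary component (PC) has to be elected."""
    # Condition 1: we are allowed to act (quorum, or a single-node cluster)
    # and the galera resource is not running on any node.
    cluster_down = (has_quorum or configured_nodes == 1) and running_nodes == 0
    # Condition 2: the administrator set the commit-id to INF on this node.
    return cluster_down or local_gtid_is_inf
```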

Check if given node is PC:
* it has the highest GTID in the cluster; in case more than one node has the "highest" GTID, we use CRC32 to choose the proper PC
* its GTID is manually set to INF
* in case the node with the highest GTID does not come back after a cluster reboot (for example because of a disk failure), the administrator should set the GTID to INF on another node
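
The selection rule can be sketched in Python as follows. Several points here are my assumptions and not taken from the RA: the CRC32 is computed over the node name, the larger CRC32 wins, and only the commit-id part of the GTID is compared (the cluster-id is ignored). The function names are hypothetical.

```python
import math
import zlib
from typing import Mapping


def commit_id(gtid: str) -> float:
    """Extract the commit-id from 'cluster-id:commit-id'; INF sorts above everything."""
    value = gtid.rsplit(":", 1)[1]
    return math.inf if value == "INF" else int(value)


def elect_pc(gtids: Mapping[str, str]) -> str:
    """Pick the PC from a mapping of node name to GTID string."""
    best = max(commit_id(g) for g in gtids.values())
    candidates = [node for node, g in gtids.items() if commit_id(g) == best]
    # Tie-break: every node computes the same CRC32 values, so every node
    # reaches the same decision without extra communication.
    return max(candidates, key=lambda node: (zlib.crc32(node.encode()), node))
```

Because the tie-break depends only on data that all nodes can see, each node can run this locally and check whether the result equals its own name.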

I have an almost-ready RA: http://zynzel.spof.pl/mysql-wss

Tested with vanilla CentOS galera/pacemaker/corosync - OK
Tested with Fuel 4.1 - Fail


Fuel 4.1 will not deploy correctly with that RA, because the RA uses crm_attribute to store the GTID, while the manifests use cs_shadow/cs_commit for every pacemaker resource.
This leads to a cs_commit problem: the shadow copy and the running configuration differ, because the running configuration was changed by the RA.
"Could not commit shadow instance [..] to the CIB: Application of an update diff failed"

To solve this we can go one of 2 ways:
1) don't use cs_commit/cs_shadow in the manifests
2) store the GTID in some other way than crm_attribute

IMHO 2) is better (less invasive), and we can store the GTID in corosync CMAP (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl), but this requires corosync 2.X


