fuel-dev team mailing list archive
Message #01113
Re: [Openstack-dev] New RA for Galera
Maybe the problem is that you are using lifetime crm attributes instead
of 'reboot' ones. We use shadow/commit because we need transactional
behaviour in some cases. If you turn crm_shadow off, you will run into
problems with multi-state resources and location/colocation/order
constraints. So we need to find a way to make commits transactional.
There are two ways:
1) rewrite the corosync providers to use the crm_diff command and apply
its output instead of a shadow commit, which can sometimes swallow cluster
attributes
2) store 'reboot' attributes instead of lifetime ones
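For option 2), the RA-side change is only the lifetime passed to crm_attribute. Below is a minimal Python sketch, assuming the RA shells out to crm_attribute as in the original proposal (the node name and GTID are made-up examples). With `--lifetime reboot` the attribute is transient, and Pacemaker keeps it in the CIB status section rather than in the nodes section of the configuration. Whether that alone stops cs_commit from swallowing the attribute still needs to be verified on a Fuel deployment.

```python
import shlex
import subprocess
from typing import List

VALID_LIFETIMES = ("reboot", "forever")


def gtid_attribute_cmd(node: str, gtid: str, lifetime: str = "reboot") -> List[str]:
    """Build the crm_attribute argv that stores a node's GTID.

    "reboot"  -> transient attribute (status section, not kept across a node reboot)
    "forever" -> permanent attribute stored in the configuration
    """
    if lifetime not in VALID_LIFETIMES:
        raise ValueError(f"unsupported lifetime: {lifetime!r}")
    return ["crm_attribute", "--node", node, "--lifetime", lifetime,
            "--name", "gtid", "--update", gtid]


def store_gtid(node: str, gtid: str) -> None:
    """Run the command for real (requires Pacemaker's CLI tools on the node)."""
    subprocess.run(gtid_attribute_cmd(node, gtid), check=True)


if __name__ == "__main__":
    # Hypothetical node name and GTID; the command is only printed here.
    print(shlex.join(gtid_attribute_cmd(
        "node-1", "6b3c5e1a-0f4d-11e4-9e2c-0800272f3a1b:1234")))
```

Defaulting to "reboot" keeps the GTID out of the configuration section, which is the part cs_shadow/cs_commit is meant to manage.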
On Thu, May 29, 2014 at 12:42 PM, Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx> wrote:
> On 05/27/14 16:44, Bartosz Kupidura wrote:
> > Hello,
> > Responses inline.
> >
> >
> > Message written by Vladimir Kuklin <vkuklin@xxxxxxxxxxxx> on 27 May 2014, at 15:12:
> >
> >> Hi, Bartosz
> >>
> >> First of all, we are using openstack-dev for such discussions.
> >>
> >> Second, there is also Percona's RA for Percona XtraDB Cluster, which
> >> looks pretty similar, although it is written in Perl. Maybe we could
> >> derive something useful from it.
> >>
> >> Next, if you are working on this stuff, let's make it as open to the
> >> community as possible. There is a blueprint for the Galera OCF script:
> >> https://blueprints.launchpad.net/fuel/+spec/reliable-galera-ocf-script.
> >> It would be awesome if you wrote down the specification and sent the
> >> newer galera OCF code as a change request to fuel-library gerrit.
> >
> > Sure, I will update this blueprint.
> > Change request in fuel-library: https://review.openstack.org/#/c/95764/
>
> That is a really nice catch, Bartosz, thank you. I believe we should
> review the new OCF script thoroughly and consider omitting
> cs_commits/cs_shadows as well. What would be the downsides?
>
> >
> >>
> >> Speaking of the crm_attribute stuff: I am very surprised that you are
> >> saying that node attributes are altered by crm shadow commit. We are
> >> using a similar approach in our scripts and have never faced this issue.
> >
> > This is probably because you update crm_attribute very rarely. With my
> > approach, the GTID attribute is updated every 60s on every node (3
> > updates in 60s in a standard HA setup).
> >
> > You can try updating any attribute in a loop while deploying the
> > cluster to trigger the failure with the corosync diff.
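One way to run the suggested check is a small loop that keeps rewriting a node attribute while the deployment (and its cs_shadow/cs_commit runs) is in progress. A hedged Python sketch: the attribute name `fuel_cib_probe` is hypothetical, the lifetime is "forever" to mimic the failing case, and the command runner is injectable so the loop can be dry-run without a cluster.

```python
import subprocess
import time
from typing import Callable, List


def hammer_attribute(node: str, rounds: int, interval: float = 1.0,
                     runner: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Update a throwaway node attribute `rounds` times, `interval` seconds apart.

    Returns the commands that were issued, which makes the loop easy to test.
    """
    issued = []
    for i in range(rounds):
        cmd = ["crm_attribute", "--node", node, "--lifetime", "forever",
               "--name", "fuel_cib_probe", "--update", str(i)]
        runner(cmd, check=True)  # with subprocess.run, a failing command raises
        issued.append(cmd)
        if i + 1 < rounds:
            time.sleep(interval)
    return issued


if __name__ == "__main__":
    # Dry run: print each command instead of executing crm_attribute.
    hammer_attribute("node-1", 3, interval=0,
                     runner=lambda cmd, check: print(" ".join(cmd)))
```

On a real node, calling `hammer_attribute("node-1", 600)` during deployment would issue about one update per second for ten minutes.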
>
> It sounds reasonable and we should verify it.
> I've updated the statuses of the related bugs and attached them to the
> aforementioned blueprint as well:
> https://bugs.launchpad.net/fuel/+bug/1283062/comments/7
> https://bugs.launchpad.net/fuel/+bug/1281592/comments/6
>
>
> >
> >>
> >> Corosync 2.x support is on our roadmap, but we are not sure that we
> >> will use Corosync 2.x before the 6.x release series starts.
> >
> > Yeah. Moreover, corosync CMAP is not synced between cluster nodes (or
> > maybe I'm doing something wrong?). So we need another solution for this...
> >
>
> We should use CMAN for Corosync 1.x, perhaps.
>
> >>
> >>
> >> On Tue, May 27, 2014 at 3:08 PM, Bartosz Kupidura
> >> <bkupidura@xxxxxxxxxxxx> wrote:
> >> Hello guys!
> >> I would like to start a discussion on a new resource agent for
> >> galera/pacemaker.
> >>
> >> Main features:
> >> * Support cluster bootstrap
> >> * Support rebooting any node in the cluster
> >> * Support rebooting the whole cluster
> >> * To determine which node has the latest DB version, we should use the
> >>   galera GTID (Global Transaction ID)
> >> * The node with the latest GTID becomes the galera PC (primary
> >>   component) in case of reelection
> >> * The administrator can manually set a node as PC
> >>
> >> GTID:
> >> * get the GTID from mysqld --wsrep-recover or the SQL query
> >>   SHOW STATUS LIKE 'wsrep_local_state_uuid'
> >> * store the GTID as a crm_attribute for the node
> >>   (crm_attribute --node $HOSTNAME --lifetime $LIFETIME --name gtid --update $GTID)
> >> * on every monitor/stop/start action, update the GTID for the given node
> >> * the GTID can have 3 formats:
> >>   - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:123 - standard cluster-id:commit-id
> >>   - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:-1 - standard non-initialized
> >>     cluster, e.g. 00000000-0000-0000-0000-000000000000:-1
> >>   - XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX:INF - commit-id manually set to
> >>     INF, forcing the RA to create a new cluster with the master on the
> >>     given node
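The three GTID formats can be parsed into a (cluster-id, commit-id) pair, mapping -1 to "not initialized" and INF to infinity so that a forced node compares higher than any real commit-id. A minimal Python sketch under these assumptions. Note that `SHOW STATUS LIKE 'wsrep_local_state_uuid'` returns only the UUID part, so the sketch assumes the commit-id comes from elsewhere, for example `wsrep_last_committed` or the "WSREP: Recovered position: <uuid>:<seqno>" line printed by `mysqld --wsrep-recover`.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional, Union

_UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
_GTID_RE = re.compile(rf"^({_UUID}):(-1|\d+|INF)$")
_RECOVERED_RE = re.compile(rf"Recovered position:\s*({_UUID}:-?\d+)")


@dataclass(frozen=True)
class Gtid:
    cluster_id: str
    seqno: Union[int, float]  # commit-id; -1 if not initialized, math.inf if forced

    @property
    def initialized(self) -> bool:
        return self.seqno != -1

    @property
    def forced(self) -> bool:
        return self.seqno == math.inf


def parse_gtid(text: str) -> Gtid:
    """Parse 'uuid:123', 'uuid:-1' or 'uuid:INF'; raise ValueError otherwise."""
    match = _GTID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a GTID: {text!r}")
    uuid, commit = match.groups()
    return Gtid(uuid, math.inf if commit == "INF" else int(commit))


def recovered_position(log: str) -> Optional[str]:
    """Extract 'uuid:seqno' from mysqld --wsrep-recover output, if present."""
    match = _RECOVERED_RE.search(log)
    return match.group(1) if match else None
```

Representing INF as `math.inf` lets the later "highest GTID" comparison treat a forced node like any other, just with an unbeatable commit-id.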
> >>
> >> Check if reelection of the PC is needed:
> >> * (the node is located in a partition with quorum OR only 1 node is
> >>   configured in the cluster) AND the galera resource is not running on
> >>   any node
> >> * the GTID is manually set to INF on the given node
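Read as a predicate, the reelection check could look like the Python sketch below. It assumes the two bullets are alternatives (either one triggers a reelection) and that the RA gathers the inputs (quorum, node count, resource state) from Pacemaker, which is out of scope here.

```python
def needs_reelection(has_quorum: bool, configured_nodes: int,
                     galera_running_anywhere: bool, local_gtid_forced: bool) -> bool:
    """Return True if the primary component should be (re)elected."""
    # Bullet 1: (quorate partition OR single-node cluster) AND galera down everywhere
    cluster_down = ((has_quorum or configured_nodes == 1)
                    and not galera_running_anywhere)
    # Bullet 2: the administrator set this node's commit-id to INF
    return cluster_down or local_gtid_forced


if __name__ == "__main__":
    # Whole 3-node cluster rebooted, quorum regained, galera not started yet.
    print(needs_reelection(True, 3, False, False))  # True
```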
> >>
> >> Check if the given node is the PC:
> >> * it has the highest GTID in the cluster; if more than one node has the
> >>   "highest" GTID, we use CRC32 to choose the proper PC
> >> * its GTID is manually set to INF
> >> * if the node with the highest GTID does not come back after a cluster
> >>   reboot (for example, due to a disk failure), the administrator should
> >>   set the GTID to INF on another node
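The PC selection rules reduce to picking the maximum of (commit-id, tie-breaker) over all nodes, with INF ranking above any number. A minimal Python sketch, assuming all nodes report the same cluster UUID and that the CRC32 tie-breaker is computed over the node name with the highest checksum winning. The proposal does not say what is hashed or which end wins, so both are assumptions.

```python
import math
import zlib
from typing import Dict, Optional, Tuple, Union


def _commit_id(gtid: str) -> Union[int, float]:
    """'uuid:123' -> 123, 'uuid:-1' -> -1, 'uuid:INF' -> inf."""
    commit = gtid.rsplit(":", 1)[1]
    return math.inf if commit == "INF" else int(commit)


def _rank(node: str, gtid: str) -> Tuple[Union[int, float], int]:
    # Higher commit-id wins; equal commit-ids fall back to CRC32 of the node
    # name, so every node computes the same winner without extra coordination.
    return (_commit_id(gtid), zlib.crc32(node.encode("utf-8")))


def choose_primary(gtids: Dict[str, str]) -> Optional[str]:
    """Return the node that should become the galera PC, or None if no data."""
    if not gtids:
        return None
    return max(gtids, key=lambda node: _rank(node, gtids[node]))


if __name__ == "__main__":
    uuid = "6b3c5e1a-0f4d-11e4-9e2c-0800272f3a1b"
    print(choose_primary({"node-1": f"{uuid}:41", "node-2": f"{uuid}:42",
                          "node-3": f"{uuid}:-1"}))  # node-2
```

Because the tie-breaker depends only on the node name, the result does not depend on the order in which nodes report their GTIDs.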
> >>
> >> I have an almost ready RA: http://zynzel.spof.pl/mysql-wss
> >>
> >> Tested with vanilla CentOS galera/pacemaker/corosync - OK
> >> Tested with Fuel 4.1 - Fail
> >>
> >>
> >> Fuel 4.1 with that RA will not deploy correctly, because we use
> >> crm_attribute to store the GTID, and in the manifests we use
> >> cs_shadow/cs_commit for every pacemaker resource.
> >> This leads to a cs_commit problem, because the configuration in the
> >> shadow copy differs from the running configuration (the running config
> >> is changed by the RA):
> >> "Could not commit shadow instance [..] to the CIB: Application of an
> >> update diff failed"
> >>
> >> There are 2 ways to solve this:
> >> 1) don't use cs_commit/cs_shadow in manifests
> >> 2) store GTID in other way than crm_attribute
> >>
> >> IMHO 2) is better (less invasive), and we could store the GTID in
> >> corosync CMAP (http://www.polarhome.com/service/man/generic.php?qf=corosync-cmapctl),
> >> but this requires corosync 2.x
> >>
> >>
> >> --
> >> Mailing list: https://launchpad.net/~fuel-dev
> >> Post to : fuel-dev@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~fuel-dev
> >> More help : https://help.launchpad.net/ListHelp
> >>
> >>
> >>
> >> --
> >> Yours Faithfully,
> >> Vladimir Kuklin,
> >> Fuel Library Tech Lead,
> >> Mirantis, Inc.
> >> +7 (495) 640-49-04
> >> +7 (926) 702-39-68
> >> Skype kuklinvv
> >> 45bk3, Vorontsovskaya Str.
> >> Moscow, Russia,
> >> www.mirantis.com
> >> www.mirantis.ru
> >> vkuklin@xxxxxxxxxxxx
> >
> >
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando
>
--
Yours Faithfully,
Vladimir Kuklin,