
openstack team mailing list archive

Re: High availability in openstack?

 

Ed and I are on the same team, and we are also working on a reaper that uses the database's instances table state and notifications. If enough of the community is behind the idea, we can push it upstream. Knowing what to reap can be tricky, though.
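
To make that concrete, here is a very rough sketch of what such a reaper might look like; it is not code from our tree, and the table and column names (instances, vm_state, deleted, updated_at) are guesses at the Nova schema:

    # Scan the instances table for rows stuck in a transitional state for
    # too long and report them as reap candidates.
    from datetime import datetime, timedelta

    from sqlalchemy import bindparam, create_engine, text

    STUCK_AFTER = timedelta(hours=1)           # how long before a row looks orphaned
    TRANSITIONAL = ["building", "deleting"]    # states that should never last this long

    def find_stuck_instances(db_url):
        engine = create_engine(db_url)
        cutoff = datetime.utcnow() - STUCK_AFTER
        query = text(
            "SELECT id, vm_state, updated_at FROM instances "
            "WHERE deleted = 0 AND vm_state IN :states AND updated_at < :cutoff"
        ).bindparams(bindparam("states", expanding=True))
        with engine.connect() as conn:
            return conn.execute(query, {"states": TRANSITIONAL, "cutoff": cutoff}).fetchall()

    if __name__ == "__main__":
        # Connection URL is a placeholder for wherever the Nova DB lives.
        for row in find_stuck_instances("mysql://nova:nova@localhost/nova"):
            print("reap candidate:", row)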

Sent from my iPhone

On Aug 18, 2011, at 8:11 PM, "Joshua Harlow" <harlowja@xxxxxxxxxxxxx> wrote:

Thanks,

That was along the lines of what I was thinking.

I hope messages will be made persistent, or at least that persistence will become a configuration option. If they are not made persistent, what would the effects be?

Right now, if a message is lost, it seems the DB and other nodes are left in a bad state. Is there any plan for a “reaper” Python object that will reap this bad data and these orphaned instances?
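
For what it's worth, here is a minimal sketch of what "persistent" means at the AMQP level, using the plain pika client rather than Nova's own RPC code; the queue name and message body are just placeholders. The queue is declared durable and the message is published with delivery_mode=2, so both survive a broker restart; without those flags, anything in flight when RabbitMQ fails over is simply gone.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # Durable queue: the queue definition itself survives a broker restart.
    channel.queue_declare(queue="compute", durable=True)

    # Persistent message: delivery_mode=2 asks RabbitMQ to write it to disk.
    channel.basic_publish(
        exchange="",
        routing_key="compute",
        body="run_instance ...",
        properties=pika.BasicProperties(delivery_mode=2),
    )
    connection.close()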

On 8/18/11 4:54 PM, "Edward "koko" Konetzko" <konetzed@xxxxxxxxxxxxxxxxx> wrote:

On 08/16/2011 04:50 PM, Joshua Harlow wrote:
> Is there any good documentation on making OpenStack fault tolerant, or
> on exactly how it will handle failures?
>
> Say the MQ server dies: can another MQ server take over? Similarly
> with the database (MySQL replication?).
>
> It seems like having that kind of information available would be nice
> for corporate users, at least as a recommended “guide”.
>
> -Josh
>
>
>

Josh

I have a very bare-bones start of a doc on making parts of Nova HA. The
problem is that the document is nowhere near ready for release, as I am
probably the only person who can understand it. I will try to point you
in the right direction on things I have done that work pretty well.

RabbitMQ
http://www.rabbitmq.com/pacemaker.html

In the version of Nova the team I am working with is running, nothing is
marked 'persistent'. In this setup, if a node fails, RabbitMQ moves over
and all the managers reconnect with no issues, but all in-flight messages
are lost. Maybe someone here can clarify the direction of this. We are
using Ubuntu 10.04, and the version of RabbitMQ in that release does not
have the pacemaker scripts, so I just pulled the current package from the
rabbitmq.com apt repo; after that, the pacemaker setup worked perfectly.
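
For reference, the active/passive setup from that guide boils down to something like the following crm shell snippet. This is only a sketch: the VIP, the mnesia path, and the exact agent name and parameters depend on the rabbitmq-server OCF agent shipped with the package you install.

    primitive p_ip_rabbit ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.10" cidr_netmask="24" \
        op monitor interval="10s"
    primitive p_rabbitmq ocf:rabbitmq:rabbitmq-server \
        params mnesia_base="/var/lib/rabbitmq/mnesia" \
        op monitor interval="10s"
    group g_rabbitmq p_ip_rabbit p_rabbitmq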

MySQL
For MySQL I just did a simple setup using DRBD to replicate
/var/lib/mysql and set up corosync/pacemaker to manage all the MySQL
resources between two nodes. Again, on failover in this setup I had no
issues with clients reconnecting to the VIP.
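
In case it helps, the pacemaker side of that setup looks roughly like this in crm shell syntax; the DRBD resource name, device, filesystem type, and VIP are placeholders for whatever your cluster uses, not my exact config.

    primitive p_drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op monitor interval="15s"
    ms ms_drbd_mysql p_drbd_mysql \
        meta master-max="1" clone-max="2" notify="true"
    primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext3"
    primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.20" cidr_netmask="24"
    primitive p_mysql ocf:heartbeat:mysql
    group g_mysql p_fs_mysql p_ip_mysql p_mysql
    colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
    order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start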

I hope this points you in the right direction; I know it's not exactly
what you wanted. Maybe next week I can clean up my documentation and
send it out to the list.

Edward Konetzko


