launchpad-dev team mailing list archive
Message #07297
Re: micro services: HTTP authentication in the datacentre and default protocol.
On Tue, Jun 7, 2011 at 6:43 AM, Julian Edwards
<julian.edwards@xxxxxxxxxxxxx> wrote:
> On Friday 03 June 2011 03:33:13 Robert Collins wrote:
>> Rabbit has an awkward high availability story; specifically, it's not
>> trivial to get the reliability we have out of HTTP services. This is
>> partly because rabbit clusters don't distribute the queues and because
>> it's a more stateful and complex system than HTTP. Long story short, we
>> won't be in a position to use queues for persistence, and it's simpler
>> to use HTTP to gracefully handle a single backend node dying.
>
> This makes me sad :/ Queues are massively more useful if they are persistent
> and this is one aspect that I was really looking forward to working with.
> There are ways around it of course, but it makes things more awkward for the
> consumer.
>
> Presumably you've been looking at http://www.rabbitmq.com/pacemaker.html ?
> I've had a quick glance but not digested anything.
Indeed. Basically you run a watchdog that notices that rabbit is down
and fires up rabbit on a separate node using the same shared disk
(e.g. DRBD, OCFS2, etc.) and the same node id, and you also do IP
address handovers ... shudder.
It's doable, but AFAIK:
- none of the Canonical deployments have this aspect live
- it's susceptible to split-brain failure
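
To make the watchdog idea concrete, here is a minimal Python sketch of the
polling loop only. It is not the Pacemaker setup from the linked page: the
names `port_is_open` and `watch`, the failure threshold, and the `failover`
callback (standing in for the shared-disk remount and IP handover) are all
assumptions made for illustration.

```python
import socket
import time
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical liveness check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(check: Callable[[], bool], failover: Callable[[], None],
          max_failures: int = 3, max_polls: int = 10,
          interval: float = 0.0) -> bool:
    """Poll check(); call failover() once after max_failures consecutive misses.

    Returns True if failover was triggered, False if the polls ran out first.
    """
    failures = 0
    for _ in range(max_polls):
        if check():
            failures = 0  # a healthy poll resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                # Assumption: failover() would start rabbit on the standby
                # node and move the IP address. The watchdog cannot tell a
                # dead broker from a network partition, so this call is
                # where split brain can start.
                failover()
                return True
        time.sleep(interval)
    return False
```

A caller might pass `lambda: port_is_open("rabbit-primary", 5672)` as the
check, where the hostname is made up and 5672 is the usual AMQP port. A check
like this only shows that the watchdog cannot reach the broker, not that the
broker has stopped, which is the split-brain concern in the list above.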
So I think we'd need to invest considerably more resources to get a
resilient HA rabbit. We may want to do that in the medium term, but
/many/ of our initial use cases for rabbit are primarily event
raising. So I think we can get some early benefit, and make per-case
risk assessments for use of its persistence features in the short
term.
Anecdata: Twitter, who run Kestrel as their queueing system, simply
design their code to gracefully deal with a queue server going AWOL
(be that crash, boom, whatever).
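
For the event-raising case, "gracefully deal with a queue server going AWOL"
could look roughly like the Python sketch below. It is only an illustration
under assumptions: `raise_event` and the `publish` callable are hypothetical
names, not a Launchpad, RabbitMQ, or Kestrel API, and dropping the event on
failure is just one possible policy.

```python
import logging
from typing import Callable

log = logging.getLogger("events")


def raise_event(publish: Callable[[str, bytes], None],
                topic: str, body: bytes) -> bool:
    """Best-effort event raising: never let a dead queue server break the caller.

    `publish` is a hypothetical callable that sends one message and raises
    OSError (e.g. ConnectionRefusedError) when the server is unreachable.
    Returns True if the event was sent, False if it was dropped.
    """
    try:
        publish(topic, body)
        return True
    except OSError as exc:
        # ConnectionError and TimeoutError are subclasses of OSError.
        # Assumed policy: log and drop, since the event is advisory.
        log.warning("dropping event %s: queue unavailable (%s)", topic, exc)
        return False
```

The caller carries on whether or not the event got through, which is why
this only suits cases where a lost event is acceptable; anything relying on
persistence would need the per-case risk assessment mentioned above.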
-Rob