
launchpad-dev team mailing list archive

Re: micro services: HTTP authentication in the datacentre and default protocol.

 

On Tue, Jun 7, 2011 at 6:43 AM, Julian Edwards
<julian.edwards@xxxxxxxxxxxxx> wrote:
> On Friday 03 June 2011 03:33:13 Robert Collins wrote:
>> Rabbit has an awkward high availability story; specifically it's not
>> trivial to get the reliability we have out of HTTP services. This is
>> partly because rabbit clusters don't distribute the queues and because
>> it's a more stateful and complex system than HTTP. Long story short, we
>> won't be in a position to use queues for persistence and it's simpler
>> to use HTTP to gracefully handle a single backend node dying.
>
> This makes me sad :/  Queues are massively more useful if they are persistent
> and this is one aspect that I was really looking forward to working with.
> There are ways around it of course, but it makes things more awkward for the
> consumer.
>
> Presumably you've been looking at http://www.rabbitmq.com/pacemaker.html ?
> I've had a quick glance but not digested anything.

Indeed. Basically you run a watchdog that notices when rabbit is down
and fires up rabbit on a separate node using the same shared disk
(e.g. DRBD, OCFS2, etc.) and the same node id, and you do IP address
handovers... shudder.
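
To make the moving parts concrete, here is a rough sketch of one
iteration of such a watchdog. It is an illustration only: `is_alive`
and `start_standby` are hypothetical hooks, not part of rabbit or
pacemaker, and the threshold of three failed probes is an arbitrary
assumption. The sketch does nothing to stop two nodes from both
believing they are the active one.

```python
from typing import Callable


def watchdog_step(
    is_alive: Callable[[], bool],
    start_standby: Callable[[], None],
    failures: int,
    threshold: int = 3,
) -> int:
    """Run one health check and return the updated consecutive-failure count.

    Both callables are hypothetical hooks supplied by the caller:
    `is_alive` probes the active rabbit node, and `start_standby` would
    take over the shared disk and IP address and start rabbit with the
    same node id on the standby machine.
    """
    if is_alive():
        return 0  # healthy: reset the counter
    failures += 1
    if failures >= threshold:
        start_standby()  # enough consecutive failures: trigger failover
        return 0  # start counting afresh against the new node
    return failures
```

A caller would run this in a loop with a sleep between probes,
passing the returned count back in on each iteration.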

It's doable, but AFAIK:
 - none of the Canonical deployments have this aspect live
 - it's susceptible to split-brain failure

So I think we'd need to invest considerably more resources to get a
resilient HA rabbit. We may want to do that in the medium term, but
/many/ of our initial use cases for rabbit are primarily event
raising. So I think we can get some early benefit, and make per-case
risk assessments for use of its persistence features in the short
term.

Anecdata: Twitter, who run kestrel as their queueing system, simply
design their code to gracefully deal with a queue server going AWOL
(be that crash, boom, whatever).
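
As a minimal sketch of that style for event raising: the publisher
below treats the queue as best-effort and never lets a dead server
break the caller. The `send` callable is a hypothetical transport
function, assumed to raise `OSError` (for example `ConnectionError`)
when the server is unreachable; it is not the API of kestrel, rabbit,
or any particular client library.

```python
import logging
from typing import Callable

log = logging.getLogger("events")


class BestEffortPublisher:
    """Publish events, tolerating an unavailable queue server."""

    def __init__(self, send: Callable[[str, bytes], None]) -> None:
        # `send` is a hypothetical transport hook: (queue name, payload).
        # It is assumed to raise OSError when the server cannot be reached.
        self._send = send
        self.dropped = 0  # number of events lost to outages

    def publish(self, queue: str, payload: bytes) -> bool:
        """Return True if the event was handed to the server, else False."""
        try:
            self._send(queue, payload)
            return True
        except OSError as exc:  # ConnectionError is a subclass of OSError
            self.dropped += 1
            log.warning("queue server unavailable, dropping event on %s: %s",
                        queue, exc)
            return False
```

Dropping the event is only acceptable where the event is advisory;
anything that needs persistence would want the per-case risk
assessment mentioned above.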

-Rob

