
launchpad-dev team mailing list archive

Re: micro services: HTTP authentication in the datacentre and default protocol.

 

On Tue, Jun 7, 2011 at 11:18 PM, John Arbash Meinel
<john@xxxxxxxxxxxxxxxxx> wrote:
> ...
>> It's doable, but AFAIK:
>>  - none of the Canonical deployments have this aspect live
>>  - it's susceptible to split-brain failure
>>
>> So I think we'd need to invest considerably more resources to get a
>> resilient HA rabbit. We may want to do that in the medium term, but
>> /many/ of our initial use cases for rabbit are primarily event
>> raising. So I think we can get some early benefit, and make per-case
>> risk assessments for use of its persistence features in the short
>> term.
>>
>> Anecdata: Twitter, who run Kestrel as their queueing system, simply
>> design their code to deal gracefully with a queue server going AWOL
>> (be that crash, boom, whatever).
>>
>> -Rob
>
> How much of HA is because you expect Rabbit to die, and how much of HA
> is because you want a way to deploy without taking down the whole
> system? Clustering seems like it would handle the second case. One
> node's queue is temporarily offline until it is brought back up, but the
> other nodes keep serving. And if you stop accepting new entries while
> you are shutting down, then you never have any messages delayed.

Rabbit never runs active-active, so you can't keep serving while one
node is down: you have to fail over, which means degraded service (at
best) during the failover process (several seconds at least, from what
I can tell).
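To make that concrete, here is a minimal sketch of the best-effort,
event-raising style of use described above. It assumes Python with the
pika AMQP client; the queue name, host default, and function are
hypothetical, not anything in Launchpad. The point is just that a
publish failure (broker crashed, unreachable, or mid-failover) is
logged and swallowed rather than failing the caller:

    # Best-effort event publisher: a sketch only, assuming the pika AMQP
    # client and an illustrative queue name; none of this is Launchpad code.
    import logging

    import pika
    from pika.exceptions import AMQPError

    log = logging.getLogger(__name__)

    QUEUE = "launchpad.events"  # hypothetical queue name


    def publish_event(body, host="localhost"):
        """Publish one event; swallow broker outages instead of failing the caller."""
        try:
            connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
            try:
                channel = connection.channel()
                channel.queue_declare(queue=QUEUE)
                channel.basic_publish(exchange="", routing_key=QUEUE, body=body)
                return True
            finally:
                connection.close()
        except AMQPError:
            # Broker is down or failing over: drop the event and let the main
            # request carry on (the per-case risk assessment mentioned above).
            log.warning("event not published; broker unavailable")
            return False

For the cases that genuinely need delivery guarantees rather than
fire-and-forget events, you'd want durable queues, publisher confirms
and a real HA story instead of this pattern, which is exactly the
investment trade-off above.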

> If it is that you want to plan for Rabbit (or the machine it is running
> on) to fail non-deterministically, then certainly you need different
> security guarantees.
>
> However, isn't the current Postgres master a "if it goes down we all go
> down for a while" setup? Isn't that machine pretty reliable overall? (It
> certainly also suffers from "we can't softly shut down for upgrades",
> but it seems like the non-deterministic failures are pretty reasonable.)

I would like to fix the PostgreSQL one too. At the moment, because of
its design around clustering and schema changes, the way we work with
it is to change things once a month, which adds latency to feature and
performance work: we're *just now* landing a change that could have
been out there for 3 weeks if we didn't have a 4-week cycle.

PostgreSQL having defects in this area isn't a reason to bring similar
defects into new components :)

-Rob

