launchpad-dev team mailing list archive

Re: rabbit, where art thou?

To: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
From: Stuart Bishop <stuart.bishop@xxxxxxxxxxxxx>
Date: Tue, 14 Jun 2011 17:56:40 +0700
Cc: Elliot Murphy <elliot@xxxxxxxxxxxxx>, Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <BANLkTikrD-zKAGLC93_wutD8A_VaqFH7EQ@mail.gmail.com>
Sender: stuart@xxxxxxxxxxxxxxxx
On Tue, Jun 14, 2011 at 4:54 PM, Robert Collins
<robertc@xxxxxxxxxxxxxxxxx> wrote:

> We got our original test fixture code from U1 which runs up ephemeral
> servers as part of their test suite. Unlike us they start that from
> outside the test suite. So one possibility is that it's the old 'python
> tramps all over SIGPIPE' behaviour tripping us up. We run rabbit from
> within python so that each worker in parallel test mode can get its
> own rabbit and not stomp on other tests.
>
> Landscape, another internal Canonical project, also uses Rabbit, and
> they use the system Rabbit for their test suite. This seems
> undesirable to me because it means running the test suite depends on
> more local system configuration, which makes it harder to do on
> datacentre machines, as well as being more intrusive on dev machines.

My understanding is that clients configure what they need when they
connect, if it isn't already set up. I think things will perform better
if we use a single RabbitMQ instance rather than spawning several to run
our test suite in parallel. Instead of dynamically choosing a port, we
would need to dynamically adjust the queue names.
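
As a rough illustration of adjusting queue names dynamically, each
test worker could derive its own names from a logical base name. This
is only a sketch: unique_queue_name and worker_id are hypothetical
names, not anything in our test fixtures, and it assumes a name made
from the worker's PID plus a random suffix is unique enough for one
shared broker.

```python
import os
import uuid


def unique_queue_name(base, worker_id=None):
    """Return a queue name unlikely to collide with other test workers.

    `base` is the logical queue name a test wants. `worker_id` is a
    hypothetical identifier for the parallel worker; it defaults to the
    PID of the current process.
    """
    if worker_id is None:
        worker_id = os.getpid()
    # The random suffix keeps names distinct even within one worker.
    return "%s.%s.%s" % (base, worker_id, uuid.uuid4().hex)
```

A test would then declare and consume from unique_queue_name("jobs")
rather than a fixed "jobs" queue, and clean the queue up afterwards.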

If we did this, using the system Rabbit would actually mean *less*
setup, because we wouldn't have to hack our startup scripts to stop
the default system Rabbit from launching (stopping it saves resources,
and we don't want to use it by accident if, for example, our port
override code fails).


>  * We're still quite a way away from having production rabbit installs
> meet all of https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements

I don't think we are going to get away from SPOF for a messaging
system, although that item is somewhat optional. Live upgrades
(applying security patches) would be tricky, but I think doable, using
HAProxy. It's certainly not ideal though, but I can't think of
anything out there that would let us reinstall the .deb and restart
the service without a short outage.

> Now, when folk @ Canonical started deploying message queues, rabbitmq
> was basically 'it' - the 0mq schism came along later.

Has anyone actually looked at 0mq semi-seriously yet? I thought it was
more like 'hey, there might be a suitable competing product now' more
than a schism :)

>  - should we invest $unknown_time in chasing this sporadic failure
> down to ground
>  - should we look at getting a rabbit expert to help us?
>  - should we use rabbit?
>  - and if not rabbit, what then [and what is compellingly different]?
>
> AIUI Julian has asked Gavin to stop pushing rabbit forward for now;
> that means that we're de facto not investing in fixing it at the
> moment.
>
> I don't know any deep-guru rabbit experts personally, but even if I
> did, the HA concerns really have me questioning rabbit as our long
> term future.

What alternative do we have to a messaging system with RabbitMQ's HA
story? I think a messaging system, even with RabbitMQ's HA story, is
better than the current poll-the-central-database approach. I also think
that RabbitMQ would be preferable to rolling our own solution.

How would you design a messaging system with a better HA story? I
think we are looking at a system where messages get posted to a proxy
which sends them to multiple backends, and messages get received via a
proxy which handles locking to ensure a message only gets consumed
once. There would also need to be a process to ensure that when a
backend is restarted, messages consumed when it was down are removed.
For extra points and proper redundancy, when a backend is restarted it
needs to be populated with messages that were posted when it was down.
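
To make that design concrete, here is a toy in-memory sketch in
Python. Everything in it is an assumption for illustration: Backend
and ReplicatingProxy are hypothetical names, the backends are plain
dicts rather than real message stores, and a single lock in the proxy
stands in for whatever locking a real proxy would need. It is not a
proposal for an implementation.

```python
import itertools
import threading


class Backend:
    """A stand-in for one message store behind the proxy."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.messages = {}  # message id -> body, in posting order


class ReplicatingProxy:
    """Fan posts out to every live backend; consume each message once."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._lock = threading.Lock()
        self._ids = itertools.count(1)

    def _live(self):
        return [b for b in self.backends if b.up]

    def post(self, body):
        with self._lock:
            live = self._live()
            if not live:
                raise RuntimeError("no backend available")
            msg_id = next(self._ids)
            for backend in live:
                backend.messages[msg_id] = body
            return msg_id

    def consume(self):
        # The lock ensures a message is handed to only one consumer.
        with self._lock:
            live = self._live()
            if not live or not live[0].messages:
                return None
            msg_id = next(iter(live[0].messages))  # oldest message
            body = live[0].messages[msg_id]
            for backend in live:
                backend.messages.pop(msg_id, None)
            return body

    def restart(self, backend):
        # Copying a live peer both drops messages consumed during the
        # outage and adds messages posted during it.
        with self._lock:
            peers = [b for b in self._live() if b is not backend]
            if peers:
                backend.messages = dict(peers[0].messages)
            backend.up = True
```

Note that the proxy itself is still a single point of failure in this
sketch, which is rather the problem with the whole approach.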

The alternative would be to have the storage on a multi-master
database such as PostgreSQL + Bucardo, MySQL HA or Cassandra.

> So perhaps we should wait to talk with this rabbit HA expert, and if
> the resulting story is still overly icky, look closely at e.g. 0mq as
> a simpler proposition with equal HA facilities (simpler to deploy,
> simpler to admin, simpler to test with).

We need to discover if Rabbit can have a better HA story (I suspect
this will be two servers with shared disk, heartbeat and failover,
which doesn't really help our main concern of being able to apply
security patches without scheduled downtime).

Something simpler and more stable than RabbitMQ would be great,
especially if it comes with a better HA story. Steve's response to
this thread just appeared and doesn't inspire me with confidence. So
yeah, investigate 0mq.



-- 
Stuart Bishop <stuart@xxxxxxxxxxxxxxxx>
http://www.stuartbishop.net/
References

rabbit, where art thou?
From: Robert Collins, 2011-06-14