rabbit, where art thou?

So, it turns out that the rabbit test fixture was more trouble than we thought.

We've now tried three, or is it four, times to get it to be part of our
test suite.

Currently it sporadically fails to start up inside the test suite - the
Erlang OTP runtime decides it doesn't want to play.

We got our original test fixture code from U1, which runs up ephemeral
servers as part of their test suite. Unlike us, though, they start those
servers from outside the test suite. So one possibility is that it's the
old 'Python tramps all over SIGPIPE' behaviour tripping us up. We run
rabbit from within Python so that each worker in parallel test mode can
get its own rabbit and not stomp on other tests (a rough sketch of the
idea is below).
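
For the record, the shape of the per-worker approach is roughly the
sketch below. This is not the actual fixture code (that came from U1);
it assumes rabbitmq-server is on $PATH and uses the standard RABBITMQ_*
environment variables, and the helper names are made up. The two ideas
are: give each parallel worker its own port, node name and data
directories, and reset SIGPIPE to the default before exec'ing
rabbitmq-server so the Erlang VM doesn't inherit Python's SIG_IGN
disposition.

import os
import signal
import socket
import subprocess
import tempfile

def allocate_port():
    # Grab a free TCP port for this worker's broker (racy, but it
    # illustrates the point).
    sock = socket.socket()
    sock.bind(('127.0.0.1', 0))
    port = sock.getsockname()[1]
    sock.close()
    return port

def start_ephemeral_rabbit():
    port = allocate_port()
    basedir = tempfile.mkdtemp(prefix='rabbit-fixture-')
    mnesia_dir = os.path.join(basedir, 'mnesia')
    log_dir = os.path.join(basedir, 'logs')
    os.makedirs(mnesia_dir)
    os.makedirs(log_dir)
    # Unique port, node name and directories per worker, so parallel
    # workers don't stomp on each other.
    env = dict(os.environ,
               RABBITMQ_NODE_PORT=str(port),
               RABBITMQ_NODENAME='rabbit_test_%d@localhost' % os.getpid(),
               RABBITMQ_MNESIA_BASE=mnesia_dir,
               RABBITMQ_LOG_BASE=log_dir)

    def reset_sigpipe():
        # Python starts up with SIGPIPE set to SIG_IGN, and exec'd
        # children inherit that; restore the default so the Erlang VM
        # sees a sane signal disposition.
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)

    proc = subprocess.Popen(['rabbitmq-server'], env=env,
                            preexec_fn=reset_sigpipe)
    return proc, port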

Landscape, another internal Canonical project, also uses Rabbit, and
they use the system Rabbit for their test suite. This seems undesirable to me
because it means running the test suite depends on more local system
configuration, which makes it harder to do on datacentre machines, as
well as being more intrusive on dev machines.

In a separate but related matter, Elliot has promised to put me in
touch with one of the Rabbit core devs who knows all about HA:
apparently it is better than the docs say it is :).

Anyhow, the current state is this:
 * rabbit is currently out of our tree
 * We've an unknown amount of work to do to get it working in tests reliably
 * We're still quite a way away from having production rabbit installs
meet all of https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements

Now, it's not really a Launchpad issue, but within Canonical we try
quite hard to use the same infrastructure across different projects,
so that skills and knowledge are transferable.

That means that if we want to use a different MQ to rabbit we need a
reasonably compelling reason behind that: ideally one which other
projects would agree with and eventually migrate to.

Now, when folk @ Canonical started deploying message queues, rabbitmq
was basically 'it' - the 0mq schism came along later.

As I see it we have a few questions to answer:
 - should we invest $unknown_time in running this sporadic failure to ground?
 - should we look at getting a rabbit expert to help us?
 - should we use rabbit?
 - and if not rabbit, what then [and what is compellingly different]?

AIUI Julian has asked Gavin to stop pushing rabbit forward for now;
that means that we're de facto not investing in fixing it at the
moment.

I don't know any deep-guru rabbit experts personally, but even if I
did, the HA concerns really have me questioning rabbit as our long
term future.

So perhaps we should wait to talk with this rabbit HA expert, and if
the resulting story is still overly icky, look closely at e.g. 0mq as
a simpler proposition with equal HA facilities (simpler to deploy,
simpler to admin, simpler to test with).
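
To make the 'simpler to test with' bit concrete: 0mq is brokerless, so
a test can wire two sockets together in-process with nothing external
to start or tear down. A minimal sketch, assuming the pyzmq bindings
(the endpoint name is made up):

import zmq

ctx = zmq.Context()
sender = ctx.socket(zmq.PUSH)
receiver = ctx.socket(zmq.PULL)
# inproc:// endpoints live entirely inside this process; no daemon,
# no ports, no per-worker node names to juggle.
sender.bind('inproc://test-queue')
receiver.connect('inproc://test-queue')
sender.send(b'hello')
assert receiver.recv() == b'hello'
sender.close()
receiver.close()
ctx.term()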

-Rob

