
launchpad-dev team mailing list archive

Re: generic job queue cronjob?

 

On Mon, Oct 4, 2010 at 10:35 PM, Robert Collins
<robert.collins@xxxxxxxxxxxxx> wrote:
> Rabbit isn't reliable - specifically by default it doesn't fsync
> before acknowledging, and it has no HA story to deal with server
> crashes/power failures.

This caught my attention, and didn't quite jibe with what I thought I
was told by the RabbitMQ devs, so I double checked (first rule of
talking to lifeless: always double-check, he is probably right!). So
the story is that RabbitMQ is perfectly durable if you declare the queue
as durable and mark your messages as persistent, which is very
simple to do. For crash recovery, if you are running in persistent
mode and the server is unplugged, you simply bring it back online and
no messages are lost, which is every bit as good as our current
PostgreSQL story. I don't think Launchpad needs more than that, but just
in case, there does appear to be an Active/Passive HA solution for
Rabbit described on their site. Here is the raw info from Matthias,
the lead dev for RabbitMQ:

"by default it doesn't fsync before acknowledging" is about right. In
fact there is no acknowledging to publishers *at all*.

But... just switch the AMQP channel into transactional mode and
publish persistent messages to durable queues, and you get the
guarantee that every published message will have been written to disk
by the time the tx.commit-ok arrives.

Needless to say that is quite expensive, which is why it is not the
default mode of operation. But there are many happy rabbit users who are
running systems on that basis. And we have some protocol extensions in
the pipeline that improve performance significantly by providing
streaming publisher acks outside the context of transactions.

As for HA... see http://www.rabbitmq.com/pacemaker.html Again, there
are quite a few rabbit users who use that or a similar setup.
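
For illustration, the transactional publish described above (durable
queue, persistent message, wait for tx.commit-ok) might look roughly
like the sketch below. It assumes a channel object shaped like the
one in the Python pika client; the helper name publish_durably, the
queue name "jobs", and the localhost broker are hypothetical and not
from this thread.

```python
# AMQP 0-9-1 delivery-mode value that marks a message as persistent.
PERSISTENT = 2


def publish_durably(channel, queue, body, make_properties):
    """Publish one persistent message to a durable queue inside a transaction.

    `channel` is assumed to expose queue_declare, tx_select, basic_publish
    and tx_commit (as pika's BlockingChannel does). `make_properties` is a
    callable that builds message properties, e.g. pika.BasicProperties.
    """
    # Durable queue: its definition survives a broker restart.
    channel.queue_declare(queue=queue, durable=True)

    # Put the channel into transactional mode.
    channel.tx_select()

    # Persistent message, routed to the queue via the default exchange.
    channel.basic_publish(
        exchange="",
        routing_key=queue,
        body=body,
        properties=make_properties(delivery_mode=PERSISTENT),
    )

    # Returns once the broker has answered with tx.commit-ok.
    channel.tx_commit()


# Hypothetical usage against a local broker (requires the pika package):
#
#   import pika
#   conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   publish_durably(conn.channel(), "jobs", b"run-job-42", pika.BasicProperties)
#   conn.close()
```

Committing after every message is the expensive case mentioned above;
batching several publishes before a single tx_commit trades latency
for throughput.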

-- 
Elliot Murphy | https://launchpad.net/~statik/


