
launchpad-dev team mailing list archive

Re: performance tuesday - the rabbit has landed


On Sat, May 14, 2011 at 3:06 PM, Jeroen Vermeulen <jtv@xxxxxxxxxxxxx> wrote:
> On 2011-05-11 10:13, Robert Collins wrote:
>
>> I suspect an easy migration target if folk want one would be to
>> migrate all the fire-and-forget jobs to trigger via rabbit (leaving
>> the payload in the db), by hooking a 'do it now' message into the
>> post-transaction actions in zope.
>
> It's exciting news.  We'll want to be careful in migrating jobs though: IIRC
> rabbit is nontransactional. That means we'll still need some way for
> consumers of jobs to recognize cases where the producer transaction aborted
> after firing off the job.

I believe 2PC is possible with AMQP 0.10, but RabbitMQ implements 0.8;
see also the debate around 0MQ.

> In some of those cases, executing a job unnecessarily won't hurt -- ones
> that refresh statistics for example.  In others, the job absolutely must not
> execute.
>
> Without having looked into it properly, I think we'll need some kind of
> wrapper to support this distinction.  Traditional transactional messaging
> uses two-phase commit; other products use database queues similar to our
> Job.  Both are probably overweight to the point where our baby would go out
> with the bathwater.  We could fake it by queuing up jobs in memory and
> sending them after commit, but that leaves open a window for message loss.

pg_amqp is probably the right thing for now, for
must-be-coupled-to-pg-transaction issues.
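
For reference, pg_amqp exposes publishing as a SQL function, so the message rides inside the surrounding Postgres transaction; if memory serves, the call looks roughly like this (broker id, exchange, and routing key here are illustrative, not real Launchpad names):

```sql
-- Illustrative only; check pg_amqp's docs for the exact signature.
SELECT amqp.publish(1, 'launchpad.jobs', 'branch.scan', '{"branch_id": 42}');
-- If the enclosing transaction aborts, the publish rolls back with it.
```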

That said, I suspect much of our usage will be notifications rather than
content transfer: that is, idempotent messages that trigger processing
of something in a transactional store.
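
A minimal sketch of that pattern (all names hypothetical, not Launchpad code): the message carries only an identifier, and the consumer recomputes derived data from the transactional store, so duplicate or stale deliveries are harmless:

```python
# Sketch of an idempotent notification consumer (hypothetical names).
# The message carries only an id; the handler re-reads current state from
# the store, so processing the same message twice changes nothing.

store = {"branch-42": {"revisions": 7}}  # stand-in for the transactional store
stats = {}                               # derived data we maintain

def handle_refresh_stats(message):
    """Recompute stats for the object named in the message."""
    obj_id = message["object_id"]
    record = store.get(obj_id)
    if record is None:
        return  # producer's transaction may have aborted; nothing to do
    stats[obj_id] = {"revision_count": record["revisions"]}

msg = {"object_id": "branch-42"}
handle_refresh_stats(msg)
handle_refresh_stats(msg)  # duplicate delivery: same result, no harm
```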

Beyond that, I think we need to consider arranging things (where
possible) to be idempotent and failure tolerant, given the potential
overheads of 2PC.
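
One way to read "failure tolerant" here, as a sketch under assumed semantics rather than anything Launchpad-specific: deliver at least once, requeue on failure, and let the consumer skip message ids it has already processed:

```python
# At-least-once delivery with consumer-side dedupe (hypothetical names).
import collections

queue = collections.deque()
processed_ids = set()  # would be durable in practice; a set suffices here
results = []

def publish(msg_id, payload):
    queue.append({"id": msg_id, "payload": payload})

def consume_one():
    msg = queue.popleft()
    if msg["id"] in processed_ids:
        return  # already handled an earlier delivery; drop the duplicate
    try:
        results.append(msg["payload"].upper())  # the actual work
    except Exception:
        queue.append(msg)  # failed: requeue for a later retry
        return
    processed_ids.add(msg["id"])

publish("m1", "scan branch")
publish("m1", "scan branch")  # broker redelivered the same message
publish("m2", "send email")
while queue:
    consume_one()
```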

> Another problem happens when things work too well: you create a
> database-backed object.  You fire off a job related to that object.  You
> commit.  But the consumer of that job picks it up before your commit has
> propagated and boom!  The job dies in flames because it accesses objects
> that aren't decisively in the database yet.

For this sort of thing fire-after-commit should be easy and sufficient.
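
The shape of fire-after-commit can be sketched with a toy in-memory transaction object (this is not the real zope/transaction API, just the idea): messages buffer during the transaction and only reach the broker once commit succeeds, while an abort drops them:

```python
# Sketch of fire-after-commit (not the real zope/transaction API).
sent = []  # stand-in for messages actually handed to the broker

class Transaction:
    def __init__(self):
        self.pending = []  # messages buffered until commit

    def send_after_commit(self, message):
        self.pending.append(message)

    def commit(self):
        # The database commit would happen first; only then publish, so
        # consumers never see jobs referencing uncommitted rows.
        for message in self.pending:
            sent.append(message)
        self.pending = []

    def abort(self):
        self.pending = []  # aborted transaction: messages never leave

txn = Transaction()
txn.send_after_commit("job-1")
txn.abort()    # job-1 is dropped along with the transaction

txn = Transaction()
txn.send_after_commit("job-2")
txn.commit()   # job-2 goes out only after a successful commit
```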

> I imagine both problems go away if every message carries a database
> transaction id, and the job runner keeps an eye on the database transaction
> log: the runner shouldn't consume a job until the producing transaction has
> committed, and it should drop jobs whose producers have aborted.  Is
> something along those lines built in?

No, and that sounds terrifying to me - if we tie things up that
tightly, we may as well just have a queue in the db and poll it every
50ms.

I think we need some optional glue for things that need transactional
semantics or after-transaction semantics. But there are many other
sorts of things (offhand: cancelling jobs [tell the job service to do
the cancellation], reporting oopses, dispatching to clusters of
workers, gathering operational stats, parallelism within a request)
for which we shouldn't need either of those constraints; for those we
should be as lean as we can.

-Rob

