Re: performance tuesday - the rabbit has landed
On Sat, May 14, 2011 at 3:06 PM, Jeroen Vermeulen <jtv@xxxxxxxxxxxxx> wrote:
> On 2011-05-11 10:13, Robert Collins wrote:
>
>> I suspect an easy migration target if folk want one would be to
>> migrate all the fire-and-forget jobs to trigger via rabbit (leaving
>> the payload in the db), by hooking a 'do it now' message into the
>> post-transaction actions in zope.
>
> It's exciting news. We'll want to be careful in migrating jobs though: IIRC
> rabbit is nontransactional. That means we'll still need some way for
> consumers of jobs to recognize cases where the producer transaction aborted
> after firing off the job.
I believe 2PC is possible with AMQP 0.10, but RabbitMQ speaks 0.8 -
and see also the debate around 0MQ.
> In some of those cases, executing a job unnecessarily won't hurt -- ones
> that refresh statistics for example. In others, the job absolutely must not
> execute.
>
> Without having looked into it properly, I think we'll need some kind of
> wrapper to support this distinction. Traditional transactional messaging
> uses two-phase commit; other products use database queues similar to our
> Job. Both are probably overweight to the point where our baby would go out
> with the bathwater. We could fake it by queuing up jobs in memory and
> sending them after commit, but that leaves open a window for message loss.
pg_amqp is probably the right thing for now, for
must-be-coupled-to-pg-transaction issues.
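To make that concrete, here is a rough sketch (untested, not wired up
anywhere) of what publishing through pg_amqp from Python might look
like; the amqp.publish(broker_id, exchange, routing_key, message)
call, the broker id, the exchange/routing key names and the 'job'
table are all assumptions for illustration, not something we have:

    import psycopg2

    def queue_job(job_id):
        # The UPDATE and the publish share one PostgreSQL transaction:
        # if we roll back, pg_amqp discards the message; if we commit,
        # the broker gets it.  Broker id 1 is assumed to be configured
        # in pg_amqp's amqp.broker table.
        conn = psycopg2.connect("dbname=launchpad_dev")
        cur = conn.cursor()
        cur.execute(
            "UPDATE job SET status = 'WAITING' WHERE id = %s", (job_id,))
        cur.execute(
            "SELECT amqp.publish(1, 'launchpad', 'job.ready', %s)",
            (str(job_id),))
        conn.commit()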
That said, I suspect much of our usage will be notifications rather
than content transfer: that is, idempotent messages that trigger
processing of something in a transactional store.
Beyond that, I think we need to consider arranging things (where
possible) to be idempotent and failure-tolerant, given the potential
overheads of 2PC.
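To illustrate the shape I have in mind (the names are invented, not an
API we have): the message is only a hint naming a job, the consumer
re-reads authoritative state from the database, and duplicate or stale
messages fall out harmlessly:

    def handle_job_ready(message_body):
        # Idempotent consumer sketch: get_job() and the status check
        # are hypothetical stand-ins for whatever the job service
        # actually exposes.
        job = get_job(int(message_body))
        if job is None or job.status != 'WAITING':
            # Duplicate, stale, or aborted-producer message: nothing
            # to do.
            return
        job.run()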
> Another problem happens when things work too well: you create a
> database-backed object. You fire off a job related to that object. You
> commit. But the consumer of that job picks it up before your commit has
> propagated and boom! The job dies in flames because it accesses objects
> that aren't decisively in the database yet.
For this sort of thing fire-after-commit should be easy and sufficient.
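Something like the following, using the transaction package's
after-commit hooks, is roughly what I mean; publish_job_ready() is a
placeholder for whatever does the raw AMQP publish:

    import transaction

    def notify_job_ready_after_commit(job_id):
        def hook(status, job_id):
            # status is True only if the commit actually succeeded.
            if status:
                publish_job_ready(job_id)
        transaction.get().addAfterCommitHook(hook, args=(job_id,))

This is essentially the queue-in-memory-and-send-after-commit approach
Jeroen mentions, so the small window for message loss remains; for
idempotent notifications that seems an acceptable trade-off.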
> I imagine both problems go away if every message carries a database
> transaction id, and the job runner keeps an eye on the database transaction
> log: the runner shouldn't consume a job until the producing transaction has
> committed, and it should drop jobs whose producers have aborted. Is
> something along those lines built in?
No, and that sounds terrifying to me - if we tie things up that
tightly, we may as well just have a queue in the db and poll it every
50ms.
I think we need some optional glue for things that need transactional
semantics or after-transaction semantics. But there are many other
sorts of things (offhand: cancelling of jobs [tell the job service to
do the cancellation], reporting of oopses, dispatching to clusters of
workers, operational stats gathering, parallelism-within-a-request)
for which we shouldn't need either of those constraints - and for
those we should be as lean as we can.
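For those lean cases a plain fire-and-forget publish is all we should
need - roughly this, using amqplib (which speaks AMQP 0-8, matching
RabbitMQ) purely as an example client; the connection details and the
'oops' queue name are made up:

    from amqplib import client_0_8 as amqp

    def report_oops(oops_id):
        # Fire-and-forget: no transactional coupling, no after-commit
        # hook, just a message dropped on a queue.
        conn = amqp.Connection(host='localhost:5672',
                               userid='guest', password='guest')
        channel = conn.channel()
        channel.basic_publish(
            amqp.Message(oops_id), exchange='', routing_key='oops')
        conn.close()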
-Rob