launchpad-dev team mailing list archive

Re: Design pattern for interruptible processing of continuous data

To: Aaron Bentley <aaron@xxxxxxxxxxxxx>
From: Julian Edwards <julian.edwards@xxxxxxxxxxxxx>
Date: Wed, 5 Jan 2011 15:23:27 +0000
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4D2487D0.30902@canonical.com>
Organization: Canonical Ltd
User-agent: KMail/1.13.5 (Linux/2.6.35-24-generic; KDE/4.5.1; x86_64; ; )

On Wednesday 05 January 2011 15:01:36 Aaron Bentley wrote:
> You are focusing on a particular kind of solution to the "interruptible
> processing of continuous data", but there are other kinds of solutions.
>  Shouldn't we look at them, too?
>
> > Micro jobs are a nice idea, but are orthogonal to what I want to do here.
> > They might even end up using this design.
> 
> They provide a way of doing interruptible processing of continuous data.
>  They only seem orthogonal because you're focusing on solutions that
> store restart points.

I am open to your suggestions for a way of resuming from a data point (after
a crash, or after the interval between cron jobs) without storing where you
left off.  I cannot see how either the loop tuner or micro jobs do that.

> > I think you're right, basically because of what we get from postgres.  My
> > previous experiences were on an in-house DB solution that did just all
> > this stuff for you and it's clouded my thoughts a bit (along with the
> > manflu!).
> > 
> > So if we have a table with (name, sequence) columns, is there anything
> > else to be concerned with?
> 
> You should probably consider whether a string name is better than an int
> for identifying the client, and whether we should have a unique
> constraint on the client.  I'm not sure there are any right answers,
> though.

There aren't; I just picked a string to be easy on the eye.  It could just as
well be an enum, but that makes it a little harder to just start writing
data.
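
A minimal sketch of the (name, sequence) table discussed above, for
illustration only.  It assumes SQLite as a stand-in for postgres, and the
table name restart_point and the helpers create_table, get_restart_point and
set_restart_point are hypothetical names, not existing Launchpad code.  The
primary key on name is one way of expressing the "unique constraint on the
client" option.

```python
import sqlite3


def create_table(conn):
    # One row per client.  Making name the primary key enforces the
    # "unique constraint on the client" option from the discussion.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS restart_point ("
        " name TEXT PRIMARY KEY,"
        " sequence INTEGER NOT NULL)"
    )


def get_restart_point(conn, name, default=0):
    # Return the stored sequence for this client, or `default` if the
    # client has never stored one.
    row = conn.execute(
        "SELECT sequence FROM restart_point WHERE name = ?", (name,)
    ).fetchone()
    return default if row is None else row[0]


def set_restart_point(conn, name, sequence):
    # INSERT OR REPLACE is SQLite-specific; a postgres version would need
    # a different upsert.  `with conn` commits on success and rolls back
    # on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO restart_point (name, sequence)"
            " VALUES (?, ?)",
            (name, sequence),
        )
```

With a string name a new client can just start writing rows, e.g.
set_restart_point(conn, "my-client", 42), without anything being registered
first; an enum would need a new value defined before the first write.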

> If you want to support multiple-process (or multi-threaded) clients,
> then the "restart point" would have to be used to coordinate between the
> different processes, and so it would need to be updated every time a
> process started working on a new record.

I am working on the basis that the client knows best and I just want to 
provide a level of abstraction at a good layer.  The restart point would need 
to be updated as often as the client felt necessary, which might be every time 
it "started working on a new record."
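
To make "as often as the client felt necessary" concrete, here is a rough
sketch of a single-process client loop that resumes from a stored position
and chooses its own checkpoint frequency.  The function process_from and its
parameters (handle, save_restart_point, every) are hypothetical names made up
for this example; it does not attempt the multi-process coordination
described in the quoted text.

```python
def process_from(records, start, handle, save_restart_point, every=1):
    """Process records[start:], checkpointing every `every` records.

    `save_restart_point` is called with the index of the next record
    still to be processed.  Returns the final restart point.
    """
    done = start
    for record in records[start:]:
        handle(record)
        done += 1
        # The client decides how often to store its position.
        if (done - start) % every == 0:
            save_restart_point(done)
    # Store the final position if the last batch was a partial one.
    if (done - start) % every != 0:
        save_restart_point(done)
    return done
```

The trade-off is in `every`: with every=1 a crash costs at most the record
that was in flight, while a larger value means fewer writes but any records
handled since the last checkpoint are handled again after a restart.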

Anyway, this might all be moot: I'm going to look at the state of Rabbit and
work out what I can do with it.  I am hoping that it has some sort of
unicast-style queue, with a recovery method for missed messages.

Follow ups

Re: Design pattern for interruptible processing of continuous data
From: Aaron Bentley, 2011-01-05

References

Design pattern for interruptible processing of continuous data
From: Julian Edwards, 2011-01-04
Re: Design pattern for interruptible processing of continuous data
From: Julian Edwards, 2011-01-05
Re: Design pattern for interruptible processing of continuous data
From: Aaron Bentley, 2011-01-05