launchpad-dev team mailing list archive

Re: Design pattern for interruptible processing of continuous data

 


On 11-01-05 06:34 AM, Julian Edwards wrote:
> On Tuesday 04 January 2011 22:39:19 Aaron Bentley wrote:

>> DBLoopTuner relies on the TuneableLoop's __call__() method to store the
>> restart-point.  So for example,
>> lp.translations.scripts.verify_pofile_stats.Verifier uses self.start_id
>> as the restart-point.  Your idea is similar to a TuneableLoop, except
>> that you want to store the restart point, and you want it to be
>> explicitly a timestamp instead of having it be an implementation detail.
> 
> The TuneableLoop stuff does not provide any mechanism to store restart points 
> itself, it relies on the code that inherits from it.

Yes, that's what I meant to say.

> I want something that will store these restart points atomically with the 
> operations that are processing them, and preferably in such a way that I don't 
> have to think about it too much.

You are focusing on one particular kind of solution to the problem of
"interruptible processing of continuous data", but there are other kinds
of solutions.  Shouldn't we look at them, too?

>> To apply "micro-jobs" to this problem, you would represent each
>> operation as a "micro-job".  You would directly represent which jobs had
>> been run and which ones had not.  The specifics depend on how we end up
>> implementing the new task system, but one obvious way would be to have a
>> status like BuildStatus for each micro-job.
> 
> Micro jobs are a nice idea, but are orthogonal to what I want to do here.  
> They might even end up using this design.

They provide a way of doing interruptible processing of continuous data.
They only seem orthogonal because you're focusing on solutions that
store restart points.
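
To make the micro-job idea concrete, here is a minimal sketch.  It is
not the new task system (that isn't designed yet); the "microjob" table,
the PENDING/DONE statuses and the run_pending() helper are all
hypothetical names, and sqlite3 stands in for postgres only so the
example is self-contained.  The point is that there is no restart point
at all: an interrupted run resumes simply by selecting whatever is still
pending.

```python
import sqlite3

# Hypothetical statuses, loosely in the spirit of BuildStatus.
PENDING, DONE = 0, 1


def create_jobs(conn, payloads):
    """Create one micro-job row per operation, all initially pending."""
    conn.execute(
        "CREATE TABLE microjob ("
        "id INTEGER PRIMARY KEY, payload TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO microjob (payload, status) VALUES (?, ?)",
        [(payload, PENDING) for payload in payloads])
    conn.commit()


def run_pending(conn, handler, limit=None):
    """Run pending jobs in id order; return how many were completed.

    `limit` only exists to simulate an interruption part-way through.
    """
    rows = conn.execute(
        "SELECT id, payload FROM microjob WHERE status = ? ORDER BY id",
        (PENDING,)).fetchall()
    if limit is not None:
        rows = rows[:limit]
    for job_id, payload in rows:
        handler(payload)
        # In a real system the handler's own database writes would be
        # committed in this same transaction as the status change.
        conn.execute(
            "UPDATE microjob SET status = ? WHERE id = ?", (DONE, job_id))
        conn.commit()
    return len(rows)
```

Calling run_pending() again after an interruption picks up only the
jobs whose status is still PENDING.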

> I think you're right, basically because of what we get from postgres.  My 
> previous experiences were on an in-house DB solution that did just all this 
> stuff for you and it's clouded my thoughts a bit (along with the manflu!).
> 
> So if we have a table with (name, sequence) columns, is there anything else to 
> be concerned with?

You should probably consider whether a string name is better than an int
for identifying the client, and whether we should have a unique
constraint on the client.  I'm not sure there are any right answers, though.

If you want to support multi-process (or multi-threaded) clients,
then the "restart point" would have to be used to coordinate between
the different processes, and so it would need to be updated every time
a process started working on a new record.
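
As a sketch of what that coordination could look like, assuming the
records are identified by consecutive integers: each worker claims the
next record by advancing the shared restart point before it starts
work.  The claim_next() and open_db() helpers are hypothetical, and
sqlite3 again stands in for postgres.  Note that in this scheme the
stored value means "last record claimed", not "last record finished",
so a worker that dies mid-record leaves a gap that something else would
have to detect.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a connection and make sure the restart-point table exists."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # explicit BEGIN/COMMIT below control the transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS restart_point ("
        "name TEXT NOT NULL UNIQUE, sequence INTEGER NOT NULL)")
    return conn


def claim_next(conn, name):
    """Atomically advance the restart point; return the claimed sequence."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # Create the row on first use; the unique constraint makes
        # this a no-op if the client already has one.
        conn.execute(
            "INSERT OR IGNORE INTO restart_point (name, sequence) "
            "VALUES (?, 0)", (name,))
        conn.execute(
            "UPDATE restart_point SET sequence = sequence + 1 "
            "WHERE name = ?", (name,))
        (claimed,) = conn.execute(
            "SELECT sequence FROM restart_point WHERE name = ?",
            (name,)).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return claimed
```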

Aaron


