
launchpad-dev team mailing list archive

Re: Design pattern for interruptible processing of continuous data

 

On Tue, 2011-01-04 at 15:29 +0000, Julian Edwards wrote:
> Dear all
> 
> I've seen this problem pop up in similar ways a few times now, where we're 
> processing a bunch of data in a cron job (whether externally on the API, or 
> internally) and it needs to do a batch of work, remember where it left off 
> (whether reaching a batch limit or the live data is paused), and continue 
> later.
> 
> Typically to solve this, the client processing the data stores some piece of 
> context about the data it's processing, and uses that data to re-start from 
> the right place next time.
> 
> I think it would be a good idea to formalise a design around this in such a 
> way that will also be beneficial to us when we eventually start using a 
> message queuing application.
> 
> In a previous life, the context data that I've used for this is a timestamp, 
> and it worked very well in pretty much all cases I came across.  The client 
> application simply provides the same timestamp to a query/api call from the 
> last item it processed, and the data continues to flow from where it left off.  
> This ticked all the boxes for data integrity and polling or streaming usage.
> 
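The timestamp-cursor pattern Julian describes could be sketched roughly as below. This is a minimal illustration, not Launchpad code: `fetch_items`, `process`, and the checkpoint file name are all hypothetical, and a real job would store the cursor somewhere more durable than a local file.

```python
from datetime import datetime, timezone

# Hypothetical checkpoint location; a real job might use a DB row instead.
CHECKPOINT = "cursor.txt"

def load_cursor():
    """Return the timestamp of the last item processed, or the epoch floor."""
    try:
        with open(CHECKPOINT) as f:
            return datetime.fromisoformat(f.read().strip())
    except FileNotFoundError:
        return datetime.min.replace(tzinfo=timezone.utc)

def save_cursor(ts):
    with open(CHECKPOINT, "w") as f:
        f.write(ts.isoformat())

def run_batch(fetch_items, process, batch_limit=100):
    """Process up to batch_limit items newer than the stored cursor,
    then persist the timestamp of the last item handled so the next
    run (cron or otherwise) resumes from the right place."""
    cursor = load_cursor()
    last = cursor
    for n, item in enumerate(fetch_items(since=cursor)):
        if n >= batch_limit:
            break
        process(item)
        last = item.timestamp
    if last != cursor:
        save_cursor(last)
```

Each invocation picks up exactly where the previous one stopped, whether it stopped because the batch limit was hit or because the data source went quiet, which is the property the original mail asks for.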


I'm curious why one can't just start using message queues on the batch
job only.

Rather than a cron job that does all the work, the batch job could
simply push all the work into a queue. Once the message queue is
ready for frontend consumption, the batch jobs go away and the frontend
feeds the backend directly.

Trying to emulate the queue's robustness seems a noble but possibly
unnecessary effort if queues are coming any time soon.
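As a toy illustration of that split, here is the producer/consumer shape using an in-process `queue.Queue` as a stand-in for a real broker. All the names are made up; with an actual message-queuing application the enqueue and dequeue calls would go over the wire, but the division of labour (batch job only enqueues, backend only drains) is the same.

```python
import queue
import threading

# In-process stand-in for a real message broker.
work_queue = queue.Queue()

def batch_producer(items):
    """Cron-job replacement: just push the work into the queue.
    No checkpointing needed -- undelivered work simply stays queued."""
    for item in items:
        work_queue.put(item)

def backend_consumer(handle, stop):
    """Drain the queue until told to stop and the queue is empty."""
    while not stop.is_set() or not work_queue.empty():
        try:
            item = work_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        handle(item)
        work_queue.task_done()
```

The interruptibility the thread is after falls out for free: killing and restarting the consumer loses nothing, because unprocessed items are still sitting in the queue.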



