launchpad-dev team mailing list archive
Message #06080
Re: Design pattern for interruptible processing of continuous data
On Tue, 2011-01-04 at 15:29 +0000, Julian Edwards wrote:
> Dear all
>
> I've seen this problem pop up in similar ways a few times now, where we're
> processing a bunch of data in a cron job (whether externally over the API or
> internally) and it needs to do a batch of work, remember where it left off
> (whether because it reached a batch limit or the live data was paused), and
> continue later.
>
> Typically, to solve this, the client processing the data stores some piece of
> context about the data it's processing, and uses that context to restart from
> the right place next time.
>
> I think it would be a good idea to formalise a design around this in such a
> way that will also be beneficial to us when we eventually start using a
> message queuing application.
>
> In a previous life, the context data I used for this was a timestamp, and it
> worked very well in pretty much all the cases I came across. The client
> application simply passes the timestamp of the last item it processed back to
> the query/API call, and the data continues to flow from where it left off.
> This ticked all the boxes for data integrity and for polling or streaming usage.
>
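For reference, the timestamp-checkpoint pattern described above might look like this minimal Python sketch (the names, the in-memory `last_seen` attribute, and the item shape are all illustrative; in practice the checkpoint would be persisted, and items are assumed to arrive sorted by modification time):

```python
import datetime


def fetch_since(items, since):
    """Stand-in for the query/API call: return items modified after `since`."""
    return [item for item in items if item["modified"] > since]


class BatchClient:
    """Processes data in batches, remembering where it left off."""

    def __init__(self):
        # In a real client this checkpoint would be persisted
        # (a file, a DB row, etc.) so it survives between cron runs.
        self.last_seen = datetime.datetime.min

    def run_once(self, items, batch_limit=2):
        # Ask only for items newer than the last one we processed,
        # and stop at the batch limit; the rest waits for the next run.
        batch = fetch_since(items, self.last_seen)[:batch_limit]
        for item in batch:
            # ... process item ...
            self.last_seen = item["modified"]
        return batch
```

Because the checkpoint is just the last item's timestamp, a second invocation resumes exactly where the first stopped, whether the previous run hit its batch limit or was interrupted.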
I'm curious why one can't just start using message queues on the batch
job only.
Rather than a cron job that does all the work, the batch job could
simply push all the work into a queue. Whenever the message queue is
ready for frontend consumption, the batch jobs go away and the frontend
starts feeding the backend directly.
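A minimal sketch of that hand-off, using Python's in-process `queue.Queue` as a stand-in for a real message broker (the producer/consumer names are hypothetical; with a broker the producer would not need to track any resume context at all):

```python
from queue import Queue

# Stand-in for the broker; a real deployment would use something
# like RabbitMQ so the queue survives process restarts.
work_queue = Queue()


def cron_producer(pending_items):
    """Run periodically: just enumerate outstanding work and enqueue it."""
    for item in pending_items:
        work_queue.put(item)


def consumer(handle):
    """Drain whatever is queued, marking each item done as it is handled."""
    results = []
    while not work_queue.empty():
        item = work_queue.get()
        results.append(handle(item))
        work_queue.task_done()
    return results
```

Once the frontend can feed the queue directly, `cron_producer` simply disappears and the consumer keeps working unchanged.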
Trying to emulate the queue's robustness seems a noble but possibly
unnecessary effort if queues are coming any time soon.