launchpad-dev team mailing list archive
Message #06089
Re: Design pattern for interruptible processing of continuous data
On Wednesday 05 January 2011 00:30:35 Martin Pool wrote:
> On 5 January 2011 02:29, Julian Edwards <julian.edwards@xxxxxxxxxxxxx> wrote:
> > Dear all
> >
> > I've seen this problem pop up in similar ways a few times now, where
> > we're processing a bunch of data in a cron job (whether externally on
> > the API, or internally) and it needs to do a batch of work, remember
> > where it left off (whether reaching a batch limit or the live data is
> > paused), and continue later.
>
> I think I saw something like this in a bug last December about the branch
> scanner being killed when it runs out of memory. This apparently
> doesn't happen very often and it wasn't totally clear to me or people
> I asked what would happen to jobs (using the word loosely) that were
> in progress at the moment it was killed.
>
> It would be awfully nice if that could be handled by a common layer so
> that killing the batch-processing job (even without unwinding its
> python stack) would result in the jobs being retried a few times and
> then failed. This seems to be a requirement mentioned in
> <https://dev.launchpad.net/Foundations/NewTaskSystem/Requirements>.
>
> Maybe that page can turn into a LEP and get moved along.

This would be great. I have already written code in the buildd-manager that
tries to do intelligent retry and failure processing of jobs. The next step
is to generalise that behaviour.
When I get some time I'll start a LEP and further discussion.
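
To make the idea concrete, here is a rough sketch of the pattern under discussion: do a batch of work, remember where it left off, and retry a job a few times before marking it failed, even if the process is killed without unwinding its stack. This is not the buildd-manager code. The names run_batch, load_state, save_state and MAX_ATTEMPTS, the JSON checkpoint file, and the assumption that the item list keeps a stable order between runs are all hypothetical.

```python
import json
import os
import tempfile

MAX_ATTEMPTS = 3  # assumed retry limit; not taken from any Launchpad code


def load_state(path):
    """Return the saved checkpoint, or a fresh one if none exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"cursor": 0, "attempts": {}, "failed": []}


def save_state(path, state):
    """Write the checkpoint via a temp file and rename, so a kill
    never leaves a half-written checkpoint behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_batch(items, handler, path, batch_limit):
    """Process up to batch_limit items, resuming from the saved cursor."""
    state = load_state(path)
    done = 0
    while state["cursor"] < len(items) and done < batch_limit:
        index = state["cursor"]
        key = str(index)  # JSON object keys must be strings
        attempts = state["attempts"].get(key, 0)
        if attempts >= MAX_ATTEMPTS:
            # Every earlier attempt raised or was killed: give up on it.
            state["failed"].append(index)
        else:
            # Record the attempt *before* doing the work, so an attempt
            # that is killed outright still counts on the next run.
            state["attempts"][key] = attempts + 1
            save_state(path, state)
            try:
                handler(items[index])
            except Exception:
                # Leave the cursor in place; the next run retries this item.
                return state
        # Success or permanent failure: advance past the item.
        state["attempts"].pop(key, None)
        state["cursor"] = index + 1
        save_state(path, state)
        done += 1
    return state
```

Each cron run would call run_batch with the same checkpoint path. Stopping the run on the first error is only one possible choice; a generalised layer might instead skip ahead and come back to the failing job later.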
Cheers
J