launchpad-dev team mailing list archive
Message #06077
Design pattern for interruptible processing of continuous data
Dear all,
I've seen this problem pop up in similar ways a few times now: we process a
bunch of data in a cron job (whether externally via the API, or internally),
and it needs to do a batch of work, remember where it left off (whether
because it reached a batch limit or the live data is paused), and continue
later.
Typically, the client processing the data solves this by storing some piece of
context about the data it's processing, and using that context to restart from
the right place next time.
I think it would be a good idea to formalise a design around this in such a
way that will also be beneficial to us when we eventually start using a
message queuing application.
In a previous life, the context data I used for this was a timestamp, and it
worked very well in pretty much all the cases I came across. The client
application simply provides the timestamp of the last item it processed to a
query/API call, and the data continues to flow from where it left off.
This ticked all the boxes for data integrity, for both polling and streaming usage.
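To make the pattern concrete, here is a minimal sketch of such a resumable batch run. All the names (load_checkpoint, save_checkpoint, fetch_items_since) are hypothetical stand-ins for illustration, not existing Launchpad APIs, and the checkpoint store is a plain dict rather than a real table:

```python
from datetime import datetime, timezone

# In-memory stand-in for the proposed checkpoint table.
_checkpoints = {}


def load_checkpoint(name):
    # Return the client's last-processed timestamp, or the epoch if it
    # has never run before.
    return _checkpoints.get(name, datetime(1970, 1, 1, tzinfo=timezone.utc))


def save_checkpoint(name, ts):
    _checkpoints[name] = ts


def run_batch(name, fetch_items_since, process, batch_limit=100):
    """Process up to batch_limit items newer than the stored timestamp,
    then record where we left off so the next run can resume."""
    since = load_checkpoint(name)
    for item in fetch_items_since(since, limit=batch_limit):
        process(item)
        since = item.timestamp  # advance the watermark as we go
    save_checkpoint(name, since)
```

The checkpoint is written once per completed batch, so a run that dies mid-batch at worst reprocesses that batch; writing it per item would trade extra writes for less reprocessing.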
We would need to store this context somewhere of course, and I am proposing
that we create a new generic table for this, along the lines of:
CREATE TABLE DataTimestamps (
    -- one row per client application
    name TEXT NOT NULL PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL
)
where "name" is the name of the client app that's consuming the data, and
"timestamp" is the timestamp of the last item it processed.
Each data record the client reads would, of course, also need a timestamp of
its own. Most of our data already has this.
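Here is a sketch of how a client might read and update its checkpoint in the proposed table, exercised against an in-memory SQLite database purely for illustration (the PRIMARY KEY on "name" and the SQLite-specific INSERT OR REPLACE are my assumptions, not part of the proposal):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataTimestamps ("
    "    name TEXT NOT NULL PRIMARY KEY,"
    "    timestamp TIMESTAMP NOT NULL)")


def get_last(conn, name):
    # Return the client's stored watermark, or the epoch if it has none yet.
    row = conn.execute(
        "SELECT timestamp FROM DataTimestamps WHERE name = ?",
        (name,)).fetchone()
    return row[0] if row else "1970-01-01 00:00:00"


def set_last(conn, name, ts):
    # Create the checkpoint row, or overwrite it if this client already
    # has one.
    conn.execute(
        "INSERT OR REPLACE INTO DataTimestamps (name, timestamp)"
        " VALUES (?, ?)",
        (name, ts))
    conn.commit()
```

A client would call get_last at the start of each run and set_last after processing its batch, keeping all checkpoint state in this one generic table.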
This will be immediately useful in the Derived Distros feature that my team is
working on, so I'm keen to get this sorted out quickly.
All constructive comments welcome.
Cheers
J