
launchpad-dev team mailing list archive

Re: exposed map/reduce API

 

* Robert Collins <robertc@xxxxxxxxxxxxxxxxx> [2011-06-15 08:05 +1200]:
> On Wed, Jun 15, 2011 at 6:34 AM, Martin Pool <mbp@xxxxxxxxxxxxx> wrote:
> > 2- Another approach is to make it easier for the client to maintain an
> > offline cache by emphasizing "get me changes since date X" or "get me
> > objects ordered by last change" (key cases like bugs already exist);
> > and a client library that will make intelligent use of this abstracted
> > from the application code.  I think Arsenal does this.
> >
> > Getting better apis, and better handling of cached results, would let
> > API clients do totally general work with probably something like 10x
> > to 100x fewer API calls, correspondingly faster time, and nearly that
> > much less Launchpad server load.
> >
> > 3- Robert pointed out that having every API user keep a replicated
> > copy of parts of the Launchpad database is perhaps not the most
> > elegant solution, compared to doing this work on the server.  They
> > could instead send a kind of map/reduce expression to the server and
> > get back the results.
> 
> My mental sketch for this service would have it call forward: in
> python terms the signature is something like:
> Launchpad.mapreduce(objectsearch, mapfn, reducefn, resultendpoint)
> 
> resultendpoint would get a signed POST from LP with the results, when
> they are ready. However, see below: we can be much more low-key
> initially and just use email.
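
To make the "call forward" shape above concrete, here is a minimal local sketch of Launchpad.mapreduce(objectsearch, mapfn, reducefn, resultendpoint). It is not an existing Launchpad API. Several things are assumptions made only for illustration: the extra `initial` argument, the shared HMAC secret standing in for the "signed POST", the helper names `sign`, `make_endpoint` and `merge_counts`, and the sample bug records. Everything runs in one process, and the endpoint is a plain callable rather than an HTTP handler. The thread does not settle how map and reduce functions would actually be expressed and sandboxed on the server.

```python
import hashlib
import hmac
import json
from functools import reduce
from typing import Any, Callable, Iterable, List

# Hypothetical shared secret between the server and the result endpoint.
SECRET = b"example-shared-secret"
_MISSING = object()


def sign(payload: bytes, secret: bytes = SECRET) -> str:
    """Hex HMAC-SHA256 signature over the serialized results."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def mapreduce(objectsearch: Iterable[Any],
              mapfn: Callable[[Any], Any],
              reducefn: Callable[[Any, Any], Any],
              resultendpoint: Callable[[bytes, str], None],
              initial: Any = _MISSING) -> None:
    """Map, reduce, then call forward to the endpoint instead of returning."""
    mapped = map(mapfn, objectsearch)
    if initial is _MISSING:
        result = reduce(reducefn, mapped)
    else:
        result = reduce(reducefn, mapped, initial)
    payload = json.dumps({"result": result}, sort_keys=True).encode("utf-8")
    resultendpoint(payload, sign(payload))


def make_endpoint(received: List[dict]) -> Callable[[bytes, str], None]:
    """Stand-in for the client's POST handler: verify, then store."""
    def endpoint(payload: bytes, signature: str) -> None:
        if not hmac.compare_digest(sign(payload), signature):
            raise ValueError("result signature does not match")
        received.append(json.loads(payload))
    return endpoint


def merge_counts(a: dict, b: dict) -> dict:
    """Reduce step: add per-key counts from two partial results."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


# Example: count bugs by status (made-up sample records).
bugs = [
    {"id": 1, "status": "New"},
    {"id": 2, "status": "Fix Released"},
    {"id": 3, "status": "New"},
]
received: List[dict] = []
mapreduce(bugs, lambda bug: {bug["status"]: 1}, merge_counts,
          make_endpoint(received), initial={})
print(received)  # prints [{'result': {'Fix Released': 1, 'New': 2}}]
```

The design point is that the caller never blocks on a long-running query. The results arrive later at an endpoint the caller controls, and the signature lets that endpoint reject payloads that did not come from the server.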

The first thing that crossed my mind was using a message queue to
expose a notification interface rather than a pull interface. One benefit
is that the browser could consume that interface to provide a more
asynchronous user experience instead of polling periodically. This is just
another idea, though, and whether returning results in the browser actually
makes sense remains to be seen. This is very exciting; I'm eager to see
how user stories evolve for this feature.
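
The difference between a notification (push) interface and a pull interface can be shown with the standard library alone. This sketch assumes an in-process `queue.Queue` standing in for a real message broker. The event shape `{"job": ..., "status": ...}` and the names `publish` and `consume` are hypothetical. The consumer blocks on `get()` until something is published, so it never wakes up on a timer to ask whether anything has changed.

```python
import queue
import threading
from typing import List, Optional


def publish(q: "queue.Queue[Optional[dict]]", events: List[dict]) -> None:
    """Server side: push each result notification as soon as it is ready."""
    for event in events:
        q.put(event)
    q.put(None)  # sentinel: tells the consumer no more events are coming


def consume(q: "queue.Queue[Optional[dict]]", seen: List[dict]) -> None:
    """Client side: block on the queue instead of polling on a timer."""
    while True:
        event = q.get()  # waits here until something is published
        if event is None:
            break
        seen.append(event)


notifications: "queue.Queue[Optional[dict]]" = queue.Queue()
seen: List[dict] = []
listener = threading.Thread(target=consume, args=(notifications, seen))
listener.start()
publish(notifications, [{"job": 1, "status": "done"},
                        {"job": 2, "status": "done"}])
listener.join()
print(seen)  # prints [{'job': 1, 'status': 'done'}, {'job': 2, 'status': 'done'}]
```

A browser client would play the `consume` role over some long-lived connection. Which transport that would be is left open here.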

> Having folk replicate LP in an ad hoc fashion isn't just inelegant: any
> new bug analysis task requires someone new to pull all 800K bugs +
> 830K bugtasks + 9M messages out of the DB, store it locally, and then
> process it. It makes running analysis a complex and time-consuming
> task. It's great folk /can/ do it, but it's also hard to support: our
> top timeout today is due to folk analysing hardware DB records; about
> 2.7M rows into the collection, it starts timing out.

That was me, and the fact that I finally found a way to run multiple launchpadlib
instances in parallel probably didn't help either. My recent merge request
will hopefully reduce the load on the server eventually. Sorry about that!
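
For comparison, the "get me changes since date X" approach from point 2 of the quoted thread avoids re-pulling every record on each run. This minimal sketch assumes a hypothetical `fetch_changes_since(since)` call. It is not a launchpadlib method, and the records are made-up sample data. The cache upserts records by id and remembers the newest modification time it has seen. Each later sync asks only for records changed after that point.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    id: int
    last_modified: datetime
    data: dict


@dataclass
class LocalCache:
    records: Dict[int, Record] = field(default_factory=dict)
    high_water: Optional[datetime] = None  # newest change seen so far

    def sync(self, fetch_changes_since: Callable[[Optional[datetime]], List[Record]]) -> int:
        """Pull only what changed since the last sync and upsert it by id."""
        changes = fetch_changes_since(self.high_water)
        for rec in changes:
            self.records[rec.id] = rec
            if self.high_water is None or rec.last_modified > self.high_water:
                self.high_water = rec.last_modified
        return len(changes)


def utc(day: int) -> datetime:
    return datetime(2011, 6, day, tzinfo=timezone.utc)


# Made-up server-side data standing in for a remote collection.
server: Dict[int, Record] = {
    1: Record(1, utc(1), {"title": "a"}),
    2: Record(2, utc(2), {"title": "b"}),
    3: Record(3, utc(3), {"title": "c"}),
}


def fetch_changes_since(since: Optional[datetime]) -> List[Record]:
    """Hypothetical 'changes since date X' call, not a launchpadlib method.
    Strict '>' keeps the sketch short; a real client must cope with several
    changes sharing the high-water timestamp."""
    return sorted((r for r in server.values()
                   if since is None or r.last_modified > since),
                  key=lambda r: r.last_modified)


cache = LocalCache()
first = cache.sync(fetch_changes_since)   # initial full pull
server[2] = Record(2, utc(10), {"title": "b (edited)"})
second = cache.sync(fetch_changes_since)  # only the edited record
print(first, second)  # prints 3 1
```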

-- 
Marc Tardif <marc.tardif@xxxxxxxxxxxxx>
Freenode: cr3, Jabber: cr3@xxxxxxxxxx
1024D/72679CAD 09A9 D871 F7C4 A18F AC08 674D 2B73 740C 7267 9CAD


