launchpad-dev team mailing list archive

brainstorm: cheaper API collection iteration

 

This is about optimising a particular part of the current webservice;
it's still useful in an expand/filter world, I think, because some
collections are massive.

A related thing is batch navigators: if we want non-page-reload
batches we will want to build on the API, so having this work well in
the API is important [and, relatedly, the points Jeroen made at the
Epic about offsets in batching, batch stability etc. all still apply].

First, the problem:

Some of our collections, such as the one timing out in bug 730396, use
surrogate keys. By this I mean that 'index 1' in the collection is
/not/ 'id 1 in the database table': the key for indexing into the
collection needs to be translated by the database server. When the
collection gets long (thousands or tens of thousands of items) this
becomes inefficient: to get a batch we issue a query of the form:

select [stuff] from [stuff] order by ORDERSPEC LIMIT $SIZE OFFSET $START

This is inefficient because to determine the Nth item in the
collection the DB materializes the first N items. The more expensive
the collection, the more expensive this is. Accessing the 10th batch
costs roughly as much as reading the first 10 batches' worth of rows
in one go, so the cost of each batch grows with its position.

Now, we could do a much more efficient query for the use case of
iterating (__iter__) through a collection in batches:

select [stuff] from [stuff] WHERE ORDERKEY > $LASTBATCHEND order by
ORDERSPEC LIMIT $SIZE

Note the following: no offset; instead we pass the order key of the
last row of the previous batch, and the DB resumes from there.
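
To make the difference concrete, here is a minimal sketch of both query
shapes using Python's sqlite3 module and an in-memory table. The table
name "items", its columns, and the helper function names are made up
for illustration and are not Launchpad code. It also assumes the order
key is unique; with a non-unique key the WHERE clause would need a
tie-breaker column.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway table whose ids have gaps, so index != id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(i * 3 + 1, f"item-{i}") for i in range(25)],
    )
    return conn


def offset_batch(conn, start, size):
    # OFFSET form: the database has to step over `start` rows first.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (size, start),
    ).fetchall()


def keyset_batch(conn, last_key, size):
    # Keyset form: resume directly after the last key already seen.
    if last_key is None:
        return conn.execute(
            "SELECT id, name FROM items ORDER BY id LIMIT ?", (size,)
        ).fetchall()
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_key, size),
    ).fetchall()


def iterate_keyset(conn, size):
    """Yield successive batches, carrying the end key between queries."""
    last_key = None
    while True:
        batch = keyset_batch(conn, last_key, size)
        if not batch:
            return
        yield batch
        last_key = batch[-1][0]
```

Both forms return the same rows for a given batch; the keyset form just
lets an index on the order key find the starting point directly.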

On the appserver side, changing BatchNavigator to use an extended
resultset interface to supply the sequence key fields from the
previous batch would be pretty straightforward.

But how do we teach launchpadlib to do this? And how do we encode this
in the WADL?

Or should we not expose these things as unlimited-size collections,
and instead expose them as scoped collections, where the scope is a
range selector (the start ORDERKEY), and users paginate through by
getting an additional scoped collection each time they reach the end
of the current subcollection?
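
From the client's point of view, the scoped-collection idea might look
something like the sketch below. Everything here is hypothetical: the
ScopedCollection type, the next_start field and the fetch function are
invented for illustration and are not launchpadlib or WADL features.
It also assumes the start key is exclusive, i.e. it is the last key the
client has already seen.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

Entry = Tuple[int, str]  # (order key, payload); a hypothetical shape


@dataclass
class ScopedCollection:
    entries: List[Entry]
    next_start: Optional[int]  # None when there is nothing further


def make_fetcher(rows: List[Entry], size: int) -> Callable[[Optional[int]], ScopedCollection]:
    """Stand-in for the server: serve subcollections of at most `size`."""
    ordered = sorted(rows)

    def fetch(start_key: Optional[int]) -> ScopedCollection:
        if start_key is None:
            candidates = ordered
        else:
            candidates = [r for r in ordered if r[0] > start_key]
        page = candidates[:size]
        more = len(candidates) > size
        return ScopedCollection(page, page[-1][0] if more else None)

    return fetch


def iterate_all(fetch: Callable[[Optional[int]], ScopedCollection]) -> Iterator[Entry]:
    """Walk the whole collection by requesting one scope after another."""
    start = None
    while True:
        sub = fetch(start)
        yield from sub.entries
        if sub.next_start is None:
            return
        start = sub.next_start
```

The client never sees an offset: each subcollection tells it where the
next one starts, so the server is free to use the cheaper query.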

-Rob


