
launchpad-dev team mailing list archive

Re: brainstorm: cheaper API collection iteration

 


My understanding, may be way off.

Pages in collections are just links, so you can generate what you
like. You can generate the bare link to the collection to get the first
page, and then when that page is returned it would say

   next_page_link:
   "https://api.launchpad.net/devel/some_collection?start_key=endkey_of_this_collection"

then when the next page is needed launchpadlib simply hits that URL.

Therefore I think this would be fairly straightforward to change in the
current webservice. It also looks as though just changing
lazr.batchnavigator will get most of the way there, so it may be that
fixing the web UI fixes the API too.

In general, this is right. lazr.restfulclient follows the URL it finds in next_collection_link without caring what that URL is. launchpadlib doesn't mess with the collection links at all, and lazr.restfulclient never uses previous_collection_link. So if you come up with a better way to paginate records, you can just change next_collection_link and the client will adapt.
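To illustrate what "follows the URL without caring what it is" means, here is a minimal sketch of that loop. It is not the lazr.restfulclient code: fetch is a hypothetical callable that takes a URL and returns the decoded JSON page, and I'm assuming each page is a dict with an "entries" list and an optional "next_collection_link".

```python
def iter_entries(first_url, fetch):
    """Yield every entry of a collection, one page at a time.

    `fetch` is a hypothetical callable (URL -> decoded JSON dict);
    it stands in for whatever HTTP layer the client really uses.
    """
    url = first_url
    while url is not None:
        page = fetch(url)
        for entry in page.get("entries", []):
            yield entry
        # The link is opaque: it is never parsed or rebuilt, so the
        # server is free to change how it encodes the next page.
        url = page.get("next_collection_link")
```

Because the loop only ever reads the link back from the previous page, it should behave the same whether the server puts ws.start or something like start_key in the URL.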

I know of two exceptions, both optimizations in the code that handles slices (_get_slice() in lazr.restfulclient).

1. If you ask for a slice like launchpad.bugs[:76], lazr.restfulclient gets the first page (which happens to have 75 entries), and then this code runs:

            if more_needed > 0 and more_needed < first_page_size:
                # An optimization: it's likely that we need less than
                # a full page of entries, because the number we need
                # is less than the size of the first page we got.
                # Instead of requesting a full-sized page, we'll
                # request only the number of entries we think we'll
                # need. If we're wrong, there's no problem; we'll just
                # keep looping.
                page_url = self._with_url_query_variable_set(
                    page_url, 'ws.size', more_needed)

So, we take next_collection_link, and we hack ws.size to only give us (in this case) one additional entry. We don't need another 75 entries, we only need one.

I think this will continue to work no matter what next_collection_link looks like, so long as ws.size continues to work. Worst case, we can simply remove the optimization.
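For reference, here is roughly what setting one query variable on an otherwise opaque URL looks like. This is a sketch using the standard library, not the actual body of _with_url_query_variable_set; the function name and the start_key parameter in the usage line are only illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_variable_set(url, name, value):
    """Return `url` with query variable `name` set to `value`.

    Every other query variable is kept as-is, so whatever the server
    put in the link to identify the next page is preserved.
    """
    parts = urlsplit(url)
    # Drop any existing value for `name`, keep the rest in order.
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    query.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

So with_query_variable_set(next_link, 'ws.size', 1) only touches ws.size and leaves the rest of the link alone, which is why the optimization doesn't depend on how the next page is identified.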

2. When you ask for a slice like 'launchpad.bugs[70:200]', this code runs:

            # No part of this collection has been loaded yet, or the
            # slice starts beyond the part that has been loaded. We'll
            # use our secret knowledge of lazr.restful to set a value for
            # the ws.start variable. That way we start reading entries
            # from the first one we want.
            first_page_size = None
            entry_dicts = []
            page_url = self._with_url_query_variable_set(
                self._wadl_resource.url, 'ws.start', start)

This "secret knowledge of lazr.restful" would be invalidated by the change. We could stop supporting this syntax in later versions of launchpadlib, or we could just remove the optimization: make lazr.restfulclient load subsequent pages from the beginning until it has 200, and then perform the slice.
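The second option, dropping the optimization, could look something like the sketch below. Again this is not the real _get_slice(): fetch is a hypothetical callable (URL to decoded JSON dict), and I'm assuming pages carry an "entries" list and an optional "next_collection_link".

```python
from itertools import islice


def get_slice_from_beginning(first_url, fetch, start, stop):
    """Return entries[start:stop] without using ws.start.

    Pages are read in order from the first one, following
    next_collection_link, until `stop` entries have been seen.
    """
    def entries():
        url = first_url
        while url is not None:
            page = fetch(url)
            yield from page.get("entries", [])
            url = page.get("next_collection_link")

    # islice stops pulling from the generator once `stop` entries
    # have been consumed, so no page beyond the slice is requested.
    return list(islice(entries(), start, stop))
```

The cost is that a slice like [70:200] fetches and throws away the first 70 entries, but it works no matter how the server encodes its page links.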

I did a quick search of Launchpad for references to ws.start and ws.size. They show up a lot in tests, but that's about it. In particular, they don't seem to be used in the JavaScript code (though client.js has support for them). I did see some batch navigation code in picker.js. It looks like Launchpad code, not web service code, but it may have to be changed for the same reasons the web service does.

Leonard


