Re: Getting pagination right

I'm not proposing using marker/limit over offset/limit. I'm proposing
using marker/limit/offset over searching manually through a giant set
of results looking for a marker and then grabbing the next few
results. ;)

On Wed, May 25, 2011 at 2:26 PM, Greg Holt <gholt@xxxxxxxxxxxxx> wrote:
> Also, using marker/limit rather than offset/limit makes it much easier to
> shard for scale if desired. One of the reasons we chose that route with
> Swift. Regular databases with indexes work better with marker/limit than
> offset/limit too.
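
Greg's index point can be sketched with a throwaway sqlite3 table (an illustrative experiment, not anything from the thread): a deep OFFSET still walks and discards every skipped row, while a marker predicate lets the primary-key index seek straight to the page.

import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO instances (name) VALUES (?)",
                 [("server-%d" % i,) for i in range(200000)])

def timed(sql, params):
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

# offset/limit: the engine must step over the first 190,000 rows to reach the page.
page_a, t_offset = timed(
    "SELECT id, name FROM instances ORDER BY id LIMIT 10 OFFSET ?", (190000,))

# marker/limit: the primary-key index seeks past the marker and reads 10 rows.
page_b, t_marker = timed(
    "SELECT id, name FROM instances WHERE id > ? ORDER BY id LIMIT 10", (190000,))

assert page_a == page_b
print("offset/limit: %.4fs   marker/limit: %.4fs" % (t_offset, t_marker))
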
> On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:
>
> I definitely agree that the paging logic is questionable; we should be
> specifying the sort order if we're relying on it, as I believe a database is
> otherwise free to return results however it sees fit.
>
> I'm not convinced that the proposed logic around the queries and update
> timestamps is typo-free. But even if it were correct, the problem with
> timestamps is that I don't think you can get a sensible set of results. If
> you order by increasing timestamp, then I think you see items repeatedly if
> they are concurrently updated. If you order by decreasing timestamp, I think
> you can miss an item altogether if it's updated in between page fetches.
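
The first failure mode Justin describes (ascending updated_at ordering showing a row twice) can be shown with a toy simulation; the rows and marker logic below are invented purely for illustration.

# Toy rows: id -> updated_at (simplified to small integers).
rows = {"A": 1, "B": 2, "C": 3, "D": 4}

def page_by_updated_at(after, limit=2):
    """Return the next `limit` rows ordered by ascending updated_at."""
    ordered = sorted(rows.items(), key=lambda kv: kv[1])
    return [(rid, ts) for rid, ts in ordered if ts > after][:limit]

page1 = page_by_updated_at(after=0)          # [('A', 1), ('B', 2)]
marker = page1[-1][1]

rows["A"] = 5                                # A is updated between page fetches

page2 = page_by_updated_at(after=marker)     # [('C', 3), ('D', 4)]
page3 = page_by_updated_at(after=page2[-1][1])
print(page3)                                 # [('A', 5)]  A is seen a second time
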
> I think the 'normal' way to make this work is to order by primary key, and
> to have the marker be the PK of the 'last seen row'. Because the client
> can't know what the PK is, the marker is normally just an opaque value.
> It's also (normally) much easier for the DB to sort by primary key than by
> any other value.
>
> The behaviour of an iteration using the PK is fairly nice. You never see a
> row twice, so you don't need to de-dup client side. You are guaranteed to
> see all rows, unless they are concurrently inserted or deleted. If a row is
> inserted, you'll see it if you haven't yet fetched the page it would be on.
> If a row is deleted, you don't see it unless you've already fetched the page
> it was on.
>
> Zones may complicate any scheme, though I think both proposed methods could
> be made to work. I think in general the next N rows have to be fetched from
> each child zone and then combined (top N), throwing away the extra records.
> One way around this would be to define an order on the child zones, and
> then show the records from zone 1, then those from zone 2, and so on. You
> just need to map from the marker to a child zone, and if you don't get the
> full complement of rows, you need to merge in the first results from the
> next zone as well.
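
The "fetch the next N from every child zone, keep the top N" merge Justin sketches might look roughly like this; heapq.merge does the combining, and the per-zone query callables are stand-ins invented for the example.

import heapq
import itertools

def page_across_zones(zone_queries, marker, limit):
    """zone_queries: callables (marker, limit) -> rows already sorted by 'id'."""
    per_zone = [query(marker, limit) for query in zone_queries]   # N rows per zone
    merged = heapq.merge(*per_zone, key=lambda row: row["id"])    # global order
    return list(itertools.islice(merged, limit))                  # keep top N, drop the rest

def make_zone(rows):
    """Build a toy zone backed by an in-memory, id-sorted list."""
    def query(marker, limit):
        return [r for r in rows if marker is None or r["id"] > marker][:limit]
    return query

zone1 = make_zone([{"id": 1}, {"id": 4}, {"id": 5}])
zone2 = make_zone([{"id": 2}, {"id": 3}, {"id": 6}])

print(page_across_zones([zone1, zone2], marker=None, limit=3))
# [{'id': 1}, {'id': 2}, {'id': 3}]
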
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
>
> SELECT *
> FROM <table>
> WHERE <pk> > @marker
> AND (<additional filters>)
> ORDER BY <pk>
> LIMIT <pagesize>
>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.
> Justin
>
>
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes <jaypipes@xxxxxxxxx> wrote:
>>
>> When a user issues a GET request for a resource collection, it's clear
>> that you don't want to return ALL results in the collection. Returning
>> hundreds of thousands of records per request isn't a particularly good
>> idea.
>>
>> Therefore, we use the concept of "pagination" -- returning a subset of
>> records matching the search criteria.
>>
>> In Glance's API, we've added this ability (currently in code review),
>> but our approach is a bit at odds with the way that the OpenStack
>> Compute API is structured. I'd like to explain why I think our
>> approach is better, both in terms of usability and in terms of
>> efficiency.
>>
>> First, here is how Nova does its pagination in the OpenStack API.
>>
>> 1) Look for a "marker" query parameter in the request
>> 2) Look for a "limit" query parameter in the request, otherwise use a
>> default max limit.
>> 3) If the marker parameter is found, the value of the "marker" query
>> parameter is assumed to be an *integer* and assumed to be the
>> *identifier of the last object on the previous "page" of results*
>> 4) Request *all* results
>> 5) In the code, find the "page" of results by looking for the object
>> in the set of results that matches the marker parameter
>> 6) When found, grab the next object in the result set, plus X objects,
>> where X is the limit parameter
>>
>> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>>
>> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>>    """Return a slice of items according to the requested marker and limit."""
>>
>> <snipped for brevity... >
>>
>>    limit = min(max_limit, limit)
>>    start_index = 0
>>    if marker:
>>        start_index = -1
>>        for i, item in enumerate(items):
>>            if item['id'] == marker:
>>                start_index = i + 1
>>                break
>>        if start_index < 0:
>>            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>>    range_end = start_index + limit
>>    return items[start_index:range_end]
>>
>> There are three MAJOR problems with the above method of pagination:
>>
>> 1) Very inefficient
>>
>> The Python code does the job of determining the page of results to
>> find. This means that the database query returns EVERY row in the
>> resultset, instead of only returning a small subset of the results
>> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
>> of records in the various databases, this method is simply not
>> scalable.
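
For comparison, here is a minimal sqlite3 sketch (not Nova code) of that difference: the first function mimics pulling every row and slicing in Python, the second pushes LIMIT/OFFSET down so the database returns only the requested page.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO instances (name) VALUES (?)",
                 [("server-%d" % i,) for i in range(100000)])

def page_in_python(offset, limit):
    """Every row crosses the wire; Python does the slicing."""
    rows = conn.execute("SELECT id, name FROM instances ORDER BY id").fetchall()
    return rows[offset:offset + limit]

def page_in_database(offset, limit):
    """The database returns only the requested page."""
    return conn.execute(
        "SELECT id, name FROM instances ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset)).fetchall()

assert page_in_python(50000, 10) == page_in_database(50000, 10)
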
>>
>> 2) There is no ordering of the results in the queries the Nova
>> database API makes
>>
>> The only query in the Nova database API that orders its results is the
>> instance_type_get_all() method.
>>
>> Without an order to the search results, paging is almost meaningless.
>> A page of results depends on the order of the list of results. The
>> reason the existing Nova code works at all is that all objects have an
>> autoincrementing integer primary key, which by nature increases over
>> time. If the identifier were switched to, say, a UUID, the results
>> ordering would fall apart completely, and "pages" of results would get
>> skewed and corrupted as records were inserted over time.
>>
>> For instance, assume a character primary key (as UUID keys would be,
>> stored in hex string format). Assume the following simplified set of
>> instances:
>>
>> PK         Created_at
>> ABC       2011-05-25 12:00:00
>> DEF       2011-05-25 12:00:01
>> GHJ       2011-05-25 12:00:02
>> KLM       2011-05-25 12:00:03
>>
>> Assume you want to get a list of instances with GET /servers?limit=2.
>> With the existing code, you'd get this:
>>
>> ABC       2011-05-25 12:00:00
>> DEF       2011-05-25 12:00:01
>>
>> With GET /servers?limit=2&marker=DEF, you'd get this:
>>
>> GHJ       2011-05-25 12:00:02
>> KLM       2011-05-25 12:00:03
>>
>> However, let's say a new instance is added:
>>
>> BCD       2011-05-25 12:00:04
>>
>> Because there is no sort order specified in the database queries, the
>> order of results is indeterminate. Page 1 may now contain the BCD
>> record, or it may not. It totally depends on the implementation of the
>> relational database, because the SQL standard does not mandate a
>> default sort order when one is not specified in the ORDER BY clause.
>>
>> 3) The marker parameter assumes an autoincrementing primary key
>>
>> Because the primary key is currently an autoincrementing integer, the
>> paging of results DOES solve the problem of new records messing up
>> paging, but only because of that assumption.
>>
>> As an example, let's assume I've fixed #2 above and put a default sort
>> order on the primary key, descending. Also assume a simple resultset
>> like so:
>>
>> PK         Created_at
>> 1      2011-05-25 12:00:00
>> 2      2011-05-25 12:00:01
>> 3      2011-05-25 12:00:02
>> 4      2011-05-25 12:00:03
>>
>> Assume I make a request for GET /servers?limit=2. I would get:
>>
>> 4      2011-05-25 12:00:03
>> 3      2011-05-25 12:00:02
>>
>> The results returned would be the "last" 2 server instances created.
>> If I did a request for GET /servers?limit=2&marker=3, I would get the
>> results:
>>
>> 2      2011-05-25 12:00:01
>> 1      2011-05-25 12:00:00
>>
>> If I add a new instance, say:
>>
>> 5      2011-05-25 12:00:05
>>
>> and did the request for GET /servers?limit=2&marker=3, I would still
>> get the same results:
>>
>> 2      2011-05-25 12:00:01
>> 1      2011-05-25 12:00:00
>>
>> However, the ONLY reason this works is because the primary key is
>> auto-incrementing, and new records always come at the "end" of the
>> primary key. If I change the primary key to a UUID, the call to GET
>> /servers?limit=2&marker=3 would no longer be guaranteed to return the
>> same results. The newly-added instance's UUID may come between the
>> values of the primary keys in the original "page 2", and therefore
>> mess up the consistency of the pagination of my original query.
>>
>> The way to avoid this inconsistency is to ensure that you have a
>> consistent view of the set of results of your original query. To do
>> that, you must always use a WHERE condition on a temporal,
>> always-incrementing (timestamp) column.
>>
>> In SQL, this is what is necessary to have a consistent view of pages
>> of a result set (assuming 10 elements to a page):
>>
>> Page 1:
>>
>> SELECT i.* FROM instances i
>> ORDER BY i.id DESC
>> LIMIT 10;
>>
>> SET @marker = <ID of *first* record>;
>> SET @updated_at_marker =
>>     (SELECT updated_at FROM instances WHERE id = @marker);
>>
>> Page 2:
>>
>> SELECT i.* FROM instances i
>> WHERE updated_at <= @updated_at_marker
>> ORDER BY i.id DESC
>> LIMIT 10 OFFSET 10;
>>
>>
>> SUMMARY:
>>
>> My proposal is to overhaul the way that Nova does pagination so that we:
>>
>> 1) Pass the LIMIT and OFFSET parameters down into the database API queries
>> 2) Add a default ORDER BY for all queries returning result sets
>> 3) Add a "page" query parameter that would be used to calculate the
>> OFFSET programmatically
>> 4) Change the concept of the "marker" parameter to refer to *the ID of
>> the FIRST record returned in the original query*
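>>

Put together, the proposal might translate the request parameters into a query along these lines; the table name, column names, and the build_servers_query helper are illustrative assumptions, not actual Nova or Glance code.

def build_servers_query(marker_id=None, page=1, limit=10):
    """Return (sql, params) for one ordered, consistently-paged listing."""
    offset = (page - 1) * limit   # the "page" parameter becomes an OFFSET
    if marker_id is None:
        # First request: no marker yet. The caller remembers the id of the
        # FIRST record returned and sends it back as the marker.
        return ("SELECT * FROM instances ORDER BY id DESC LIMIT ? OFFSET ?",
                (limit, offset))
    # Later pages: pin the view to rows whose updated_at is no newer than the
    # marker row's, as in the SQL above, so fresh inserts cannot shift pages.
    sql = ("SELECT i.* FROM instances i "
           "WHERE i.updated_at <= (SELECT updated_at FROM instances WHERE id = ?) "
           "ORDER BY i.id DESC LIMIT ? OFFSET ?")
    return sql, (marker_id, limit, offset)

print(build_servers_query(marker_id=42, page=2, limit=10))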
>>
>> Thoughts?
>>
>> -jay
>>

