
openstack team mailing list archive

Re: Getting pagination right

 


I think there is a lot of confusion around the two uses of the word 'marker', and under Jay's proposal we may need another word for it.

Suppose we have the following images:

ID (PK)  Created_at           Updated_at           Deleted_at
1        2011-05-25 12:00:00  2011-05-25 12:00:04  -
2        2011-05-25 12:00:01  2011-05-25 12:00:03  -
3        2011-05-25 12:00:02  2011-05-25 12:00:05  -
4        2011-05-25 12:00:03  -                    2011-05-25 12:00:09

Under the current 1.1 spec, 'marker' means the id of the last element you saw, such that:

/images?marker=3&limit=2

will give you the 2 images *updated* before the image with id '3':

[1, 2]

(assuming order by updated_at desc)

This is *not* what the current code does, because we do not ORDER BY updated_at yet, as Jay pointed out. I'm just showing what the spec asks for as it is currently written.
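To make the id-based 'marker' semantics concrete, here is a minimal Python sketch over the sample images above. It only illustrates what the spec asks for, under the assumption that deleted images are left out of the listing; the function name `paginate_by_id_marker` is made up, and because it loads every row into memory it shows the semantics, not an efficient implementation.

```python
# Sample rows from the table above: (id, created_at, updated_at, deleted_at).
IMAGES = [
    (1, "2011-05-25 12:00:00", "2011-05-25 12:00:04", None),
    (2, "2011-05-25 12:00:01", "2011-05-25 12:00:03", None),
    (3, "2011-05-25 12:00:02", "2011-05-25 12:00:05", None),
    (4, "2011-05-25 12:00:03", None, "2011-05-25 12:00:09"),
]


def paginate_by_id_marker(rows, marker=None, limit=None):
    """Return ids of up to `limit` rows that follow the row whose id is `marker`.

    Hypothetical helper: deleted rows are skipped (an assumption) and the
    remaining rows are ordered by updated_at DESC.
    """
    live = [r for r in rows if r[3] is None]
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically.
    ordered = sorted(live, key=lambda r: r[2], reverse=True)
    ids = [r[0] for r in ordered]
    start = 0
    if marker is not None:
        # The marker must name a row in the listing; list.index raises ValueError otherwise.
        start = ids.index(marker) + 1
    end = None if limit is None else start + limit
    return ids[start:end]


print(paginate_by_id_marker(IMAGES, marker=3, limit=2))  # [1, 2]
```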

If I understand Jay's idea correctly, he wants us to pass a marker (a different kind of marker), an offset, AND a limit. So a query would look something like this:

/images?marker=<timestamp>&offset=3&limit=10

I believe the marker can be left empty here, in which case it defaults to now(). Whatever <timestamp> is set to, the query returns images that were created at or before <timestamp> and that were either never deleted or deleted *after* <timestamp>. The main advantage is that this gives you a persistent 'collection snapshot' of your query results, based on the time of the initial query. If it takes you a minute to page through the results, and images are added or deleted during that time, it won't throw off your pagination as long as you keep the marker constant.

If we passed in marker = '2011-05-25 12:00:03', offset = '0', and limit = '4', we would get:

[4, 3, 2, 1]

(assuming order by created_at desc)

using Jay's method. If we kept everything else the same but passed in '2011-05-25 12:00:10' as the marker, image 4 would not be on the list, because by that time image 4 had been deleted.
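The timestamp-snapshot behaviour can be sketched against an in-memory SQLite table holding the sample images. This is a hypothetical illustration, not Jay's code: the helper names and the `snapshot` parameter name are made up, timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text (which sorts chronologically), deleted images are assumed to be kept with `deleted_at` set, and `created_at <= snapshot` is used so that image 4, created exactly at 12:00:03, appears as in the worked example.

```python
import sqlite3

SAMPLE_IMAGES = [
    (1, "2011-05-25 12:00:00", "2011-05-25 12:00:04", None),
    (2, "2011-05-25 12:00:01", "2011-05-25 12:00:03", None),
    (3, "2011-05-25 12:00:02", "2011-05-25 12:00:05", None),
    (4, "2011-05-25 12:00:03", None, "2011-05-25 12:00:09"),
]


def make_db():
    """Create an in-memory images table loaded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL,"
        " updated_at TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?)", SAMPLE_IMAGES)
    return conn


def snapshot_page(conn, snapshot, offset=0, limit=10):
    """Ids of images visible at `snapshot`, newest created_at first (hypothetical)."""
    rows = conn.execute(
        """
        SELECT id FROM images
        WHERE created_at <= ?
          AND (deleted_at IS NULL OR deleted_at > ?)
        ORDER BY created_at DESC, id DESC
        LIMIT ? OFFSET ?
        """,
        (snapshot, snapshot, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db()
print(snapshot_page(conn, "2011-05-25 12:00:03", offset=0, limit=4))  # [4, 3, 2, 1]
print(snapshot_page(conn, "2011-05-25 12:00:10", offset=0, limit=4))  # [3, 2, 1]

# Page through a fixed snapshot while the table changes underneath.
snap = "2011-05-25 12:00:03"
first = snapshot_page(conn, snap, offset=0, limit=2)  # [4, 3]
conn.execute("INSERT INTO images VALUES (5, '2011-05-25 12:00:20', NULL, NULL)")
conn.execute("UPDATE images SET deleted_at = '2011-05-25 12:00:21' WHERE id = 3")
second = snapshot_page(conn, snap, offset=2, limit=2)  # [2, 1]
```

Because the filter depends only on the fixed `snapshot` value, the image added and the image deleted between the two requests do not shift the second page. The `id DESC` tie-breaker keeps the order deterministic if two images share a created_at value.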

Please correct me if something above is incorrect.


As for next steps, I talked with Mark Washenberger and Brian Waldon, and we came up with two possible ways to move forward.

Things that we agree on in both cases:
	
* The current way nova handles paging is inefficient and needs to be improved.
* We need to use ORDER BY in all of our queries, and not assume that ids will be ordered by time.
* We should order our queries by created_at, *not* by updated_at as the current spec specifies (you can see the confusion this may cause in my first example above).

I personally like Jay's proposal (except that I might leave 'pages' out for now, in favor of having one way to do things rather than several ways to do the same thing), but I feel that 'marker' should be renamed. Perhaps 'timestamp' would be better? I'm open to other suggestions.

Another idea we had was to keep marker/limit, with marker being an id, but to move the existing inefficient Python logic into the db layer. This would give us the sharding/scaling advantages that Greg mentioned, and would also get rid of many of the problems Jay outlined with our current implementation.

With this method, however, I believe we will need Glance to support marker/limit as well in order to make things efficient.
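For the second option, the marker lookup and the limit can be pushed into a single ORDER BY ... LIMIT query instead of being done in Python. The sketch below is a hypothetical SQLite illustration (the table layout and the function name `page_after_marker` are assumptions, not nova's actual schema or API). It orders by created_at with id as a tie-breaker, so it does not rely on ids following creation time, and it returns the live images that sort strictly after the marker image.

```python
import sqlite3


def make_db():
    """In-memory images table with the sample rows (updated_at omitted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL,"
        " deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [
            (1, "2011-05-25 12:00:00", None),
            (2, "2011-05-25 12:00:01", None),
            (3, "2011-05-25 12:00:02", None),
            (4, "2011-05-25 12:00:03", "2011-05-25 12:00:09"),
        ],
    )
    return conn


def page_after_marker(conn, marker=None, limit=10):
    """Up to `limit` live image ids after image `marker`, newest created_at first."""
    where = "deleted_at IS NULL"
    params = []
    if marker is not None:
        row = conn.execute(
            "SELECT created_at FROM images WHERE id = ?", (marker,)
        ).fetchone()
        if row is None:
            raise LookupError(f"marker {marker} not found")
        # Keep rows that sort strictly after the marker in (created_at DESC, id DESC) order.
        where += " AND (created_at < ? OR (created_at = ? AND id < ?))"
        params += [row[0], row[0], marker]
    sql = f"SELECT id FROM images WHERE {where} ORDER BY created_at DESC, id DESC LIMIT ?"
    params.append(limit)
    return [r[0] for r in conn.execute(sql, params)]


conn = make_db()
print(page_after_marker(conn, limit=2))            # [3, 2]
print(page_after_marker(conn, marker=2, limit=2))  # [1]
```

Unlike the snapshot approach, each request sees the live table, so images added between requests can still appear on later pages; the gain here is efficiency, not a consistent view.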

So, to summarize, our two suggestions currently are:

1) Follow Jay's proposal, but find a better name for 'marker'.
2) Keep marker/limit, but move the inefficient logic out of Python into the db layer, and have Glance support marker/limit as well.

Thoughts on these two paths of moving forward?  Anyone have ideas for other routes we could take?



-----Original Message-----
From: "Jay Pipes" <jaypipes@xxxxxxxxx>
Sent: Wednesday, May 25, 2011 15:57
To: "Greg Holt" <gholt@xxxxxxxxxxxxx>
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] Getting pagination right

On Wed, May 25, 2011 at 3:43 PM, Greg Holt <gholt@xxxxxxxxxxxxx> wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent. Inserts into the Swift
databases between the time of the initial query and the request for
the "next page" can result in rows from the original first page
ending up on the new second page.

Code in swift/common/db.py lines 958 through 974 shows an ORDER BY
name. Newly inserted objects (or records that are deleted) with a name
value > marker and < end_marker can result in a page changing its
contents on refresh. This is why I was saying it's not a consistent
view of the data.

-jay
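Jay's point about name-ordered marker paging can be illustrated with a small in-memory model. This is a hypothetical sketch, not the code in swift/common/db.py: the function name `list_names` and the data are made up. Each request sees the live set of names rather than the state at the time of the first request, so the second page differs depending on whether an insert happened in between.

```python
def list_names(names, marker=None, end_marker=None, limit=2):
    """Names > marker and < end_marker in name order, at most `limit` of them.

    Hypothetical in-memory stand-in for a marker/end_marker listing query.
    """
    out = []
    for name in sorted(names):
        if marker is not None and name <= marker:
            continue
        if end_marker is not None and name >= end_marker:
            break
        out.append(name)
        if len(out) == limit:
            break
    return out


container = {"a", "b", "c", "d", "e"}
page1 = list_names(container)                             # ['a', 'b']
expected_page2 = list_names(container, marker=page1[-1])  # ['c', 'd']
container.add("bb")  # inserted between the two requests
page2 = list_names(container, marker=page1[-1])           # ['bb', 'c']
```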

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



