← Back to team overview

maria-developers team mailing list archive

Re: LIMIT optimisations


Hi !

On a bulletin board context, that's quite simple :

Let's say we want to display a forum thread containing a lot of posts.

To simplify, I have the following table 'posts' which contains :

- id_thread
- id_post
- content
- id_author

If I want to display a paginated posts list of a given topic, with 30 posts per page, I have to do :

SELECT content, author_name FROM posts LEFT JOIN author USING (id_author) ....... WHERE id_thread=.... ORDER BY id_post ASC LIMIT x,30
I have a PK on (id_thread, id_post).

If I have a lot of posts in this thread, I could have easily a big LIMIT to get the last pages of the thread, which are the more often read (and the query will be triggered quite often especially if google like my bulletin board :)). The current behaviour of MyISAM seems to be to always scan all the rows; than means if I have a LIMIT 12000,40, the first useless 12000 rows will be scanned, and this is especially bad if "content" is a TEXT field (no static lengths row here).
That means the more pages I have, to slower is the query.
The only workarounds I have found for now are :
- to maintain in my application a (id_post => position) mapping, which allows me to do a range search without limit clause which is quite fast. - to do a SELECT id_post FROM id_thread=.... ORDER BY id_post ASC LIMIT x,1 and SELECT id_post FROM id_thread=... ORDER BY id_post ASC LIMIT x+31,1, and use these two positions to make a range search without limit clause (a kind of manually ICP)

HANDLER doesn't seem to solve the issue, because I cannot jump easily on the row number 12000 in index (id_thread, id_post). Moreover I cannot make my JOIN as well.

So to sum up :

- I want rows in key order, but key order should most of the time matches the row position
- I need to read a row from a specific position
- I don't need to read rows backwards

I don't know if index condition pushdown is currently used to optimize LIMIT clause, but than could indeed improve a lot those kind of issues.
MRR should not help here since usually rows in the data file are in the same order than in this specific index.


Le 10/03/10 21:12, Michael Widenius a écrit :

Catching up with my old emails...
"Jocelyn" == Jocelyn Fournier<joce@xxxxxxxxxxxxx>  writes:
Jocelyn>  Hi,
Jocelyn>  Following this discussion in 2007 : http://lists.mysql.com/internals/34287

Jocelyn>  Is there any plan to implement such an optimisation in MariaDB ? (I
Jocelyn>  think a lot of web app using pagination could take benefit of such an
Jocelyn>  optimisation, although there are some workarounds to avoid big LIMIT for
Jocelyn>  pagination)

Thanks for the reference to the old discussion, I had missed the
original and it was interesting reading.

The problem here is that if you do a lot of deletes of rows or updates
of keys, it would be hard to impossible to efficiently store a position
id for each row.

However some of the storage engines have a direct position to rows.

- For MyISAM (and Maria) with static lengths row, you can directly
   jump to row 'X'.

- For Maria, each 'dynamic length row' has an ID that one can use for
   positioning (There may be 'holes' in the ID, but that could be
   taken care of).

However, neither of the above would help if you want to have position
based on an index.

Exactly what problem is it that you want to solve ?

I saw your question of:

How to optimize select * from abc limit 1000,30?

Can't you use the HANDLER calls for this ?

(This allows you to freely read backwards/forwards on an index with limit)

In this case you don't know exactly where you are in the table, but
you can freely move 30 rows from your current position.

If you could describe your application a bit, there is maybe ways to
easily extend the handler interface to solve your problem...

For example, knowing this would help:
- Do you want rows in position order, not key order?
- Do you need to read a row from a specific position (1000) or
   do you need to read rows backward/forward based on an old position?
- Do you need to read rows backwards?

(By the way, I don't understand Sergei's comment about read_set in
this context)


Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

Follow ups