← Back to team overview

maria-developers team mailing list archive

Re: Order by speed optimization

 

On 01/11/2013 09:33 PM, Matthew Lagoe wrote:
> I was talking with a few people about a query such as
> 
> 
>     |select * from table order by X, Y, Z limit G;|
> 
> 
> As the "ORDER BY" for me could change to any number of 12 values in any
> order I am not able to use a multicolumn index as such I was looking at
> how to optimize in such case where sorting is dynamic.
> 
> On my dataset the query takes ~7 seconds for a full table scan but only
> .3 seconds when using the column index and returns ~80,000 records for
> the query (before the LIMIT)
> 
> I was wondering if there is any reason that the order by syntax doesn't
> use the following logic, I was thinking of looking at the code but was
> told such a simple optimization must have a reason that its not already
> in the code, so I am asking here before I spend hours being stupid.
> 
> 
> This logic block will be based on the above query.
> 
>    1. If index exists for column X and there is a limit on the query
>       then use the index and sort the list, then return the first G
>       values, continue past G until you hit a different value for X then
>       the value that was at G
>    2. Use this smaller subset of data for then ordering the data based on Y
>    3. Order the data again based on Z
>    4. Return the sorted data like normal
> 
> If there is a reason this optimization isn't in the code please someone
> let me know, if not I think I will take a look at adding it.
> 
> 
> -Botanic

Hi Matthew,

I gave more thoughts to your optimization for ORDER BY with LIMIT.
I find it very good for the following reasons.

MariaDB 10.0 (and MySQL 5.6) support an optimization for ORDER BY with
LIMIT that employes a priority queue. This optimization requires a full
scan of the table.
Your optimization does not require this.

Suppose you have a query
with ORDER BY a,b,x,y,z LIMIT N
and an index on (a,b,c).

In this case you could perform an index scan putting the records you
read into the priority queue as the regular algorithm for ORDER BY with
LIMIT does. Yet you don't have to scan all records. You can stop when
you reach the keys with the values of (a,b) greater than all those
stored in the priority queue (remember that our priority queue contains
N elements).

I believe that the change for the regular algorithm to support your
optimization will not be difficult for you.

When you implement such a modification you already will be able to
see how big benefits you have for your application with this
optimization. For this purpose it would be enough for you to force
using a proper index for the ORDER BY clause.

An automatic choice of a proper index will be a more complex part of the
development. Hopefully I will be helpful here.

Regards,
Igor.





> 
> 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp



References