← Back to team overview

maria-developers team mailing list archive

Re: Question about Query plan

 

On Mon, May 14, 2012 at 08:40:59AM +0200, Markus Pilman wrote:
> Hello all,
> 
> At ETH Zurich we are working on a new storage engine, that allows us
> to test several new architectures for transactional databases. So far
> we worked with MySQL, but we had massive performance issues. After
> some investigation we figured out, that MySQL generates different
> query plans for InnoDB than for our engine. One query which killed our
> performance was the following (this is a query from the TPC-W
> benchmark):
> 
> SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol
> FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY
> o_date DESC LIMIT 10000) AS t
> WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id
> AND ol2.ol_i_id <> 10
> GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5
> 
> MySQL generated the following plan for InnoDB:
> 
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | id | select_type | table      | type  | possible_keys
>  | key            | key_len | ref             | rows  | Extra
>                  |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> |  1 | PRIMARY     | <derived2> | ALL   | NULL
>  | NULL           | NULL    | NULL            | 10000 | Using
> temporary; Using filesort |
> |  1 | PRIMARY     | ol         | ref   |
> orderline_o_id,orderline_i_id | orderline_o_id | 8       | t.o_id
>     |     1 | Using where                     |
> |  1 | PRIMARY     | ol2        | ref   |
> orderline_o_id,orderline_i_id | orderline_o_id | 8       |
> tpcw.ol.OL_O_ID |     1 | Using where                     |
> |  2 | DERIVED     | orders     | index | NULL
>  | orders_o_date  | 4       | NULL            | 10000 | Using index
>                  |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> 
> while it generated the following one for our storage engine:
> 
>  +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
> | id | select_type | table      | type  | possible_keys
>  | key            | key_len | ref   | rows  | Extra
>                     |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
> |  1 | PRIMARY     | ol         | ref   |
> orderline_o_id,orderline_i_id | orderline_i_id | 5       | const |
> 10 | Using where; Using temporary; Using filesort |
> |  1 | PRIMARY     | ol2        | range |
> orderline_o_id,orderline_i_id | orderline_i_id | 5       | NULL  |
> 20 | Using where; Using join buffer               |
> |  1 | PRIMARY     | <derived2> | ALL   | NULL
>  | NULL           | NULL    | NULL  | 10000 | Using where; Using join
> buffer               |
> |  2 | DERIVED     | orders     | index | NULL
>  | orders_o_date  | 4       | NULL  | 10000 |
>                     |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
It's difficult to make guesses just on EXPLAINs, but I will try: we see
{table=ol, rows=10},  {table=ol2, rows=20} -- this makes me suspect that 
your storage engine has this function (I recall some skeleton storage engine
implementation had this):

  ha_rows  records_in_range(...)  { return 10; }    (*)

The WHERE clause has:
- ol.ol_i_id = 10  -- this producess ref acess on table `ol`, and the estimate 
  of 10 rows comes from records_in_range() call.

- ol2.ol_i_id <> 10 -- the optimizer converts this to 
    
   (-inf < ol2.ol_i_id <10) OR ( 10< ol2.ol_i_id < +inf)
  
  which is two ranges and we get an estimate of 20 rows. 

Questions:
- Do you really records_in_range() implementation like in (*)

- Are estimates of 10 and 20 rows close to reality?

> The second one is obviously a very bad one. So we decided to try with
> MariaDB, which generates the following query plan:
> 
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | id   | select_type | table      | type  | possible_keys
>    | key            | key_len | ref             | rows  | Extra
>                    |
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> |    1 | PRIMARY     | ol         | ref   |
> orderline_o_id,orderline_i_id | orderline_i_id | 5       | const
>     |    10 | Using temporary; Using filesort |
> |    1 | PRIMARY     | ol2        | ref   |
> orderline_o_id,orderline_i_id | orderline_o_id | 8       |
> test.ol.OL_O_ID |    11 | Using where                     |
> |    1 | PRIMARY     | <derived2> | ref   | key0
>    | key0           | 8       | test.ol.OL_O_ID |    10 |
>                    |
> |    2 | DERIVED     | orders     | index | NULL
>    | orders_o_date  | 4       | NULL            | 10000 |
>                    |
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
table=<derived2>, key=key0 - apparently, this is MariaDB 5.3/5.5, and it's
using "derived table with keys" 
(http://kb.askmonty.org/en/derived-table-with-key-optimization) optimization.


> The query plan from MariaDB looks sane to me, and the numbers approve
> this (the query runs on a middle sized data set about 200 times faster
> with MariaDB than with MySQL). So we will continue our work with
> MariaDB. But I have a question to these query plans: why are we
> getting this differences in MySQL between our storage engine and
> InnoDB? Is there a feature in our storage engine missing (we first
> thought we need the ability to support HA_KEYREAD_ONLY - but
> implementing this feature did not change the query plan)? Or does
> MySQL some kind of "cheating"? We should understand this issue to be
> able to present our results we get later (may be we will compare
> MariaDB and MySQL, but in a paper we would have to explain why MySQL
> sucks that much).
> 
> And btw: good work with MariaDB!! The optimizer seems to do a much
> better job than MySQL - even with InnoDB/XtraDB (we had to rewrite
> some queries in MySQL to force it to generate sane query plans - with
> MariaDB this does not seem to be necessary anymore).

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


Follow ups

References