maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04699
Re: Question about Query plan
On Mon, May 14, 2012 at 08:40:59AM +0200, Markus Pilman wrote:
> Hello all,
>
> At ETH Zurich we are working on a new storage engine, that allows us
> to test several new architectures for transactional databases. So far
> we worked with MySQL, but we had massive performance issues. After
> some investigation we figured out, that MySQL generates different
> query plans for InnoDB than for our engine. One query which killed our
> performance was the following (this is a query from the TPC-W
> benchmark):
>
> SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol
> FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY
> o_date DESC LIMIT 10000) AS t
> WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id
> AND ol2.ol_i_id <> 10
> GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5
>
> MySQL generated the following plan for InnoDB:
>
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra
> |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | 1 | PRIMARY | <derived2> | ALL | NULL
> | NULL | NULL | NULL | 10000 | Using
> temporary; Using filesort |
> | 1 | PRIMARY | ol | ref |
> orderline_o_id,orderline_i_id | orderline_o_id | 8 | t.o_id
> | 1 | Using where |
> | 1 | PRIMARY | ol2 | ref |
> orderline_o_id,orderline_i_id | orderline_o_id | 8 |
> tpcw.ol.OL_O_ID | 1 | Using where |
> | 2 | DERIVED | orders | index | NULL
> | orders_o_date | 4 | NULL | 10000 | Using index
> |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>
> while it generated the following one for our storage engine:
>
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra
> |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
> | 1 | PRIMARY | ol | ref |
> orderline_o_id,orderline_i_id | orderline_i_id | 5 | const |
> 10 | Using where; Using temporary; Using filesort |
> | 1 | PRIMARY | ol2 | range |
> orderline_o_id,orderline_i_id | orderline_i_id | 5 | NULL |
> 20 | Using where; Using join buffer |
> | 1 | PRIMARY | <derived2> | ALL | NULL
> | NULL | NULL | NULL | 10000 | Using where; Using join
> buffer |
> | 2 | DERIVED | orders | index | NULL
> | orders_o_date | 4 | NULL | 10000 |
> |
> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
It's difficult to make guesses just on EXPLAINs, but I will try: we see
{table=ol, rows=10}, {table=ol2, rows=20} -- this makes me suspect that
your storage engine has this function (I recall some skeleton storage engine
implementation had this):
ha_rows records_in_range(...) { return 10; } (*)
The WHERE clause has:
- ol.ol_i_id = 10 -- this producess ref acess on table `ol`, and the estimate
of 10 rows comes from records_in_range() call.
- ol2.ol_i_id <> 10 -- the optimizer converts this to
(-inf < ol2.ol_i_id <10) OR ( 10< ol2.ol_i_id < +inf)
which is two ranges and we get an estimate of 20 rows.
Questions:
- Do you really records_in_range() implementation like in (*)
- Are estimates of 10 and 20 rows close to reality?
> The second one is obviously a very bad one. So we decided to try with
> MariaDB, which generates the following query plan:
>
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra
> |
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> | 1 | PRIMARY | ol | ref |
> orderline_o_id,orderline_i_id | orderline_i_id | 5 | const
> | 10 | Using temporary; Using filesort |
> | 1 | PRIMARY | ol2 | ref |
> orderline_o_id,orderline_i_id | orderline_o_id | 8 |
> test.ol.OL_O_ID | 11 | Using where |
> | 1 | PRIMARY | <derived2> | ref | key0
> | key0 | 8 | test.ol.OL_O_ID | 10 |
> |
> | 2 | DERIVED | orders | index | NULL
> | orders_o_date | 4 | NULL | 10000 |
> |
> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
table=<derived2>, key=key0 - apparently, this is MariaDB 5.3/5.5, and it's
using "derived table with keys"
(http://kb.askmonty.org/en/derived-table-with-key-optimization) optimization.
> The query plan from MariaDB looks sane to me, and the numbers approve
> this (the query runs on a middle sized data set about 200 times faster
> with MariaDB than with MySQL). So we will continue our work with
> MariaDB. But I have a question to these query plans: why are we
> getting this differences in MySQL between our storage engine and
> InnoDB? Is there a feature in our storage engine missing (we first
> thought we need the ability to support HA_KEYREAD_ONLY - but
> implementing this feature did not change the query plan)? Or does
> MySQL some kind of "cheating"? We should understand this issue to be
> able to present our results we get later (may be we will compare
> MariaDB and MySQL, but in a paper we would have to explain why MySQL
> sucks that much).
>
> And btw: good work with MariaDB!! The optimizer seems to do a much
> better job than MySQL - even with InnoDB/XtraDB (we had to rewrite
> some queries in MySQL to force it to generate sane query plans - with
> MariaDB this does not seem to be necessary anymore).
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
Follow ups
References