maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04702
Re: Question about Query plan
Hi Sergei,
Thanks a lot for your answer, this was indeed extremely helpful!
> - Do you really records_in_range() implementation like in (*)
Yes.
> - Are estimates of 10 and 20 rows close to reality?
Not at all... The thing is: we can implement records_in_range not as a
function with constant complexity and since a full table scan is not
faster than a index range lookup in our storage engine (even if the
whole table is contained in the range), we wanted to force MySQL to
make always an index lookup. If I understand everything correctly, my
error in thinking was, that I thought MySQL uses the information
returned by records_in_range only to chose between index lookup and
fulltable scan (since on a traditional disk architecture a full table
scan will be faster than an index scan which scans the whole table).
But this is obviously not the case.
So we will have to implemented this records_in_range method. Do I
understand this correct, that a logarithmic (with respect to the table
size) run time will be ok for this method (InnoDB seems to walk two
times through the index if I get the comments in the code correct)?
Thanks again and best regards
Markus
On Mon, May 14, 2012 at 11:20 AM, Sergei Petrunia <psergey@xxxxxxxxxxxx> wrote:
> On Mon, May 14, 2012 at 08:40:59AM +0200, Markus Pilman wrote:
>> Hello all,
>>
>> At ETH Zurich we are working on a new storage engine, that allows us
>> to test several new architectures for transactional databases. So far
>> we worked with MySQL, but we had massive performance issues. After
>> some investigation we figured out, that MySQL generates different
>> query plans for InnoDB than for our engine. One query which killed our
>> performance was the following (this is a query from the TPC-W
>> benchmark):
>>
>> SELECT ol2.ol_i_id, SUM(ol2.ol_qty) AS sum_ol
>> FROM order_line ol, order_line ol2, (SELECT o_id FROM orders ORDER BY
>> o_date DESC LIMIT 10000) AS t
>> WHERE ol.ol_o_id = t.o_id AND ol.ol_i_id = 10 AND ol2.ol_o_id = t.o_id
>> AND ol2.ol_i_id <> 10
>> GROUP BY ol2.ol_i_id ORDER BY sum_ol DESC LIMIT 0,5
>>
>> MySQL generated the following plan for InnoDB:
>>
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>> | id | select_type | table | type | possible_keys
>> | key | key_len | ref | rows | Extra
>> |
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>> | 1 | PRIMARY | <derived2> | ALL | NULL
>> | NULL | NULL | NULL | 10000 | Using
>> temporary; Using filesort |
>> | 1 | PRIMARY | ol | ref |
>> orderline_o_id,orderline_i_id | orderline_o_id | 8 | t.o_id
>> | 1 | Using where |
>> | 1 | PRIMARY | ol2 | ref |
>> orderline_o_id,orderline_i_id | orderline_o_id | 8 |
>> tpcw.ol.OL_O_ID | 1 | Using where |
>> | 2 | DERIVED | orders | index | NULL
>> | orders_o_date | 4 | NULL | 10000 | Using index
>> |
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>>
>> while it generated the following one for our storage engine:
>>
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
>> | id | select_type | table | type | possible_keys
>> | key | key_len | ref | rows | Extra
>> |
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
>> | 1 | PRIMARY | ol | ref |
>> orderline_o_id,orderline_i_id | orderline_i_id | 5 | const |
>> 10 | Using where; Using temporary; Using filesort |
>> | 1 | PRIMARY | ol2 | range |
>> orderline_o_id,orderline_i_id | orderline_i_id | 5 | NULL |
>> 20 | Using where; Using join buffer |
>> | 1 | PRIMARY | <derived2> | ALL | NULL
>> | NULL | NULL | NULL | 10000 | Using where; Using join
>> buffer |
>> | 2 | DERIVED | orders | index | NULL
>> | orders_o_date | 4 | NULL | 10000 |
>> |
>> +----+-------------+------------+-------+-------------------------------+----------------+---------+-------+-------+----------------------------------------------+
> It's difficult to make guesses just on EXPLAINs, but I will try: we see
> {table=ol, rows=10}, {table=ol2, rows=20} -- this makes me suspect that
> your storage engine has this function (I recall some skeleton storage engine
> implementation had this):
>
> ha_rows records_in_range(...) { return 10; } (*)
>
> The WHERE clause has:
> - ol.ol_i_id = 10 -- this producess ref acess on table `ol`, and the estimate
> of 10 rows comes from records_in_range() call.
>
> - ol2.ol_i_id <> 10 -- the optimizer converts this to
>
> (-inf < ol2.ol_i_id <10) OR ( 10< ol2.ol_i_id < +inf)
>
> which is two ranges and we get an estimate of 20 rows.
>
> Questions:
> - Do you really records_in_range() implementation like in (*)
>
> - Are estimates of 10 and 20 rows close to reality?
>
>> The second one is obviously a very bad one. So we decided to try with
>> MariaDB, which generates the following query plan:
>>
>> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>> | id | select_type | table | type | possible_keys
>> | key | key_len | ref | rows | Extra
>> |
>> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
>> | 1 | PRIMARY | ol | ref |
>> orderline_o_id,orderline_i_id | orderline_i_id | 5 | const
>> | 10 | Using temporary; Using filesort |
>> | 1 | PRIMARY | ol2 | ref |
>> orderline_o_id,orderline_i_id | orderline_o_id | 8 |
>> test.ol.OL_O_ID | 11 | Using where |
>> | 1 | PRIMARY | <derived2> | ref | key0
>> | key0 | 8 | test.ol.OL_O_ID | 10 |
>> |
>> | 2 | DERIVED | orders | index | NULL
>> | orders_o_date | 4 | NULL | 10000 |
>> |
>> +------+-------------+------------+-------+-------------------------------+----------------+---------+-----------------+-------+---------------------------------+
> table=<derived2>, key=key0 - apparently, this is MariaDB 5.3/5.5, and it's
> using "derived table with keys"
> (http://kb.askmonty.org/en/derived-table-with-key-optimization) optimization.
>
>
>> The query plan from MariaDB looks sane to me, and the numbers approve
>> this (the query runs on a middle sized data set about 200 times faster
>> with MariaDB than with MySQL). So we will continue our work with
>> MariaDB. But I have a question to these query plans: why are we
>> getting this differences in MySQL between our storage engine and
>> InnoDB? Is there a feature in our storage engine missing (we first
>> thought we need the ability to support HA_KEYREAD_ONLY - but
>> implementing this feature did not change the query plan)? Or does
>> MySQL some kind of "cheating"? We should understand this issue to be
>> able to present our results we get later (may be we will compare
>> MariaDB and MySQL, but in a paper we would have to explain why MySQL
>> sucks that much).
>>
>> And btw: good work with MariaDB!! The optimizer seems to do a much
>> better job than MySQL - even with InnoDB/XtraDB (we had to rewrite
>> some queries in MySQL to force it to generate sane query plans - with
>> MariaDB this does not seem to be necessary anymore).
>
> BR
> Sergei
> --
> Sergei Petrunia, Software Developer
> Monty Program AB, http://askmonty.org
> Blog: http://s.petrunia.net/blog
Follow ups
References