maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04480
Re: Proposal: change SUBQUERY to MATERIALIZE
On Thu, Dec 01, 2011 at 01:12:00PM +0200, Timour Katchaounov wrote:
> Ok, I agree to change to MATERIALIZED (not MATERIALIZE).
> Who will do the change?
>
> In addition, in the EXTRA field there should be info
> which materialization is used. If I am not mistaken,
> there are three variants:
> - SJ-mat,
> - SJ-inside-out,
> - NON-SJ-mat
> (better terms are needed for explain).
I do not fully agree to this.
- One can easily tell between semi-join and non-semi-join materialization: in
semi-join case you see a separate line with table='<subquery2>'
- For semi-join materialization, lookup and scan variants are easily visible.
Check the next two explains: the first does eq_ref access to <subquery2>
table, this is SJ-Materialization-Lookup. The second does a full scan on
<subquery2>, this is SJ-Materialization-Scan.
I thought this was apprent enough. Do you think it is not?
MariaDB [test]> explain select * from one_k A where a in (select B.a from ten B, ten C);
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
| 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 1000 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | func | 1 | |
| 2 | SUBQUERY | B | ALL | NULL | NULL | NULL | NULL | 10 | |
| 2 | SUBQUERY | C | ALL | NULL | NULL | NULL | NULL | 10 | |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
4 rows in set (0.00 sec)
MariaDB [test]> set debug_optimizer_prefer_join_prefix='B,C,A';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> explain select * from one_k A where a in (select B.a from ten B, ten C);
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | A | ALL | NULL | NULL | NULL | NULL | 1000 | Using where; Using join buffer (flat, BNL join) |
| 2 | SUBQUERY | B | ALL | NULL | NULL | NULL | NULL | 10 | |
| 2 | SUBQUERY | C | ALL | NULL | NULL | NULL | NULL | 10 | |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
4 rows in set (0.00 sec)
>
> Timour
>
> On 1.12.2011 12:54, Sergei Petrunia wrote:
>> Reply-To:
>>
>> Hello,
>>
>> I think we have already discussed this before (multiple times), and the
>> idea was received with some conditional approval of the idea. Nothing has
>> been done or put to paper, though, so now I'm posting this in writing.
>>
>> I'm posting this now, because we're close to RC and I think it's better to do
>> this kind of changes before the RC.
>>
>> The proposal is: if a subquery is materialized (that is, the subquery is run
>> and it's output is stored with a temporary table with unique key), EXPLAIN
>> should not show "SUBQUERY" (like it does now), it should show "MATERIALIZE".
>>
>> Consider the following two queries. The EXPLAINs are the same, although one of
>> them runs materialization and the other does not:
>>
>> MariaDB [test]> EXPLAIN select * from ten where a< any (select max(a) from ten);
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | 1 | PRIMARY | ten | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
>> | 2 | SUBQUERY | ten | ALL | NULL | NULL | NULL | NULL | 10 | |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> 2 rows in set (0.00 sec)
>>
>> MariaDB [test]> EXPLAIN select * from ten where a = any (select max(a) from ten) or a< 3;
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | 1 | PRIMARY | ten | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
>> | 2 | SUBQUERY | ten | ALL | NULL | NULL | NULL | NULL | 10 | |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> 2 rows in set (0.00 sec)
>>
>> You can only tell that the second one uses Materialization if look at EXPLAIN
>> EXTENDED warning text. I think it is extremely bad that radically new
>> execution strategy lurks somewhere inside EXPLAIN EXTENDED.
>>
>>
>> I remember you've used to argue that 'SUBUQERY' represents the fact that the
>> subquqery is evaluated once, and materialization should be shown elsewhere. I
>> object to that argument
>> - materialization is currently shown *nowhere* (except EXPLAIN EXTENDED which
>> is pain to read except for the most simplest queries)
>> - Derived table subqueries, which are also executed once (and stored in a
>> temporary table, like with materialization), have their special select_type
>> value 'DERIVED':
>>
>> MariaDB [test]> explain select * from one_k, (select max(A.a+B.a) from ten A, ten B ) foo;
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> | 1 | PRIMARY |<derived2> | ALL | NULL | NULL | NULL | NULL | 100 | |
>> | 1 | PRIMARY | one_k | ALL | NULL | NULL | NULL | NULL | 1000 | Using join buffer (flat, BNL join) |
>> | 2 | DERIVED | A | ALL | NULL | NULL | NULL | NULL | 10 | |
>> | 2 | DERIVED | B | ALL | NULL | NULL | NULL | NULL | 10 | Using join buffer (flat, BNL join) |
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> 4 rows in set (0.02 sec)
>>
>>
>> I think the above should be sufficient to make a decision to change
>> Materialized subqueries to show 'MATERIALIZED' (or 'MATERIALIZE') in EXPLAIN
>> output. Any objections?
>>
>>
>>
>>
--
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
References