← Back to team overview

maria-developers team mailing list archive

Re: Proposal: change SUBQUERY to MATERIALIZE

 

On Thu, Dec 01, 2011 at 01:12:00PM +0200, Timour Katchaounov wrote:
> Ok, I agree to change to MATERIALIZED (not MATERIALIZE).
> Who will do the change?
>
> In addition, in the EXTRA field there should be info
> which materialization is used. If I am not mistaken,
> there are three variants:
> - SJ-mat,
> - SJ-inside-out,
> - NON-SJ-mat
> (better terms are needed for explain).

I do not fully agree to this.

- One can easily tell between semi-join and non-semi-join materialization: in
  semi-join case you see a separate line with table='<subquery2>'

- For semi-join materialization, lookup and scan variants are easily visible.
  Check the next two explains: the first does eq_ref access to <subquery2>
  table, this is SJ-Materialization-Lookup. The second does a full scan on
  <subquery2>, this is SJ-Materialization-Scan.

  I thought this was apprent enough. Do you think it is not?


MariaDB [test]> explain select * from one_k A where a in (select B.a from  ten B, ten C);
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
| id | select_type | table       | type   | possible_keys | key          | key_len | ref  | rows | Extra |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
|  1 | PRIMARY     | A           | ALL    | NULL          | NULL         | NULL    | NULL | 1000 |       |
|  1 | PRIMARY     | <subquery2> | eq_ref | distinct_key  | distinct_key | 5       | func |    1 |       |
|  2 | SUBQUERY    | B           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |       |
|  2 | SUBQUERY    | C           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |       |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+-------+
4 rows in set (0.00 sec)

MariaDB [test]> set debug_optimizer_prefer_join_prefix='B,C,A';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> explain select * from one_k A where a in (select B.a from  ten B, ten C);
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
|  1 | PRIMARY     | <subquery2> | ALL  | distinct_key  | NULL | NULL    | NULL |   10 |                                                 |
|  1 | PRIMARY     | A           | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where; Using join buffer (flat, BNL join) |
|  2 | SUBQUERY    | B           | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                                 |
|  2 | SUBQUERY    | C           | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                                 |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------------------------------------------+
4 rows in set (0.00 sec)




>
> Timour
>
> On  1.12.2011 12:54, Sergei Petrunia wrote:
>> Reply-To:
>>
>> Hello,
>>
>> I think we have already discussed this before (multiple times), and the
>> idea was received with some conditional approval of the idea. Nothing has
>> been done or put to paper, though, so now I'm posting this in writing.
>>
>> I'm posting this now, because we're close to RC and I think it's better to do
>> this kind of changes before the RC.
>>
>> The proposal is: if a subquery is materialized (that is, the subquery is run
>> and it's output is stored with a temporary table with unique key), EXPLAIN
>> should not show "SUBQUERY" (like it does now), it should show "MATERIALIZE".
>>
>> Consider the following two queries. The EXPLAINs are the same, although one of
>> them runs materialization and the other does not:
>>
>> MariaDB [test]>  EXPLAIN select * from ten where a<  any (select max(a) from ten);
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> |  1 | PRIMARY     | ten   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where |
>> |  2 | SUBQUERY    | ten   | ALL  | NULL          | NULL | NULL    | NULL |   10 |             |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> 2 rows in set (0.00 sec)
>>
>> MariaDB [test]>  EXPLAIN select * from ten where a = any (select max(a) from ten) or a<  3;
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> |  1 | PRIMARY     | ten   | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where |
>> |  2 | SUBQUERY    | ten   | ALL  | NULL          | NULL | NULL    | NULL |   10 |             |
>> +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
>> 2 rows in set (0.00 sec)
>>
>> You can only tell that the second one uses Materialization if look at EXPLAIN
>> EXTENDED warning text.  I think it is extremely bad that radically new
>> execution strategy lurks somewhere inside EXPLAIN EXTENDED.
>>
>>
>> I remember you've used to argue that 'SUBUQERY' represents the fact that the
>> subquqery is evaluated once, and materialization should be shown elsewhere. I
>> object to that argument
>> - materialization is currently shown *nowhere* (except EXPLAIN EXTENDED which
>>    is pain to read except for the most simplest queries)
>> - Derived table subqueries, which are also executed once (and stored in a
>>    temporary table, like with materialization), have their special select_type
>>    value 'DERIVED':
>>
>> MariaDB [test]>  explain select * from one_k, (select max(A.a+B.a) from ten A, ten B ) foo;
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> | id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> |  1 | PRIMARY     |<derived2>  | ALL  | NULL          | NULL | NULL    | NULL |  100 |                                    |
>> |  1 | PRIMARY     | one_k      | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using join buffer (flat, BNL join) |
>> |  2 | DERIVED     | A          | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
>> |  2 | DERIVED     | B          | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
>> +----+-------------+------------+------+---------------+------+---------+------+------+------------------------------------+
>> 4 rows in set (0.02 sec)
>>
>>
>> I think the above should be sufficient to make a decision to change
>> Materialized subqueries to show 'MATERIALIZED' (or 'MATERIALIZE') in EXPLAIN
>> output.  Any objections?
>>
>>
>>
>>

-- 
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


References