maria-developers team mailing list archive

Thread
Date
Re: Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/

To: Igor Babaev <igor@xxxxxxxxxxxx>
From: Sergey Petrunya <psergey@xxxxxxxxxxxx>
Date: Mon, 21 Feb 2011 23:30:08 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4D561161.5010502@askmonty.org>
User-agent: Mutt/1.5.18 (2008-05-17)
On Fri, Feb 11, 2011 at 08:49:37PM -0800, Igor Babaev wrote:
> Hi Sergey,
> 
> Please review this patch that fixes a defect of the current
> implementation of the mwl #128 that never couples hash join
> with range/index_merge scans.
> The patch also adds prefix #hash# to the names of the used hash indexes.
> (we agreed upon this at the last optimizer meeting).
> 
> Regards,
> Igor.
> 
> -------- Original Message --------
> Subject: [Commits] Rev 2901: BNLH algorithm always used a full table
> scan over the joined table in
> file:///home/igor/maria/maria-5.3-mwl128-hashrange/
> Date: Fri, 11 Feb 2011 20:41:23 -0800 (PST)
> From: Igor Babaev <igor@xxxxxxxxxxxx>
> Reply-To: maria-developers@xxxxxxxxxxxxxxxxxxx
> To: <commits@xxxxxxxxxxx>
> 
> At file:///home/igor/maria/maria-5.3-mwl128-hashrange/
> 
> ------------------------------------------------------------
> revno: 2901
> revision-id: igor@xxxxxxxxxxxx-20110212044122-w9n3jdk3d2ps0w3o
> parent: igor@xxxxxxxxxxxx-20110207231903-3rqbfs50d33lk3r9
> committer: Igor Babaev <igor@xxxxxxxxxxxx>
> branch nick: maria-5.3-mwl128-hashrange
> timestamp: Fri 2011-02-11 20:41:22 -0800
> message:
>   BNLH algorithm always used a full table scan over the joined table
>   even in the cases when there existed range/index-merge scans that
>   were cheaper than the full table scan.
>   This was a defect/bug of the implementation of mwl #128.
>   Now hash join can work not only with full table scan of the joined
>   table, but also with full index scan, range and index-merge scans.
>   Accordingly, in the cases when hash join is used the column 'type'
>   in the EXPLAINs can contain now 'hash_ALL', 'hash_index', 'hash_range'
>   and 'hash_index_merge'. If hash join is coupled with a range/index_merge
>   scan then the columns 'key' and 'key_len' contain info not only on
>   the used hash index, but also on the indexes used for the scan.
...
> === modified file 'sql/sql_select.cc'
> --- a/sql/sql_select.cc	2011-02-06 04:57:03 +0000
> +++ b/sql/sql_select.cc	2011-02-12 04:41:22 +0000
...
> @@ -8419,6 +8438,11 @@
>            sort_by_tab->type= JT_ALL;
>            sort_by_tab->read_first_record= join_init_read_record;
>          }
> +        else if (sort_by_tab->type == JT_HASH_NEXT)
> +        {
> +          sort_by_tab->type= JT_HASH;
> +          sort_by_tab->read_first_record= join_init_read_record;
> +        }
>        }
>        break;
>      }
I have a question only to this part of the patch: I don't understand how the
above code could work. 

The if statement above the one you've added checks if sort_by_tab was using
[ordered] index scan to produce records in orderer, and if yes, switches it to
a full scan (because use of join buffering will break ordering anyway).

I suppose the part you've added does something similar, but I totally fail to
understand what exactly it does.

Now, some comments on the new EXPLAIN output:


ISSUE1. 

Consider a query:
MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2;
+----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
| id | select_type | table | type     | possible_keys | key       | key_len | ref        | rows | Extra                                            |
+----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
|  1 | SIMPLE      | t3    | ALL      | NULL          | NULL      | NULL    | NULL       | 1000 | Using where                                      |
|  1 | SIMPLE      | t4    | hash_ALL | NULL          | #hash#$hj | 5       | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) |
+----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
2 rows in set (0.00 sec)

AFAIU #hash#$hj means the hash key on the hashtable that's used for
hash join. I think the name is too convoluted.  The hashtable has only one
key, if we need to tell it from t4's indexes it would be sufficient to use
"#hashkey#", without the cryptic "$hj".

As for "#" characters: EXPLAIN already shows "internal" tables/indexes:

MariaDB [j1]> explain select * from (select * from t1) X;
+----+-------------+------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+-------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 1000 |       |
|  2 | DERIVED     | t1         | ALL  | NULL          | NULL | NULL    | NULL | 1000 |       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------+

# The following is from MWL#90 tree:
MariaDB [test]> explain select * from t10 where a in (select max(b) from t10 group by a);
+----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+
| id | select_type | table       | type   | possible_keys | key          | key_len | ref        | rows | Extra                           |
+----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+
|  1 | PRIMARY     | t10         | ALL    | NULL          | NULL         | NULL    | NULL       | 1000 | Using where                     |
|  1 | PRIMARY     | <subquery2> | eq_ref | distinct_key  | distinct_key | 5       | test.t10.a |    1 |                                 |
|  2 | SUBQUERY    | t10         | ALL    | NULL          | NULL         | NULL    | NULL       | 1000 | Using temporary; Using filesort |
+----+-------------+-------------+--------+---------------+--------------+---------+------------+------+---------------------------------+
3 rows in set (0.01 sec)

# The following is from current 5.3:
MariaDB [j1]> explain select distinct * from t1 where a in (select b from t2);
+----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+
| id | select_type | table      | type   | possible_keys | key        | key_len | ref  | rows | Extra           |
+----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+
|  1 | PRIMARY     | t1         | ALL    | NULL          | NULL       | NULL    | NULL | 1000 | Using temporary |
|  1 | PRIMARY     | subselect2 | eq_ref | unique_key    | unique_key | 5       | func |    1 |                 |
|  2 | SUBQUERY    | t2         | ALL    | NULL          | NULL       | NULL    | NULL | 1000 | Distinct        |
+----+-------------+------------+--------+---------------+------------+---------+------+------+-----------------+


... and it either uses no quoting at all (see "unique_key"), or angle brackets
quoting (see <subquery2>). I think we should not multiply the ways of quoting,
so please consider switching to <quoting> or to not using any quoting at all.

ISSUE2

MariaDB [j1]> EXPLAIN
    -> SELECT City.Name, Country.Name, CountryLanguage.Language
    -> FROM City,Country,CountryLanguage
    -> WHERE City.Country=Country.Code AND
    -> CountryLanguage.Country=Country.Code AND
    -> City.Name LIKE 'L%' AND Country.Population > 3000000 AND
    -> CountryLanguage.Percentage > 50 AND
    -> LENGTH(Language) < LENGTH(City.Name) - 2;
+----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+
| id | select_type | table           | type     | possible_keys      | key           | key_len | ref                        | rows | Extra                                            |
+----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+
|  1 | SIMPLE      | CountryLanguage | ALL      | PRIMARY,Percentage | NULL          | NULL    | NULL                       |  984 | Using where                                      |
|  1 | SIMPLE      | Country         | hash_ALL | PRIMARY            | #hash#PRIMARY | 3       | j1.CountryLanguage.Country |  239 | Using where; Using join buffer (flat, BNLH join) |
|  1 | SIMPLE      | City            | hash_ALL | Country            | #hash#Country | 3       | j1.CountryLanguage.Country | 4079 | Using where; Using join buffer (flat, BNLH join) |
+----+-------------+-----------------+----------+--------------------+---------------+---------+----------------------------+------+--------------------------------------------------+
3 rows in set (0.01 sec)

"key" column shows "#hash#PRIMARY". I suppose, this is because hash join opimizer is
internally hooked into ref optimizer, and it used the PRIMARY index.  This
might be useful for developer, but is this information meaningful for the user?
If one changes #hash#PRIMARY to #hash#IDX2  (suppose there is an index called
IDX2 which has the same definition as primary key), or to #hash#$hj, will it be
a different query plan?  I think it should be "<hashkey>" always.


ISSUE3
For hash_range, "key" column shows {hashkey}:{range_key}, which allows to
determine that which key was used for range access. For hash_index type, 
"key" column shows only the hash's pseudo-index: 

MariaDB [j3]> EXPLAIN SELECT t2.i FROM t1,t2 WHERE t1.cu = t2.cl;
+----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+
| id | select_type | table | type       | possible_keys | key       | key_len | ref  | rows | Extra                                            |
+----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+
|  1 | SIMPLE      | t2    | ALL        | NULL          | NULL      | NULL    | NULL |    6 |                                                  |
|  1 | SIMPLE      | t1    | hash_index | cu1,cu2       | #hash#cu1 | 33      | func |   10 | Using where; Using join buffer (flat, BNLH join) |
+----+-------------+-------+------------+---------------+-----------+---------+------+------+--------------------------------------------------+

What is the index that's used for scanning the table t1?  "#hash#cu1" shows
that hash join optimizer used access on index "cu1" in its analysis, but
generally it doesn't mean that full index scan on table t1 was done on index
cu1, doesn't it? 
I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into
account previous suggestions, hashkey:USED_INDEX). It will also be
consistent with how hash_range and hash_index_merge use that column.

ISSUE4
If key and key_len use hash_part:real_part notation, i.e. use semicolon, why
use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look
more consistent with other columns?






BR
 Sergey
-- 
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
Follow ups

Re: Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/
From: Igor Babaev, 2011-02-21