maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #03931
Re: Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/
On Mon, Feb 21, 2011 at 03:20:13PM -0800, Igor Babaev wrote:
> >> === modified file 'sql/sql_select.cc'
> >> --- a/sql/sql_select.cc 2011-02-06 04:57:03 +0000
> >> +++ b/sql/sql_select.cc 2011-02-12 04:41:22 +0000
> > ...
> >> @@ -8419,6 +8438,11 @@
> >> sort_by_tab->type= JT_ALL;
> >> sort_by_tab->read_first_record= join_init_read_record;
> >> }
> >> + else if (sort_by_tab->type == JT_HASH_NEXT)
> >> + {
> >> + sort_by_tab->type= JT_HASH;
> >> + sort_by_tab->read_first_record= join_init_read_record;
> >> + }
> >> }
> >> break;
> >> }
> > I have a question only to this part of the patch: I don't understand how the
> > above code could work.
> >
> > The if statement above the one you've added checks if sort_by_tab was using
> > [ordered] index scan to produce records in orderer, and if yes, switches it to
> > a full scan (because use of join buffering will break ordering anyway).
> >
> > I suppose the part you've added does something similar, but I totally fail to
> > understand what exactly it does.
>
> The above code says:
> if hash join is used do not use full index scan to look through the
> joined table, rather use full table scan in this case.
>
I've put a DBUG_ASSERT(0) right after the "sort_by_tab->type= JT_HASH;" line
and ran the testsuite. The assertion didn't fire. I did this because I suspect
(although can't provide a sound proof right away) that the part inside the
else if () { ...} is deadcode.
Do you think you could easily prove me wrong by giving a testcase? If
constructing testcase is complicated, let's meet online and discuss my
suspicions about the deadcode.
> >
> > Now, some comments on the new EXPLAIN output:
> >
> >
> > ISSUE1.
> >
> > Consider a query:
> > MariaDB [j1]> explain select * from t3, t4 where t3.col1=t4.col2;
> > +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
> > | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
> > +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
> > | 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | 1000 | Using where |
> > | 1 | SIMPLE | t4 | hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) |
> > +----+-------------+-------+----------+---------------+-----------+---------+------------+------+--------------------------------------------------+
> > 2 rows in set (0.00 sec)
> >
> > AFAIU #hash#$hj means the hash key on the hashtable that's used for
> > hash join. I think the name is too convoluted. The hashtable has only one
> > key, if we need to tell it from t4's indexes it would be sufficient to use
> > "#hashkey#", without the cryptic "$hj".
> >
> > As for "#" characters: EXPLAIN already shows "internal" tables/indexes:
>
> If you look into #maria-call log you'll see that I followed strictly
> Monty's instructions:
> to use the prefix #hash# before the index name I used to use, i.e.
> <used_index> should be substituted for #hash#<used_index>.
>
> The only deviation I afforded myself was: I used $hj instead of hj_key.
>
Ok. I still think that my objections are valid and current EXPLAIN output is
not the best but we could discuss/address this outside the scope of this
bugfix.
<skip>
> > What is the index that's used for scanning the table t1? "#hash#cu1" shows
> > that hash join optimizer used access on index "cu1" in its analysis, but
> > generally it doesn't mean that full index scan on table t1 was done on index
> > cu1, doesn't it?
> > I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into
> > account previous suggestions, hashkey:USED_INDEX). It will also be
> > consistent with how hash_range and hash_index_merge use that column.
> >
> Yes, the scan index is missing here. I will add.
OK.
> > ISSUE4
> > If key and key_len use hash_part:real_part notation, i.e. use semicolon, why
> > use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look
> > more consistent with other columns?
>
> again: I follow here Monty's recommendations. Nothing more, nor less.
>
BR
Sergey
--
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
References