← Back to team overview

maria-developers team mailing list archive

Re: 5e16a49f5d9: MDEV-26013 distinct not work properly in some cases for spider tables

 

Hi Sergei,

Thank you for your review!

> If DISTINCT was coverted to a GROUP BY, why would the engine need to
> know whether there was DISTINCT or not originally? There is no
DISTINCT
> on the execution plan now, that should be sufficient, shouldn't it?
> 
> Why does the query fail with GROUP BY?

For the select query in the test case (SELECT distinct b FROM tbl_a
WHERE b=999),
the optimizer seems to convert DISTINCT to GROUP by and then optimize
away
GROUP BY. The, we get select_distinct = 0, no_order = 1,
group_optimized_away = 1.
Please see sql/sql_select.cc:2721-2781.

In such a case, group_list is NULL and thus the Spider SE misunderstand
that
the query has neither DISTINCT and GROUP BY without my fix.

Optimizing group_list away itself might be a bug but there seems no
problem with
other storage engines.

Regards,
Nayuta

On July 22, 2021, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> Hi, Nayuta!
>
> On Jul 21, Nayuta Yanagisawa wrote:
> > revision-id: 5e16a49f5d9 (mariadb-10.4.20-32-g5e16a49f5d9)
> > parent(s): 78735dcaf75
> > author: Nayuta Yanagisawa
> > committer: Nayuta Yanagisawa
> > timestamp: 2021-07-21 09:58:34 +0000
> > message:
> > 
> > MDEV-26013 distinct not work properly in some cases for spider
> tables
> > 
> > diff --git a/sql/sql_select.cc b/sql/sql_select.cc
> > index d8116b20cbe..cfd04d93662 100644
> > --- a/sql/sql_select.cc
> > +++ b/sql/sql_select.cc
> > @@ -3274,9 +3274,17 @@ bool JOIN::make_aggr_tables_info()
> > 
> > if (ht && ht->create_group_by)
> > {
> > - /* Check if the storage engine can intercept the query */
> > - Query query= {&all_fields, select_distinct, tables_list, conds,
> > - group_list, order ? order : group_list, having};
> > + /*
> > + Check if the storage engine can intercept the query.
> > +
> > + JOIN::optimize_stage2() might convert DISTINCT into GROUP BY and
> then
> > + optimize away GROUP BY. In such a case, we need to notify a
> storage engine
> > + supporting a group by handler of the existence of the original
> DISTINCT.
> > + Thus, we set select_distinct || (no_order && group_optimized_away)
> to
> > + Query::distinct.
> > + */
> > + Query query= {&all_fields, select_distinct || (no_order &&
> group_optimized_away),
> > + tables_list, conds, group_list, order ? order : group_list,
> having};
>
> If DISTINCT was coverted to a GROUP BY, why would the engine need to
> know whether there was DISTINCT or not originally? There is no
> DISTINCT
> on the execution plan now, that should be sufficient, shouldn't it?
>
> Why does the query fail with GROUP BY?
>
> > group_by_handler *gbh= ht->create_group_by(thd, &query);
> > 
> > if (gbh)
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx

Follow ups

References