← Back to team overview

maria-developers team mailing list archive

Re: 5e16a49f5d9: MDEV-26013 distinct not work properly in some cases for spider tables

 

Hi, Nayuta!

Ok to push.
Thanks!

On Jul 22, Nayuta Yanagisawa wrote:
> Hi Sergei!
> 
> > I agree, it is wrong that implicit GROUP BY has no place in the Query
> > structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or
> > `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
> >
> > If it does, it likely deducts the need for GROUP BY when seeing COUNT()
> > or SUM() in the select list. Which doesn't work for converted DISTINCT.
> 
> The Spider works correctly for COUNT and SUM. For example,  the optimizer
> keeps group_list not NULL for the following query. So, it works.
> 
> SELECT COUNT(*) FROM spider_tab WHERE col2=999;
> 
> > In that case, I think your fix is good. One question: why `no_order &&` ?
> 
> In fact, 'no_order &&' seems to be redundant. Thank you!
> `select_distinct || group_optimized_away` would be enough.
> 
> I removed the redundant condition from the patch and push to bb-10.4-
> mdev-26013.
> https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf59382 ;
> 
> Please take another look at it.
> 
> Best regards,
> Nayuta
> 
> On July 22, 2021, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> > Hi, Nayuta!
> >
> > On Jul 21, Nayuta Yanagisawa wrote:
> > > Hi Sergei,
> > > 
> > > Thank you for your review!
> > > 
> > > > If DISTINCT was coverted to a GROUP BY, why would the engine need to
> > > > know whether there was DISTINCT or not originally? There is no
> > > > DISTINCT on the execution plan now, that should be sufficient,
> > > > shouldn't it?
> > > > Why does the query fail with GROUP BY?
> > > 
> > > For the select query in the test case (SELECT distinct b FROM tbl_a WHERE b=999),
> > > the optimizer seems to convert DISTINCT to GROUP by and then optimize away
> > > GROUP BY. The, we get select_distinct = 0, no_order = 1, group_optimized_away = 1.
> > > Please see sql/sql_select.cc:2721-2781.
> > > 
> > > In such a case, group_list is NULL and thus the Spider SE misunderstand that
> > > the query has neither DISTINCT and GROUP BY without my fix.
> >
> > Oh, I see. group_list is NULL, I missed that.
> > It's
> >
> >  select distinct b from tbl_a where b=999;
> >
> > see, it asks for distinct values of `b` for b=999. There can be only one
> > row in this query. Or none. So the optimizer changes DISTINCT to an
> > implicit GROUP BY, similar to `select count(*) from tbl_a` -
> > such an implicit GROUP BY can return at most one row.
> >
> > I agree, it is wrong that implicit GROUP BY has no place in the Query
> > structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or
> > `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
> >
> > If it does, it likely deducts the need for GROUP BY when seeing COUNT()
> > or SUM() in the select list. Which doesn't work for converted DISTINCT.
> >
> > In that case, I think your fix is good. One question: why `no_order
> > &&` ?
> 
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


References