← Back to team overview

maria-developers team mailing list archive

Re: 5e16a49f5d9: MDEV-26013 distinct not work properly in some cases for spider tables

 

Hi Sergei!

> I agree, it is wrong that implicit GROUP BY has no place in the Query
> structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or
> `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
>
> If it does, it likely deducts the need for GROUP BY when seeing
COUNT()
> or SUM() in the select list. Which doesn't work for converted
DISTINCT.

The Spider works correctly for COUNT and SUM. For example,  the
optimizer
keeps group_list not NULL for the following query. So, it works.

SELECT COUNT(*) FROM spider_tab WHERE col2=999;

> In that case, I think your fix is good. One question: why `no_order
&&` ?

In fact, 'no_order &&' seems to be redundant. Thank you!
`select_distinct || group_optimized_away` would be enough.

I removed the redundant condition from the patch and push to bb-10.4-
mdev-26013.
https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf59382 ;

Please take another look at it.

Best regards,
Nayuta

On July 22, 2021, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> Hi, Nayuta!
>
> On Jul 21, Nayuta Yanagisawa wrote:
> > Hi Sergei,
> > 
> > Thank you for your review!
> > 
> > > If DISTINCT was coverted to a GROUP BY, why would the engine need
> to
> > > know whether there was DISTINCT or not originally? There is no
> > > DISTINCT on the execution plan now, that should be sufficient,
> > > shouldn't it?
> > > Why does the query fail with GROUP BY?
> > 
> > For the select query in the test case (SELECT distinct b FROM tbl_a
> WHERE b=999),
> > the optimizer seems to convert DISTINCT to GROUP by and then
> optimize away
> > GROUP BY. The, we get select_distinct = 0, no_order = 1,
> group_optimized_away = 1.
> > Please see sql/sql_select.cc:2721-2781.
> > 
> > In such a case, group_list is NULL and thus the Spider SE
> misunderstand that
> > the query has neither DISTINCT and GROUP BY without my fix.
>
> Oh, I see. group_list is NULL, I missed that.
> It's
>
>  select distinct b from tbl_a where b=999;
>
> see, it asks for distinct values of `b` for b=999. There can be only
> one
> row in this query. Or none. So the optimizer changes DISTINCT to an
> implicit GROUP BY, similar to `select count(*) from tbl_a` -
> such an implicit GROUP BY can return at most one row.
>
> I agree, it is wrong that implicit GROUP BY has no place in the Query
> structure. Does Spider work correctly for, say, `SELECT COUNT(*)` or
> `SELECT SUM(a)` - that is, for other cases of such implicit GROUP BY?
>
> If it does, it likely deducts the need for GROUP BY when seeing
> COUNT()
> or SUM() in the select list. Which doesn't work for converted
> DISTINCT.
>
> In that case, I think your fix is good. One question: why `no_order
> &&` ?
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx

Follow ups

References