maria-developers team mailing list archive

Re: [Commits] 0f35480f3d6: MDEV-6707: Wrong result (extra row) with group by, multi-part key

To: maria-developers@xxxxxxxxxxxxxxxxxxx, Varun <varunraiko1803@xxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Tue, 20 Mar 2018 16:16:51 +0300
Cc: commits@xxxxxxxxxxx
In-reply-to: <20180118193721.0E6C51014C4900@Varuns-MacBook-Pro.local>
User-agent: Mutt/1.5.21 (2010-09-15)
Hi Varun,

Nice to see the feedback from the first review addressed, but there's more to do.
Please find more input below.

On Fri, Jan 19, 2018 at 01:07:20AM +0530, Varun wrote:
> revision-id: 0f35480f3d60fac91bf1f91f8140ac5f55724139 (mariadb-5.5.56-142-g0f35480f3d6)
> parent(s): fafdac3365f4943e73bcefd0e0d07d69997a9724
> author: Varun Gupta
> committer: Varun Gupta
> timestamp: 2018-01-18 23:53:35 +0530
> message:
> 
> MDEV-6707: Wrong result (extra row) with group by, multi-part key
> 
> This case involves using a composite key,few parts of which are involved in GROUP BY and few in the MIN/MAX
> functions in the select list.
> Ranges are created in accordance with the where condition, so during the execution of such queries,
> we try to find a prefix using the fields involved in GROUP BY
> and we then check if this newly returned prefix lies within the range we had calculated earlier.
> 
> For queries which use composite key, few parts of which are involved in GROUP BY and few in the MIN/MAX functtions
> in the select list, we try to find the prefix of the ranges by using the fields involved in the group by clause.
> We get extra rows in the output when we have same partial ranges created for the fields in the GROUP BY clause
> 
> This issue can be fixed if we compare such partial ranges and don't lookup if we see the same prefix again.
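A minimal sketch of the idea in the commit message, written as a toy model rather than server code. Each range over the index (f2, f1) is reduced to its group prefix (the f2 value), and a lookup is skipped when that prefix equals the prefix of the previously processed range. The names PrefixRange and prefixes_to_look_up are hypothetical. Like the patch, the sketch assumes the ranges are sorted, so equal prefixes are adjacent.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a range over the index (f2, f1):
// only the group-prefix part (the f2 bound) matters here.
struct PrefixRange {
  std::string group_prefix;  // value of f2 at the range's lower bound
};

// Returns the group prefixes that would be looked up, one per processed range.
// With skip_repeats == true, a range whose group prefix equals that of the
// previously processed range is skipped (the idea behind the fix). This
// assumes sorted ranges, so equal prefixes are adjacent.
std::vector<std::string> prefixes_to_look_up(
    const std::vector<PrefixRange>& ranges, bool skip_repeats) {
  std::vector<std::string> lookups;
  const PrefixRange* last = nullptr;
  for (const PrefixRange& r : ranges) {
    if (skip_repeats && last != nullptr && last->group_prefix == r.group_prefix)
      continue;  // same prefix as before: a second lookup would duplicate the group
    last = &r;
    lookups.push_back(r.group_prefix);
  }
  return lookups;
}
```

For `f2 LIKE 'c%' AND f1 <> 9`, modelled as two ranges with prefix "c", the naive loop looks up "c" twice (the extra row). The deduplicating loop looks it up only once.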
> 
> ---
>  mysql-test/r/range.result         | 14 +++++++++++++
>  mysql-test/r/range_mrr_icp.result | 14 +++++++++++++
>  mysql-test/t/range.test           | 13 ++++++++++++
>  sql/opt_range.cc                  | 44 ++++++++++++++++++++++++++++++++++++---
>  sql/opt_range.h                   |  5 +++--
>  5 files changed, 85 insertions(+), 5 deletions(-)
> 
> diff --git a/mysql-test/r/range.result b/mysql-test/r/range.result
> index 630a692cef6..72b31de4df4 100644
> --- a/mysql-test/r/range.result
> +++ b/mysql-test/r/range.result
> @@ -2144,3 +2144,17 @@ value1	1000685	12345
>  value1	1003560	12345
>  value1	1004807	12345
>  drop table t1;
> +#
> +# MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +#
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	range	f2	f2	9	NULL	2	Using where; Using index for group-by
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +MAX(f1)	f2
> +4	c
> +DROP TABLE t1;
> diff --git a/mysql-test/r/range_mrr_icp.result b/mysql-test/r/range_mrr_icp.result
> index 3f5de5b0189..1050bfcd887 100644
> --- a/mysql-test/r/range_mrr_icp.result
> +++ b/mysql-test/r/range_mrr_icp.result
> @@ -2146,4 +2146,18 @@ value1	1000685	12345
>  value1	1003560	12345
>  value1	1004807	12345
>  drop table t1;
> +#
> +# MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +#
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	range	f2	f2	9	NULL	2	Using where; Using index for group-by
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +MAX(f1)	f2
> +4	c
> +DROP TABLE t1;
>  set optimizer_switch=@mrr_icp_extra_tmp;
> diff --git a/mysql-test/t/range.test b/mysql-test/t/range.test
> index 393ca68e945..62e907b7b4a 100644
> --- a/mysql-test/t/range.test
> +++ b/mysql-test/t/range.test
> @@ -1718,3 +1718,16 @@ where (key1varchar='value1' AND (key2int <=1 OR  key2int > 1));
>  --echo # The following must show col1=12345 for all rows:
>  select * from t1;
>  drop table t1;
> +
> +--echo #
> +--echo # MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +--echo #
> +
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +DROP TABLE t1;
> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index 25a9e729a8b..be959764b16 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -11249,6 +11249,7 @@ int QUICK_RANGE_SELECT::get_next()
>    @param prefix_length   length of cur_prefix
>    @param group_key_parts The number of key parts in the group prefix
>    @param cur_prefix      prefix of a key to be searched for
> +  @param save_last_range INOUT Saving the last range we encountered
>  
>    Each subsequent call to the method retrieves the first record that has a
>    prefix with length prefix_length and which is different from cur_prefix,
> @@ -11271,7 +11272,8 @@ int QUICK_RANGE_SELECT::get_next()
>  
>  int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length,
>                                          uint group_key_parts,
> -                                        uchar *cur_prefix)
> +                                        uchar *cur_prefix,
> +                                        QUICK_RANGE **save_last_range)
>  {
>    DBUG_ENTER("QUICK_RANGE_SELECT::get_next_prefix");
>    const key_part_map keypart_map= make_prev_keypart_map(group_key_parts);
> @@ -11301,8 +11303,43 @@ int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length,
>        last_range= 0;
>        DBUG_RETURN(HA_ERR_END_OF_FILE);
>      }
> +
>      last_range= *(cur_range++);
>  
> +    /*
> +      While calculating these prefixes we might encounter a case where there
> +      woud be same partial ranges for multiple ranges.
Typo: "woud" should be "would".

> +      An example would be
> +      select max(f1), f2 from t1 where f2 ='c' AND f1 <> 9 group by f2;
> +      so the ranges would be
> +            (NULL,c <= f1,f2 <= c,9)
> +            (c,9    <= f1,f2 <= c,+infinity)

This looks as if the index was defined on (f1, f2)?
If so, the comment doesn't make sense, because loose index scan does not support
queries of the form:

  select max(key_part1) ... group by key_part2 

It can only handle them when the index is on (f2,f1). However, an index whose
first column is f2 is counterintuitive for a reader who is not familiar with
the testcase for this bug.
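The restriction can be expressed as a simplified check. This sketch is not MariaDB's actual applicability test, which also accepts key parts bound to constants between the GROUP BY prefix and the MIN/MAX argument, among other things. It assumes the GROUP BY columns must form a prefix of the index (in index order) and the MIN/MAX argument must be the key part immediately after that prefix. The function name loose_scan_applicable is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical applicability check for a loose index scan of
//   SELECT MIN/MAX(minmax_col) ... GROUP BY group_cols
// Assumes group_cols are listed in index order and no constant "infix"
// key parts are involved.
bool loose_scan_applicable(const std::vector<std::string>& index_cols,
                           const std::vector<std::string>& group_cols,
                           const std::string& minmax_col) {
  // There must be a key part left over for the MIN/MAX argument.
  if (group_cols.size() >= index_cols.size()) return false;
  // The GROUP BY columns must be a leftmost prefix of the index.
  for (std::size_t i = 0; i < group_cols.size(); i++)
    if (index_cols[i] != group_cols[i]) return false;
  // The MIN/MAX argument must be the key part right after that prefix.
  return index_cols[group_cols.size()] == minmax_col;
}
```

With an index on (f2, f1), GROUP BY f2 with MAX(f1) passes the check. With an index on (f1, f2), it does not, which is why ranges written in an (f1, f2)-looking order are confusing.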

Please rephrase the comment and use key_part1 and key_part2 as column
names. Tuple comparisons should look like this (note the parentheses):

   (a,b) < (c,d)
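For reference, (a,b) < (c,d) is a lexicographic comparison: a is compared with c first, and b is compared with d only if a = c. The C++ sketch below assumes a hypothetical index (key_part1, key_part2) over a char column and an int column. It shows how `key_part1 = 'c' AND key_part2 <> 9` splits into two such ranges that share the same key_part1 bound. NULL handling is ignored, and INT_MIN/INT_MAX stand in for -infinity/+infinity.

```cpp
#include <cassert>
#include <climits>
#include <utility>

// A key tuple over the hypothetical index (key_part1, key_part2).
// std::pair's operator< is lexicographic, matching the (a,b) < (c,d) notation.
using KeyTuple = std::pair<char, int>;

// One range of the form  lo < (key_part1,key_part2) < hi
// (both bounds exclusive, for simplicity).
struct TupleRange {
  KeyTuple lo, hi;
  bool contains(const KeyTuple& t) const { return lo < t && t < hi; }
};

// key_part1 = 'c' AND key_part2 <> 9, as two ranges:
//   (c,-inf) < (key_part1,key_part2) < (c,9)
//   (c,9)    < (key_part1,key_part2) < (c,+inf)
const TupleRange kRange1{{'c', INT_MIN}, {'c', 9}};
const TupleRange kRange2{{'c', 9}, {'c', INT_MAX}};
```

The tuple (c,4) falls into the first range only, and (c,9) falls into neither. Both ranges have the same key_part1 bound, which is why the group prefix repeats.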

> +
> +      In this case for calculating prefixes with the group by field we take up the
> +      partial ranges involving field f2 those would be
> +             c<= f2 <=c
> +             c<= f2 <=c
> +
> +      So we lookup rows with the same prefix in all such ranges and
> +      then we check for the other part(in this case f1) in ALL the ranges.
> +      So if a record lies in a range, then it would satisfy both the partial
> +      ranges in this case and therefore there would be multiple outputs for
> +      the same row.
> +      For such cases we should calculate the prefix only when we have the next
> +      partial range different from the previous one.
> +    */
> +
> +
> +    if (*save_last_range)
> +    {
> +      if (!key_tuple_cmp(key_part_info, (*save_last_range)->min_key,
> +                        last_range->min_key, prefix_length))
> +      {
> +        last_range=NULL;
> +        continue;
> +      }
> +    }
> +    *save_last_range= last_range;
>      key_range start_key, end_key;
>      last_range->make_min_endpoint(&start_key, prefix_length, keypart_map);
>      last_range->make_max_endpoint(&end_key, prefix_length, keypart_map);
> @@ -13315,7 +13352,7 @@ QUICK_GROUP_MIN_MAX_SELECT(TABLE *table, JOIN *join_arg, bool have_min_arg,
>     seen_first_key(FALSE), doing_key_read(FALSE), min_max_arg_part(min_max_arg_part_arg),
>     key_infix(key_infix_arg), key_infix_len(key_infix_len_arg),
>     min_functions_it(NULL), max_functions_it(NULL),
> -   is_index_scan(is_index_scan_arg)
> +   is_index_scan(is_index_scan_arg), save_last_range(NULL)

This is not the right place for this initialization. It needs to be in
QUICK_GROUP_MIN_MAX_SELECT::reset(), as the quick select may be executed
multiple times.
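A toy model shows why constructor-only initialization breaks re-execution. The class PrefixScan and its methods are hypothetical, not server code. The remembered range survives from one execution to the next. On the second run, the first range is compared against the stale pointer and skipped, so the group disappears. `reset(false)` models the patch as posted, and `reset(true)` models the requested fix.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a quick select that remembers
// the last range it processed (like save_last_range in the patch).
// Each range is reduced to its group prefix.
class PrefixScan {
 public:
  explicit PrefixScan(std::vector<int> range_prefixes)
      : ranges_(std::move(range_prefixes)) {}

  // Called before every execution. Clearing the remembered range here,
  // rather than only in the constructor, keeps state from leaking into
  // the next execution.
  void reset(bool clear_saved_range) {
    if (clear_saved_range) save_last_range_ = nullptr;
  }

  // Returns how many group prefixes are looked up in one execution.
  int execute() {
    int lookups = 0;
    for (const int& prefix : ranges_) {
      if (save_last_range_ != nullptr && *save_last_range_ == prefix)
        continue;  // same prefix as the remembered range: skip the lookup
      save_last_range_ = &prefix;
      lookups++;
    }
    return lookups;
  }

 private:
  std::vector<int> ranges_;
  // Last range whose prefix was looked up, or nullptr.
  // Must be cleared in reset() so that re-execution starts clean.
  const int* save_last_range_ = nullptr;
};
```

This mirrors the NULLs in the second query below: every execution after the first finds no group.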

Testcase:

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2) AS SUBQ  from ten;
+------+
| SUBQ |
+------+
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
+------+
10 rows in set (0.00 sec)

Now, let's add a HAVING clause (which always evaluates to TRUE but forces the
subquery to be recomputed):

MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2 having maxval<100+ten.a) AS SUBQ from ten;
+------+
| SUBQ |
+------+
|    4 |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
| NULL |
+------+
10 rows in set (0.00 sec)

To make sure the wrong result comes from the loose index scan, try with IGNORE INDEX:

MariaDB [test]> select (SELECT MAX(f1) as MAXVAL FROM t1 ignore index(f2) WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2 having maxval<100+ten.a) AS SUBQ from ten;
+------+
| SUBQ |
+------+
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
|    4 |
+------+
10 rows in set (0.00 sec)

Please fix this and add the above queries to the testcase.

>  {
>    head=       table;
>    index=      use_index;
> @@ -13968,7 +14005,8 @@ int QUICK_GROUP_MIN_MAX_SELECT::next_prefix()
>      uchar *cur_prefix= seen_first_key ? group_prefix : NULL;
>      if ((result= quick_prefix_select->get_next_prefix(group_prefix_len,
>                                                        group_key_parts, 
> -                                                      cur_prefix)))
> +                                                      cur_prefix,
> +                                                      &save_last_range)))
>        DBUG_RETURN(result);
>      seen_first_key= TRUE;
>    }
> diff --git a/sql/opt_range.h b/sql/opt_range.h
> index b8b46ae5ab1..206d65f6da4 100644
> --- a/sql/opt_range.h
> +++ b/sql/opt_range.h
> @@ -477,7 +477,7 @@ class QUICK_RANGE_SELECT : public QUICK_SELECT_I
>    int get_next();
>    void range_end();
>    int get_next_prefix(uint prefix_length, uint group_key_parts, 
> -                      uchar *cur_prefix);
> +                      uchar *cur_prefix, QUICK_RANGE **save_last_range);
>    bool reverse_sorted() { return 0; }
>    bool unique_key_range();
>    int init_ror_merged_scan(bool reuse_handler, MEM_ROOT *alloc);
> @@ -916,7 +916,8 @@ class QUICK_GROUP_MIN_MAX_SELECT : public QUICK_SELECT_I
>      Use index scan to get the next different key instead of jumping into it 
>      through index read 
>    */
> -  bool is_index_scan; 
> +  bool is_index_scan;
> +  QUICK_RANGE *save_last_range;

Please add a comment describing the new member variable.
>  public:
>    /*
>      The following two members are public to allow easy access from

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog