maria-developers team mailing list archive

Thread
Date
Re: [Commits] 60d1803ec64: MDEV-6707: Wrong result (extra row) with group by, multi-part key

To: maria-developers@xxxxxxxxxxxxxxxxxxx, Varun <varunraiko1803@xxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Thu, 18 Jan 2018 16:04:31 +0300
Cc: commits@xxxxxxxxxxx
In-reply-to: <20180110210555.E64C51013FE174@Varuns-MacBook-Pro.local>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jan 11, 2018 at 02:35:55AM +0530, Varun wrote:
> revision-id: 60d1803ec645b6d388aac7125ac6381ecee4d548 (mariadb-5.5.56-115-g60d1803ec64)
> parent(s): 9c9cf556a15af84f9133ef859b50596085f7ae8e
> author: Varun Gupta
> committer: Varun Gupta
> timestamp: 2018-01-11 01:51:23 +0530
> message:
> 
> MDEV-6707: Wrong result (extra row) with group by, multi-part key
> 
> Range optimizer creates ranges according to the where conditions. So this case involves using a composite key,
> few parts of which are involved in GROUP BY and few in the MIN/MAX functions in the select list.
> So during the execution of such queries, we try to find a prefix using the fields involved in GROUP BY
> and we then check if this newly returned prefix lies within the range we had calculated earlier.
> After we find a row that satisfies the partial range(involving fields of GROUP BY), then we look at the other fields
> of the where clause. We look in all the ranges if we can satisfy the above retured row.
> 
> The issue is while calculating these prefixes we might encounter same partial ranges for multiple ranges.
>      An example would be
>      where condition is f2 ='c' AND f1 <> 9
>      so the ranges would be
>              (NULL,c <= f1,f2 <= c,9)
>              (c,9    <= f1,f2 <= c,+infinity)
> 
>      In this case for calculating prefixes with the group by field we take up the
>      partial ranges involving field f2 those would be
>              c<= f2 <=c
>              c<= f2 <=c
>      So we lookup rows with the same prefix in all such ranges and then we check for the other
>      part(in this case f1) in all the ranges. So lets take a row of the table to be (4,'c'), that would have
>      a prefix 'c' and value for f1 to be 4, so it would lie in the first range mentioned above. But we would
>      get this same record again when we read the partial range and this would again satisfy the first range.
> This issue can be fixed if we compare such partial ranges and don't lookup if we see the same prefix again.

This is a huge and long comment. 
Please make it smaller, and move at least part of it into the code. Good places
would be: 
- The function comment for QUICK_RANGE_SELECT::get_next_prefix
- the new lines in QUICK_RANGE_SELECT::get_next_prefix which use/update the
  value of save_last_range

> 
> ---
>  mysql-test/r/range.result         | 14 ++++++++++++++
>  mysql-test/r/range_mrr_icp.result | 14 ++++++++++++++
>  mysql-test/t/range.test           | 13 +++++++++++++
>  sql/opt_range.cc                  | 20 +++++++++++++++++---
>  sql/opt_range.h                   |  6 ++++--
>  5 files changed, 62 insertions(+), 5 deletions(-)
> 
> diff --git a/mysql-test/r/range.result b/mysql-test/r/range.result
> index 630a692cef6..72b31de4df4 100644
> --- a/mysql-test/r/range.result
> +++ b/mysql-test/r/range.result
> @@ -2144,3 +2144,17 @@ value1	1000685	12345
>  value1	1003560	12345
>  value1	1004807	12345
>  drop table t1;
> +#
> +# MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +#
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	range	f2	f2	9	NULL	2	Using where; Using index for group-by
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +MAX(f1)	f2
> +4	c
> +DROP TABLE t1;
> diff --git a/mysql-test/r/range_mrr_icp.result b/mysql-test/r/range_mrr_icp.result
> index 3f5de5b0189..1050bfcd887 100644
> --- a/mysql-test/r/range_mrr_icp.result
> +++ b/mysql-test/r/range_mrr_icp.result
> @@ -2146,4 +2146,18 @@ value1	1000685	12345
>  value1	1003560	12345
>  value1	1004807	12345
>  drop table t1;
> +#
> +# MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +#
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
> +1	SIMPLE	t1	range	f2	f2	9	NULL	2	Using where; Using index for group-by
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +MAX(f1)	f2
> +4	c
> +DROP TABLE t1;
>  set optimizer_switch=@mrr_icp_extra_tmp;
> diff --git a/mysql-test/t/range.test b/mysql-test/t/range.test
> index 393ca68e945..62e907b7b4a 100644
> --- a/mysql-test/t/range.test
> +++ b/mysql-test/t/range.test
> @@ -1718,3 +1718,16 @@ where (key1varchar='value1' AND (key2int <=1 OR  key2int > 1));
>  --echo # The following must show col1=12345 for all rows:
>  select * from t1;
>  drop table t1;
> +
> +--echo #
> +--echo # MDEV-6707: Wrong result (extra row) with group by, multi-part key
> +--echo #
> +
> +CREATE TABLE t1 (f1 INT, f2 VARCHAR(1), KEY(f2,f1)) ENGINE=InnoDB;
> +INSERT INTO t1 VALUES
> +(7,'v'),(0,'s'),(9,'l'),(4,'c');
> +
> +explain
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +SELECT MAX(f1), f2 FROM t1 WHERE f2 LIKE 'c%' AND f1 <> 9 GROUP BY f2;
> +DROP TABLE t1;
> diff --git a/sql/opt_range.cc b/sql/opt_range.cc
> index 25a9e729a8b..e985aba1140 100644
> --- a/sql/opt_range.cc
> +++ b/sql/opt_range.cc
> @@ -11249,6 +11249,7 @@ int QUICK_RANGE_SELECT::get_next()
>    @param prefix_length   length of cur_prefix
>    @param group_key_parts The number of key parts in the group prefix
>    @param cur_prefix      prefix of a key to be searched for
> +  @save_last_range       Saving the last range we encountered

You've meant "@param save_last_range" ?
Also please use " INOUT " to indicate this parameter is used to pass info both
from the caller and to the caller.

>  
>    Each subsequent call to the method retrieves the first record that has a
>    prefix with length prefix_length and which is different from cur_prefix,
> @@ -11271,7 +11272,8 @@ int QUICK_RANGE_SELECT::get_next()
>  
>  int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length,
>                                          uint group_key_parts,
> -                                        uchar *cur_prefix)
> +                                        uchar *cur_prefix,
> +                                        QUICK_RANGE **save_last_range)
>  {
>    DBUG_ENTER("QUICK_RANGE_SELECT::get_next_prefix");
>    const key_part_map keypart_map= make_prev_keypart_map(group_key_parts);
> @@ -11301,8 +11303,19 @@ int QUICK_RANGE_SELECT::get_next_prefix(uint prefix_length,
>        last_range= 0;
>        DBUG_RETURN(HA_ERR_END_OF_FILE);
>      }
> +
>      last_range= *(cur_range++);
>  
> +    if (*save_last_range)
> +    {
> +      if (!key_tuple_cmp(key_part_info, (*save_last_range)->min_key,
> +                        last_range->min_key, prefix_length))
> +      {
> +        last_range=NULL;
> +        continue;
> +      }
> +    }
> +    *save_last_range= last_range;
>      key_range start_key, end_key;
>      last_range->make_min_endpoint(&start_key, prefix_length, keypart_map);
>      last_range->make_max_endpoint(&end_key, prefix_length, keypart_map);
> @@ -13315,7 +13328,7 @@ QUICK_GROUP_MIN_MAX_SELECT(TABLE *table, JOIN *join_arg, bool have_min_arg,
>     seen_first_key(FALSE), doing_key_read(FALSE), min_max_arg_part(min_max_arg_part_arg),
>     key_infix(key_infix_arg), key_infix_len(key_infix_len_arg),
>     min_functions_it(NULL), max_functions_it(NULL),
> -   is_index_scan(is_index_scan_arg)
> +   is_index_scan(is_index_scan_arg), save_last_range(NULL)
>  {
>    head=       table;
>    index=      use_index;
> @@ -13968,7 +13981,8 @@ int QUICK_GROUP_MIN_MAX_SELECT::next_prefix()
>      uchar *cur_prefix= seen_first_key ? group_prefix : NULL;
>      if ((result= quick_prefix_select->get_next_prefix(group_prefix_len,
>                                                        group_key_parts, 
> -                                                      cur_prefix)))
> +                                                      cur_prefix,
> +                                                      &save_last_range)))
>        DBUG_RETURN(result);
>      seen_first_key= TRUE;
>    }
> diff --git a/sql/opt_range.h b/sql/opt_range.h
> index b8b46ae5ab1..f4fb0642590 100644
> --- a/sql/opt_range.h
> +++ b/sql/opt_range.h
> @@ -455,6 +455,7 @@ class QUICK_RANGE_SELECT : public QUICK_SELECT_I
>    QUICK_RANGE **cur_range;  /* current element in ranges  */
>    
>    QUICK_RANGE *last_range;
> +  QUICK_RANGE *save_last_range;

So, now both QUICK_RANGE_SELECT and QUICK_GROUP_MIN_MAX_SELECT have
'save_last_range'. Does QUICK_RANGE_SELECT need one?

>    
>    KEY_PART *key_parts;
>    KEY_PART_INFO *key_part_info;
> @@ -477,7 +478,7 @@ class QUICK_RANGE_SELECT : public QUICK_SELECT_I
>    int get_next();
>    void range_end();
>    int get_next_prefix(uint prefix_length, uint group_key_parts, 
> -                      uchar *cur_prefix);
> +                      uchar *cur_prefix, QUICK_RANGE **save_last_range);
>    bool reverse_sorted() { return 0; }
>    bool unique_key_range();
>    int init_ror_merged_scan(bool reuse_handler, MEM_ROOT *alloc);
> @@ -916,7 +917,8 @@ class QUICK_GROUP_MIN_MAX_SELECT : public QUICK_SELECT_I
>      Use index scan to get the next different key instead of jumping into it 
>      through index read 
>    */
> -  bool is_index_scan; 
> +  bool is_index_scan;
> +  QUICK_RANGE *save_last_range;
>  public:
>    /*
>      The following two members are public to allow easy access from

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog