← Back to team overview

maria-developers team mailing list archive

Re: Fwd: [Commits] Rev 3335: Fix bug lp:825075 in file:///home/tsk/mprog/src/5.3/

 

Hi Timour,

Feedback follows skype discussion:

First, here's a general comment

Consider a query:

   SELECT b, min(a) FROM t1 GROUP BY b;

Suppose there is an index (b,a).

Loose scan would execute it as follows:

S00   h->index_first(); 
S01   produce an output record of (b, a)  // first value of 'a' is the minimum
S02  
S03   h->read_next_different_b(); // Jump to the next GROUP-BY group.
S04    ...
S05

If we modify the query by adding a WHERE:

  SELECT b, min(a) FROM t1 WHERE arbitrary_cond(a,b) GROUP BY b;
 
then LooseScan (in its current form) will be unable to execute it. The problem
is here:

S00   h->index_first(); 
S01   produce an output record of (b, a)  // first value of 'a' is the minimum
      ^^^ 
     what if this record doesn't match the WHERE? We'll have to scan further
     within the record having b=first_record.b  until we find a match.

     LooseScan executor doesn't support such "walk forward" action currently.


If the WHERE clause has been converted into a range, such that

    there are no records that would be contained in the range but not match the
    WHERE clause  (***)

then LooseScan is able to handle it. Example:


  SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b;

Execution would proceed as follows:

S00   h->index_first(); 
S00   h->index_read(HA_READ_AFTER_KEY, {current_row.b, 'a>10'})
S01   produce an output record of (b, a)  // first value of 'a>10' is the minimum

Step S01 will produce correct results as long as the following holds:

  "The first record we've got in a range will satisfy the WHERE condition"

This is not always the case. Here's a counterexample:

SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b;
SELECT b, min(a) FROM t1 WHERE a>10 and ((a MOD 2) = 0) GROUP BY b; --  (C1)

Consider the second query and this dataset:

 b ,  a
 1,  5
 1,  7
 1,  11
 1,  13
 1,  14

range optimizer would give a range of  "a > 10", and the first matching record
is (1,11).  It doesn't satisfy the "(a MOD 2)=0" condition, neither does the
next record of (1,13),  and only the record (1,14) will satisfy the MOD part of
the condition.

The problem: current code doesn't seem to have the logic to detect such cases. 

The submitted patch disables LooseScan for apparently invalid cases where
the WHERE condition covers a column while range condition does not, which 
means that condition (***) is certainly unmet.

However, cases like (C1) are still not handled.


Further comments are inline.

On Wed, Dec 07, 2011 at 02:18:02PM +0200, Timour Katchaounov wrote:
> Could you please review the below patch.
>
> Timour
>
> -----------------------------------------------------------
> revno: 3335
> revision-id: timour@xxxxxxxxxxxx-20111207115332-3oa536gbt20r22po
> parent: igor@xxxxxxxxxxxx-20111206214218-lev0vyidyt8h3431
> fixes bug(s): https://launchpad.net/bugs/825075
> committer: timour@xxxxxxxxxxxx
> branch nick: 5.3
> timestamp: Wed 2011-12-07 13:53:32 +0200
> message:
>   Fix bug lp:825075
>
>   Analys:
>   The cause for the wrong result was that the optimizer
>   incorrectly chose min/max loose scan when it is not
>   applicable. The applicability test missed the case when
>   a condition on the MIN/MAX argument was OR-ed with a
>   condition on some other field. In this case, the MIN/MAX
>   condition cannot be used for loose scan.
>
>   Solution:
>   The function check_group_min_max_predicates() duplicated
>   functionality performed by the range optimizer, but this
>   duplication was incomplete. The solution is to reuse instead
>   the analys and the ranges produced by the range optimizer.
>   If a suitable range could not be extracted for the MIN/MAX
>   argument, then loose scan is not applicable.

> === modified file 'mysql-test/r/group_min_max.result'
> --- a/mysql-test/r/group_min_max.result	2011-12-04 21:31:42 +0000
> +++ b/mysql-test/r/group_min_max.result	2011-12-07 11:53:32 +0000
> @@ -2103,7 +2103,7 @@ id	select_type	table	type	possible_keys
>  explain select a1,a2,b,min(c),max(c) from t2
>  where (c > 'a000') and (c <= 'd999') and (c like '_8__') group by a1,a2,b;
>  id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
> -1       SIMPLE  t2      index   NULL    idx_t2_1        163     NULL    164     Using where; Using index
> +1       SIMPLE  t2      range   NULL    idx_t2_1        163     NULL    33      Using where; Using index for group-by
>  explain select a1, a2, b, c, min(d), max(d) from t1 group by a1,a2,b,c;
>  id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
>  1       SIMPLE  t1      ALL     NULL    NULL    NULL    NULL    128     Using temporary; Using filesort

Why there is no testcase from the bug report? I think it's better to have a
testcase that
- will explicitly mention BUG#
- will show the wrong query result.

A changed EXPLAIN in some irrelevant test is a too poor coverage.

> === modified file 'sql/opt_range.cc'
> --- a/sql/opt_range.cc	2011-11-18 17:35:51 +0000
> +++ b/sql/opt_range.cc	2011-12-07 11:53:32 +0000
> @@ -11984,164 +11996,41 @@ get_best_group_min_max(PARAM *param, SEL
>  }
>  
>  
> -/*
> -  Check that the MIN/MAX attribute participates only in range predicates
> -  with constants.
> -
> -  SYNOPSIS
> -    check_group_min_max_predicates()
> -    cond              tree (or subtree) describing all or part of the WHERE
> -                      clause being analyzed
> -    min_max_arg_item  the field referenced by the MIN/MAX function(s)
> -    min_max_arg_part  the keypart of the MIN/MAX argument if any
> -
> -  DESCRIPTION
> -    The function walks recursively over the cond tree representing a WHERE
> -    clause, and checks condition (SA3) - if a field is referenced by a MIN/MAX
> -    aggregate function, it is referenced only by one of the following
> -    predicates: {=, !=, <, <=, >, >=, between, is null, is not null}.
> +/**
> +  Find the range tree for the field that is an argument of MIN/MAX.
>  
> -  RETURN
> -    TRUE  if cond passes the test
> -    FALSE o/w
> +  @param cond  The WHERE clause of the query 
> +  @param min_max_arg_item  The field referenced by the MIN/MAX function(s)
> +  @param index_tree  Range tree for the chosen index
> +
> +  @note
> +  The function iterates over the range trees for each keypart of the
> +  chosen index until it finds the range for the MIN/MAX keypart.
> +
> +  @retval
> +    If there is a range for the MIN/MAX argument
> +  @retval
> +    NULL if there is no range
>  */
>  
> -static bool
> -check_group_min_max_predicates(Item *cond, Item_field *min_max_arg_item,
> -                               Field::imagetype image_type)
> +static SEL_ARG *
> +get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree)
>  {
> -  DBUG_ENTER("check_group_min_max_predicates");
> -  DBUG_ASSERT(cond && min_max_arg_item);
> -
> +  if (!cond || !min_max_arg_item)
> +    return NULL;
>    cond= cond->real_item();
So we've got:

get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree)
{
  if (!cond || !min_max_arg_item)
    return NULL;
  cond= cond->real_item();

'cond' is passed as parameter, checked, modified, and .. then never used in the
function. Please fix this.

> -  Item::Type cond_type= cond->real_type();
> -  if (cond_type == Item::COND_ITEM) /* 'AND' or 'OR' */
> -  {
> -    DBUG_PRINT("info", ("Analyzing: %s", ((Item_func*) cond)->func_name()));
> -    List_iterator_fast<Item> li(*((Item_cond*) cond)->argument_list());
> -    Item *and_or_arg;
> -    while ((and_or_arg= li++))
> -    {
> -      if (!check_group_min_max_predicates(and_or_arg, min_max_arg_item,
> -                                         image_type))
> -        DBUG_RETURN(FALSE);
> -    }
> -    DBUG_RETURN(TRUE);
> -  }
> -
> -  /*
> -    Disallow loose index scan if the MIN/MAX argument field is referenced by
> -    a subquery in the WHERE clause.
> -  */
> -
> -  if (cond_type == Item::SUBSELECT_ITEM)
> -  {
> -    Item_subselect *subs_cond= (Item_subselect*) cond;
> -    if (subs_cond->is_correlated)
> -    {
> -      DBUG_ASSERT(subs_cond->upper_refs.elements > 0);
> -      List_iterator_fast<Item_subselect::Ref_to_outside>
> -        li(subs_cond->upper_refs);
> -      Item_subselect::Ref_to_outside *dep;
> -      while ((dep= li++))
> -      {
> -        if (dep->item->eq(min_max_arg_item, FALSE))
> -          DBUG_RETURN(FALSE);
> -      }
> -    }
> -    DBUG_RETURN(TRUE);
> -  }
> -
>    /*
> -    Condition of the form 'field' is equivalent to 'field <> 0' and thus
> -    satisfies the SA3 condition.
> +    Extract the SEL_ARG subtree that contains only ranges for the MIN/MAX
> +    attribute.
>    */
> -  if (cond_type == Item::FIELD_ITEM)
> +  SEL_ARG *min_max_range= index_tree;
> +  while (min_max_range) /* Find the tree for the MIN/MAX key part. */
>    {
> -    DBUG_PRINT("info", ("Analyzing: %s", cond->full_name()));
> -    DBUG_RETURN(TRUE);
> -  }
> -
> -  /* We presume that at this point there are no other Items than functions. */
> -  DBUG_ASSERT(cond_type == Item::FUNC_ITEM);
> -
> -  /* Test if cond references only group-by or non-group fields. */
> -  Item_func *pred= (Item_func*) cond;
> -  Item **arguments= pred->arguments();
> -  Item *cur_arg;
> -  DBUG_PRINT("info", ("Analyzing: %s", pred->func_name()));
> -  for (uint arg_idx= 0; arg_idx < pred->argument_count (); arg_idx++)
> -  {
> -    cur_arg= arguments[arg_idx]->real_item();
> -    DBUG_PRINT("info", ("cur_arg: %s", cur_arg->full_name()));
> -    if (cur_arg->type() == Item::FIELD_ITEM)
> -    {
> -      if (min_max_arg_item->eq(cur_arg, 1)) 
> -      {
> -       /*
> -         If pred references the MIN/MAX argument, check whether pred is a range
> -         condition that compares the MIN/MAX argument with a constant.
> -       */
> -        Item_func::Functype pred_type= pred->functype();
> -        if (pred_type != Item_func::EQUAL_FUNC     &&
> -            pred_type != Item_func::LT_FUNC        &&
> -            pred_type != Item_func::LE_FUNC        &&
> -            pred_type != Item_func::GT_FUNC        &&
> -            pred_type != Item_func::GE_FUNC        &&
> -            pred_type != Item_func::BETWEEN        &&
> -            pred_type != Item_func::ISNULL_FUNC    &&
> -            pred_type != Item_func::ISNOTNULL_FUNC &&
> -            pred_type != Item_func::EQ_FUNC        &&
> -            pred_type != Item_func::NE_FUNC)
> -          DBUG_RETURN(FALSE);
> -
> -        /* Check that pred compares min_max_arg_item with a constant. */
> -        Item *args[3];
> -        bzero(args, 3 * sizeof(Item*));
> -        bool inv;
> -        /* Test if this is a comparison of a field and a constant. */
> -        if (!simple_pred(pred, args, &inv))
> -          DBUG_RETURN(FALSE);
> -
> -        /* Check for compatible string comparisons - similar to get_mm_leaf. */
> -        if (args[0] && args[1] && !args[2] && // this is a binary function
> -            min_max_arg_item->result_type() == STRING_RESULT &&
> -            /*
> -              Don't use an index when comparing strings of different collations.
> -            */
> -            ((args[1]->result_type() == STRING_RESULT &&
> -              image_type == Field::itRAW &&
> -              ((Field_str*) min_max_arg_item->field)->charset() !=
> -              pred->compare_collation())
> -             ||
> -             /*
> -               We can't always use indexes when comparing a string index to a
> -               number.
> -             */
> -             (args[1]->result_type() != STRING_RESULT &&
> -              min_max_arg_item->field->cmp_type() != args[1]->result_type())))
> -          DBUG_RETURN(FALSE);
> -      }
> -    }
> -    else if (cur_arg->type() == Item::FUNC_ITEM)
> -    {
> -      if (!check_group_min_max_predicates(cur_arg, min_max_arg_item,
> -                                         image_type))
> -        DBUG_RETURN(FALSE);
> -    }
> -    else if (cur_arg->const_item())
> -    {
> -      /*
> -        For predicates of the form "const OP expr" we also have to check 'expr'
> -        to make a decision.
> -      */
> -      continue;
> -    }
> -    else
> -      DBUG_RETURN(FALSE);
> +    if (min_max_range->field->eq(min_max_arg_item->field))
> +      break;
> +    min_max_range= min_max_range->next_key_part;
>    }
> -
> -  DBUG_RETURN(TRUE);
> +  return min_max_range;
>  }
>  
>  


BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


Follow ups