maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04551
Re: Fwd: [Commits] Rev 3335: Fix bug lp:825075 in file:///home/tsk/mprog/src/5.3/
Hi Timour,
Feedback follows skype discussion:
First, here's a general comment
Consider a query:
SELECT b, min(a) FROM t1 GROUP BY b;
Suppose there is an index (b,a).
Loose scan would execute it as follows:
S00 h->index_first();
S01 produce an output record of (b, a) // first value of 'a' is the minimum
S02
S03 h->read_next_different_b(); // Jump to the next GROUP-BY group.
S04 ...
S05
If we modify the query by adding a WHERE:
SELECT b, min(a) FROM t1 WHERE arbitrary_cond(a,b) GROUP BY b;
then LooseScan (in its current form) will be unable to execute it. The problem
is here:
S00 h->index_first();
S01 produce an output record of (b, a) // first value of 'a' is the minimum
^^^
what if this record doesn't match the WHERE? We'll have to scan further
within the record having b=first_record.b until we find a match.
LooseScan executor doesn't support such "walk forward" action currently.
If the WHERE clause has been converted into a range, such that
there are no records that would be contained in the range but not match the
WHERE clause (***)
then LooseScan is able to handle it. Example:
SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b;
Execution would proceed as follows:
S00 h->index_first();
S00 h->index_read(HA_READ_AFTER_KEY, {current_row.b, 'a>10'})
S01 produce an output record of (b, a) // first value of 'a>10' is the minimum
Step S01 will produce correct results as long as the following holds:
"The first record we've got in a range will satisfy the WHERE condition"
This is not always the case. Here's a counterexample:
SELECT b, min(a) FROM t1 WHERE a>10 GROUP BY b;
SELECT b, min(a) FROM t1 WHERE a>10 and ((a MOD 2) = 0) GROUP BY b; -- (C1)
Consider the second query and this dataset:
b , a
1, 5
1, 7
1, 11
1, 13
1, 14
range optimizer would give a range of "a > 10", and the first matching record
is (1,11). It doesn't satisfy the "(a MOD 2)=0" condition, neither does the
next record of (1,13), and only the record (1,14) will satisfy the MOD part of
the condition.
The problem: current code doesn't seem to have the logic to detect such cases.
The submitted patch disables LooseScan for apparently invalid cases where
the WHERE condition covers a column while range condition does not, which
means that condition (***) is certainly unmet.
However, cases like (C1) are still not handled.
Further comments are inline.
On Wed, Dec 07, 2011 at 02:18:02PM +0200, Timour Katchaounov wrote:
> Could you please review the below patch.
>
> Timour
>
> -----------------------------------------------------------
> revno: 3335
> revision-id: timour@xxxxxxxxxxxx-20111207115332-3oa536gbt20r22po
> parent: igor@xxxxxxxxxxxx-20111206214218-lev0vyidyt8h3431
> fixes bug(s): https://launchpad.net/bugs/825075
> committer: timour@xxxxxxxxxxxx
> branch nick: 5.3
> timestamp: Wed 2011-12-07 13:53:32 +0200
> message:
> Fix bug lp:825075
>
> Analys:
> The cause for the wrong result was that the optimizer
> incorrectly chose min/max loose scan when it is not
> applicable. The applicability test missed the case when
> a condition on the MIN/MAX argument was OR-ed with a
> condition on some other field. In this case, the MIN/MAX
> condition cannot be used for loose scan.
>
> Solution:
> The function check_group_min_max_predicates() duplicated
> functionality performed by the range optimizer, but this
> duplication was incomplete. The solution is to reuse instead
> the analys and the ranges produced by the range optimizer.
> If a suitable range could not be extracted for the MIN/MAX
> argument, then loose scan is not applicable.
> === modified file 'mysql-test/r/group_min_max.result'
> --- a/mysql-test/r/group_min_max.result 2011-12-04 21:31:42 +0000
> +++ b/mysql-test/r/group_min_max.result 2011-12-07 11:53:32 +0000
> @@ -2103,7 +2103,7 @@ id select_type table type possible_keys
> explain select a1,a2,b,min(c),max(c) from t2
> where (c > 'a000') and (c <= 'd999') and (c like '_8__') group by a1,a2,b;
> id select_type table type possible_keys key key_len ref rows Extra
> -1 SIMPLE t2 index NULL idx_t2_1 163 NULL 164 Using where; Using index
> +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 33 Using where; Using index for group-by
> explain select a1, a2, b, c, min(d), max(d) from t1 group by a1,a2,b,c;
> id select_type table type possible_keys key key_len ref rows Extra
> 1 SIMPLE t1 ALL NULL NULL NULL NULL 128 Using temporary; Using filesort
Why there is no testcase from the bug report? I think it's better to have a
testcase that
- will explicitly mention BUG#
- will show the wrong query result.
A changed EXPLAIN in some irrelevant test is a too poor coverage.
> === modified file 'sql/opt_range.cc'
> --- a/sql/opt_range.cc 2011-11-18 17:35:51 +0000
> +++ b/sql/opt_range.cc 2011-12-07 11:53:32 +0000
> @@ -11984,164 +11996,41 @@ get_best_group_min_max(PARAM *param, SEL
> }
>
>
> -/*
> - Check that the MIN/MAX attribute participates only in range predicates
> - with constants.
> -
> - SYNOPSIS
> - check_group_min_max_predicates()
> - cond tree (or subtree) describing all or part of the WHERE
> - clause being analyzed
> - min_max_arg_item the field referenced by the MIN/MAX function(s)
> - min_max_arg_part the keypart of the MIN/MAX argument if any
> -
> - DESCRIPTION
> - The function walks recursively over the cond tree representing a WHERE
> - clause, and checks condition (SA3) - if a field is referenced by a MIN/MAX
> - aggregate function, it is referenced only by one of the following
> - predicates: {=, !=, <, <=, >, >=, between, is null, is not null}.
> +/**
> + Find the range tree for the field that is an argument of MIN/MAX.
>
> - RETURN
> - TRUE if cond passes the test
> - FALSE o/w
> + @param cond The WHERE clause of the query
> + @param min_max_arg_item The field referenced by the MIN/MAX function(s)
> + @param index_tree Range tree for the chosen index
> +
> + @note
> + The function iterates over the range trees for each keypart of the
> + chosen index until it finds the range for the MIN/MAX keypart.
> +
> + @retval
> + If there is a range for the MIN/MAX argument
> + @retval
> + NULL if there is no range
> */
>
> -static bool
> -check_group_min_max_predicates(Item *cond, Item_field *min_max_arg_item,
> - Field::imagetype image_type)
> +static SEL_ARG *
> +get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree)
> {
> - DBUG_ENTER("check_group_min_max_predicates");
> - DBUG_ASSERT(cond && min_max_arg_item);
> -
> + if (!cond || !min_max_arg_item)
> + return NULL;
> cond= cond->real_item();
So we've got:
get_min_max_range(Item *cond, Item_field *min_max_arg_item, SEL_ARG *index_tree)
{
if (!cond || !min_max_arg_item)
return NULL;
cond= cond->real_item();
'cond' is passed as parameter, checked, modified, and .. then never used in the
function. Please fix this.
> - Item::Type cond_type= cond->real_type();
> - if (cond_type == Item::COND_ITEM) /* 'AND' or 'OR' */
> - {
> - DBUG_PRINT("info", ("Analyzing: %s", ((Item_func*) cond)->func_name()));
> - List_iterator_fast<Item> li(*((Item_cond*) cond)->argument_list());
> - Item *and_or_arg;
> - while ((and_or_arg= li++))
> - {
> - if (!check_group_min_max_predicates(and_or_arg, min_max_arg_item,
> - image_type))
> - DBUG_RETURN(FALSE);
> - }
> - DBUG_RETURN(TRUE);
> - }
> -
> - /*
> - Disallow loose index scan if the MIN/MAX argument field is referenced by
> - a subquery in the WHERE clause.
> - */
> -
> - if (cond_type == Item::SUBSELECT_ITEM)
> - {
> - Item_subselect *subs_cond= (Item_subselect*) cond;
> - if (subs_cond->is_correlated)
> - {
> - DBUG_ASSERT(subs_cond->upper_refs.elements > 0);
> - List_iterator_fast<Item_subselect::Ref_to_outside>
> - li(subs_cond->upper_refs);
> - Item_subselect::Ref_to_outside *dep;
> - while ((dep= li++))
> - {
> - if (dep->item->eq(min_max_arg_item, FALSE))
> - DBUG_RETURN(FALSE);
> - }
> - }
> - DBUG_RETURN(TRUE);
> - }
> -
> /*
> - Condition of the form 'field' is equivalent to 'field <> 0' and thus
> - satisfies the SA3 condition.
> + Extract the SEL_ARG subtree that contains only ranges for the MIN/MAX
> + attribute.
> */
> - if (cond_type == Item::FIELD_ITEM)
> + SEL_ARG *min_max_range= index_tree;
> + while (min_max_range) /* Find the tree for the MIN/MAX key part. */
> {
> - DBUG_PRINT("info", ("Analyzing: %s", cond->full_name()));
> - DBUG_RETURN(TRUE);
> - }
> -
> - /* We presume that at this point there are no other Items than functions. */
> - DBUG_ASSERT(cond_type == Item::FUNC_ITEM);
> -
> - /* Test if cond references only group-by or non-group fields. */
> - Item_func *pred= (Item_func*) cond;
> - Item **arguments= pred->arguments();
> - Item *cur_arg;
> - DBUG_PRINT("info", ("Analyzing: %s", pred->func_name()));
> - for (uint arg_idx= 0; arg_idx < pred->argument_count (); arg_idx++)
> - {
> - cur_arg= arguments[arg_idx]->real_item();
> - DBUG_PRINT("info", ("cur_arg: %s", cur_arg->full_name()));
> - if (cur_arg->type() == Item::FIELD_ITEM)
> - {
> - if (min_max_arg_item->eq(cur_arg, 1))
> - {
> - /*
> - If pred references the MIN/MAX argument, check whether pred is a range
> - condition that compares the MIN/MAX argument with a constant.
> - */
> - Item_func::Functype pred_type= pred->functype();
> - if (pred_type != Item_func::EQUAL_FUNC &&
> - pred_type != Item_func::LT_FUNC &&
> - pred_type != Item_func::LE_FUNC &&
> - pred_type != Item_func::GT_FUNC &&
> - pred_type != Item_func::GE_FUNC &&
> - pred_type != Item_func::BETWEEN &&
> - pred_type != Item_func::ISNULL_FUNC &&
> - pred_type != Item_func::ISNOTNULL_FUNC &&
> - pred_type != Item_func::EQ_FUNC &&
> - pred_type != Item_func::NE_FUNC)
> - DBUG_RETURN(FALSE);
> -
> - /* Check that pred compares min_max_arg_item with a constant. */
> - Item *args[3];
> - bzero(args, 3 * sizeof(Item*));
> - bool inv;
> - /* Test if this is a comparison of a field and a constant. */
> - if (!simple_pred(pred, args, &inv))
> - DBUG_RETURN(FALSE);
> -
> - /* Check for compatible string comparisons - similar to get_mm_leaf. */
> - if (args[0] && args[1] && !args[2] && // this is a binary function
> - min_max_arg_item->result_type() == STRING_RESULT &&
> - /*
> - Don't use an index when comparing strings of different collations.
> - */
> - ((args[1]->result_type() == STRING_RESULT &&
> - image_type == Field::itRAW &&
> - ((Field_str*) min_max_arg_item->field)->charset() !=
> - pred->compare_collation())
> - ||
> - /*
> - We can't always use indexes when comparing a string index to a
> - number.
> - */
> - (args[1]->result_type() != STRING_RESULT &&
> - min_max_arg_item->field->cmp_type() != args[1]->result_type())))
> - DBUG_RETURN(FALSE);
> - }
> - }
> - else if (cur_arg->type() == Item::FUNC_ITEM)
> - {
> - if (!check_group_min_max_predicates(cur_arg, min_max_arg_item,
> - image_type))
> - DBUG_RETURN(FALSE);
> - }
> - else if (cur_arg->const_item())
> - {
> - /*
> - For predicates of the form "const OP expr" we also have to check 'expr'
> - to make a decision.
> - */
> - continue;
> - }
> - else
> - DBUG_RETURN(FALSE);
> + if (min_max_range->field->eq(min_max_arg_item->field))
> + break;
> + min_max_range= min_max_range->next_key_part;
> }
> -
> - DBUG_RETURN(TRUE);
> + return min_max_range;
> }
>
>
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
Follow ups