maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #13372
Re: Review input for: MDEV-22534 Decorrelate IN subquery
Hi Sergei,
Thanks again for your comments.
I have added a comment on the jira ticket including the link to a patch
that addresses your comments. I will take one more look at it tomorrow
morning before sending it to you for review, but I thought maybe it
makes sense to respond now (see below) given the timezone difference.
> Hi Yuchen,
>
> Please find below review input for
>
> > commit ba9cf8efeb0b1e1c132d565adb79c98156783f15
> > Author: Yuchen Pei <yuchen.pei@xxxxxxxxxxx>
> > Date: Fri May 19 20:06:31 2023 +1000
> >
> > MDEV-22534 Decorrelate IN subquery
> >
>
> First, general questions:
>
> Q1: It seems there is some overlap between
> Item_in_subselect::exists2in_processor() and
> Item_exists_subselect::exists2in_processor() Did you consider factoring
> out common code rather than copying it? (if yes, I'd like to know why
> it couldn't be done).
Done. I factored out the common code into three methods:
- Item_exists_subselect::exists2in_prepare()
- Item_exists_subselect::exists2in_create_or_update_in()
- Item_exists_subselect::exists2in_and_is_not_nulls()
In doing so I tried to not alter the logic of the exists2in
transformation, while keeping the decorrelate-in transformation
simple. For things done in exists2in but not in my decorrelate-in
implementation, I add a substype() check.
> Q2: Where is the coe that removes the correlated equalities from the
> subquery's WHERE? (I assume Item_exists_subselect code does it?)
Naively, the best place to remove the equalities would be in
find_inner_outer_equalities(), when it iterates over the conds and one
could simply call the remove() method on the iterator. But we can't just
do that because after the call to find_inner_outer_equalities() we need
to do more checks, as well as replacing the equalities with `IS NOT
NULL`s in case of upper_not to ensure the 3vl value of the subquery
remains the same (see my previous response about three value logic).
> > diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc
> > index 9e6c205ca76..8d893794da3 100644
> > --- a/sql/item_subselect.cc
> > +++ b/sql/item_subselect.cc
> > @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item **conds,
> > return TRUE;
> > }
> >
> > +/* Checks whether item tree intersects with the free list */
> > +static bool intersects_free_list(Item *item, THD *thd)
> > +{
> > + for (const Item *to_find= thd->free_list; to_find; to_find= to_find->next)
> > + if (item->walk(&Item::find_item_processor, 1, (void *) to_find))
> > + return true;
> > + return false;
> > +}
> > +
>
> Please try moving this function into a separate file. First name that came to
> my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea
> is that we should aim for better structure.
> (This request is only valid if the factoring out suggested above doesn't
> work out).
Replied previously. BTW why should this function be in a separate file?
Is it because it is not a method of any of the Item_subselect classes?
> > +/* De-correlate where conditions in an IN subquery */
>
> This needs a bigger comment describing what the rewrite does.
Done and replied previously
> > +bool Item_in_subselect::exists2in_processor(void *opt_arg)
> > +{
> > + THD *thd= (THD *)opt_arg;
> > + SELECT_LEX *first_select= unit->first_select();
> > + JOIN *join= first_select->join;
> > + bool will_be_correlated;
> > + Dynamic_array<EQ_FIELD_OUTER> eqs(PSI_INSTRUMENT_MEM, 5, 5);
> > + List<Item> outer;
> > + Query_arena *arena= NULL, backup;
> > + int res= FALSE;
> > + DBUG_ENTER("Item_in_subselect::exists2in_processor");
> > +
>
> Please move comments out of the if statement and number the conditions.
>
> For examples, see best_access_path(), large if-statement starting with
> "Don't test table scan if it can't be better."
> or
> "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause"
> in Item_exists_subselect::exists2in_processor().
Done and replied previously
> > + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) ||
> > + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */
> > + !(is_top_level_item() ||
> > + (upper_not && upper_not->is_top_level_item())) ||
>
> what is the reason behind this condition?
> If we're able to handle top-level NOT IN, why can't we handle subquery in any
> context?
>
> Can we really handle the NOT IN? (I suspect not, it will suffer from the
> semantics of IN subqueries with NULLs..
> I can't find the name for this, it's described e.g. here:
> https://petrunia.net/2006/11/16/inany-subqueries-null-woes/ )
Done and replied previously
>
> > + first_select->is_part_of_union() || /* skip if part of a union */
> > + first_select->group_list.elements || /* skip if with group by */
> > + first_select->with_sum_func || /* skip if aggregation */
> > + join->having || /* skip if with having */
> > + !join->conds || /* skip if no conds */
> > + /* skip if left expr is a single nonscalar subselect */
> > + (left_expr->type() == Item::SUBSELECT_ITEM &&
> > + !left_expr->type_handler()->is_scalar_type()))
>
> The above seems to be some exotic SQL. Is it something like
>
> (select col1,lol2 from t1) IN (select col3,col4 from t2 where ..)
>
> and does MariaDB really support this? Please add a comment about
> this.
Done and replied previously
>
> > + DBUG_RETURN(FALSE);
> > + /* iterate over conditions, and check whether they can be moved out. */
> > + if (find_inner_outer_equalities(&join->conds, eqs))
> > + DBUG_RETURN(FALSE);
> > + /* If we are in the execution of a prepared statement, check for
> > + intersection with the free list to avoid segfault. Note that the
> > + check for prepared statement execution is necessary, otherwise it
> > + will likely always find intersection thus abort the
> > + transformation.
> > +
> > + fixme: this only works when HAVE_PSI_STATEMENT_INTERFACE is
> > + defined. It causes CI's like amd64-debian-10-debug-embedded to
> > + fail. Are there other ways to find out we are in the execution of a
> > + prepared statement? */
>
> I'm looking at the code of activate_stmt_arena_if_needed(). It seems
> "stmt_arena->is_conventional()" is the check you're looking for.
Done and replied previously
>
> > + if (thd->m_statement_state.m_parent_prepared_stmt)
> > + {
> > + for (uint i= 0; i < (uint) eqs.elements(); i++)
> > + {
> > + if (intersects_free_list(*eqs.at(i).eq_ref, thd))
> > + DBUG_RETURN(FALSE);
>
> This check doesn't look good.
> As far as I understand, it is a temporary measure until related
> re-execution bug(s) are fixed by Igor (and/or Sanja)?
Replied previously
> > + }
> > + }
>
> > + /* Determines whether the result will be correlated */
> > + {
> > + List<Item> unused;
> > + Collect_deps_prm prm= {&unused, // parameters
> > + unit->first_select()->nest_level_base, // nest_level_base
> > + 0, // count
> > + unit->first_select()->nest_level, // nest_level
> > + FALSE // collect
> > + };
> > + walk(&Item::collect_outer_ref_processor, TRUE, &prm);
> > + DBUG_ASSERT(prm.count > 0);
> > + DBUG_ASSERT(prm.count >= (uint)eqs.elements());
> > + will_be_correlated= prm.count > (uint)eqs.elements();
> See the example pasted in the MDEV:
> https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259705
>
> Somehow, I get here will_be_correlated==FALSE but Materialization is still not
> considered for the subquery?
Done and replied in the ticket
>
> > + }
> > + /* Back up the free list */
> > + arena= thd->activate_stmt_arena_if_needed(&backup);
> > + /* Construct the items for left_expr */
> > + if (left_expr->type() == Item::ROW_ITEM)
> > + for (uint i= 0; i < left_expr->cols(); i++)
> > + outer.push_back(left_expr->element_index(i));
> > + else
> > + outer.push_back(left_expr);
> > + const uint offset= first_select->item_list.elements;
> > + /* Move items to outer and select item list */
> > + for (uint i= 0; i < (uint)eqs.elements(); i++)
> > + {
> > + Item **eq_ref= eqs.at(i).eq_ref;
> > + Item_ident *local_field= eqs.at(i).local_field;
> > + Item *outer_exp= eqs.at(i).outer_exp;
> > + first_select->item_list.push_back(local_field, thd->mem_root);
> > + first_select->ref_pointer_array[offset + i]= (Item *)local_field;
>
> How do we know that ref_pointer_array has enough room for the
> new elements?
> (We can discuss this with Sanja, he's an expert on the topic).
>
> I see this check in Item_exists_subselect::exists2in_processor
>
> if ((uint)eqs.elements() > (first_select->item_list.elements +
> first_select->select_n_reserved))
>
> Is it how Item_exists_subselect handles this issue?
Yes, I think so, done
Best,
Yuchen
References