maria-developers team mailing list archive

Thread
Date
Re: MDEV-26278 Elimination: consider GROUP BY as unique key

To: Oleg Smirnov <oleg.smirnov@xxxxxxxxxxx>
From: Sergey Petrunia <sergey@xxxxxxxxxxx>
Date: Thu, 19 May 2022 18:55:25 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
Hi Oleg,

Please find the review below.

> commit 20609d374b3a82c46291ee5e87cfadd26de47817
> Author: Oleg Smirnov <olernov@xxxxxxxxx>
> Date:   Thu Feb 17 22:53:37 2022 +0700
>
>    MDEV-26278 Elimination: consider GROUP BY as unique key

Please provide a more verbose description. We should aim for it to provide
sufficient info for one to understand what the optimization does and how
that is achieved.


Some general comments:

The patch uses this formatting for multi-line comments:

> +  /* GROUP BY expression is considered as a synthetic "unique key"
> +    for the derived table. Add this key as a dependency */

While the coding style and actual MariaDB code would use this:

  /*
    GROUP BY expression is considered as a synthetic "unique key"
    for the derived table. Add this key as a dependency
  */

Please use this style for all comments.


Also, the patch uses the term "synthetic unique key". I think the term
is not very good: It can be confused with "synthetic primary key", also
I read "synthetic" as something that actually exists, while in our case
the key doesn't exist.
Would using "Pseudo-key" be better?


> diff --git a/sql/table.h b/sql/table.h
> index 497502b2d06..fd598556cc3 100644
> --- a/sql/table.h
> +++ b/sql/table.h
> @@ -2872,6 +2872,9 @@ struct TABLE_LIST
>      }
>    }
>  
> +  void mark_derived_as_eliminated() { m_is_derived_eliminated= true; }
> +  bool is_derived_eliminated() const { return m_is_derived_eliminated; }
> +
So, TABLE_LIST objects are created non-eliminated and then they are
eliminated, and that is forever. TABLE_LIST object has the same lifetime
as the statement, that is, it will survive multiple executions of a prepared
statement.

On the other hand, Table Elimination optimization is done in eliminate_tables() 
call which is invoked for every statement execution.
This looks like a dangerous mismatch. I'll continue below

>  private:
>    bool prep_check_option(THD *thd, uint8 check_opt_type);
>    bool prep_where(THD *thd, Item **conds, bool no_where_clause);
> diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc
> index 3bf6d9abf57..02ef5cd7ad7 100644
> --- a/sql/sql_lex.cc
> +++ b/sql/sql_lex.cc
> @@ -11786,7 +11786,7 @@ bool SELECT_LEX_UNIT::explainable() const
>      EXPLAIN/ANALYZE unit, when:
>      (1) if it's a subquery - it's not part of eliminated WHERE/ON clause.
>      (2) if it's a CTE - it's not hanging (needed for execution)
> -    (3) if it's a derived - it's not merged
> +    (3) if it's a derived - it's not merged or eliminated
>      if it's not 1/2/3 - it's some weird internal thing, ignore it
>    */
>    return item ?
> @@ -11795,6 +11795,7 @@ bool SELECT_LEX_UNIT::explainable() const
>               derived && derived->derived_result &&
>                 !with_element->is_hanging_recursive(): // (2)
>               derived ?
> -               derived->is_materialized_derived() :   // (3)
> +               derived->is_materialized_derived() && // (3)
> +                 !derived->is_derived_eliminated() :
>                 false;
>  }
I think, there's no need for m_is_derived_eliminated.
You can have this check in is_derived_eliminated():

  this->derived->table->map & this->outer_select()->join->eliminated_tables


Please try doing this.

> diff --git a/mysql-test/main/table_elim.result b/mysql-test/main/table_elim.result
> index deff0623370..842330f3713 100644
> --- a/mysql-test/main/table_elim.result
> +++ b/mysql-test/main/table_elim.result
> @@ -704,3 +704,252 @@ LIMIT 1;
>  PostID	Voted
>  1	NULL
>  DROP TABLE t1,t2;
> +#
> +# MDEV-26278: Table elimination does not work across derived tables
> +#
> +create table t1 (a int, b int);
> +insert into t1 select seq, seq+10 from seq_1_to_10;
> +create table t11 (
> +a int not null,
> +b int,
> +key(a)
> +);
> +insert into t11 select A.seq, A.seq+B.seq
> +from
> +seq_1_to_100 A,
> +seq_1_to_1000 B;

Do you really need 100K rows table? I think 10K rows will be just as good.


> diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
> index 8c4720bdec4..fe2155191ca 100644
> --- a/sql/opt_table_elimination.cc
> +++ b/sql/opt_table_elimination.cc
...
> @@ -278,6 +286,8 @@ class Dep_value_field : public Dep_value
>      Dep_module_key *key_dep;
>      /* Otherwise, this and advance */
>      uint equality_no;
> +
> +    Dep_module_synthetic_key *synth_key_dep;

Please add a comment for the above that we should try that one, too.

@@ -447,6 +463,55 @@ const size_t Dep_module::iterator_size=
>    MY_MAX(Dep_module_expr::iterator_size, Dep_module_key::iterator_size);
>  
>  
> +/* A synthetic unique key module for a derived table.
> +  For example, a derived table
> +  SELECT t11.a, count(*) from t1 LEFT JOIN t2 ON t1.id = t2.fk GROUP BY t11.a
> +  has unique values in its first field (t11.a) due to GROUP BY expression
> +  so this can be considered as a unique key for this derived table
> +*/
> +
> +class Dep_module_synthetic_key : public Dep_module
> +{
> +public:
> +  Dep_module_synthetic_key(Dep_value_table *table_arg,
> +                           std::vector<field_index_t>&& field_indexes)
> +      : table(table_arg), derived_table_field_indexes(field_indexes)
> +  {
> +    unbound_args= static_cast<uint>(field_indexes.size());
> +  }
> +
> +  Dep_value_table * table;

Coding style: "*table".

...

> @@ -1603,7 +1674,73 @@ Dep_value_table *Dep_analysis_context::create_table_value(TABLE *table)
>        key_list= &(key_dep->next_table_key);
>      }
>    }
> -  return table_deps[table->tablenr]= tbl_dep;
> +
> +  auto select_unit= table_list->get_unit();
> +  SELECT_LEX *first_select= nullptr;
> +  if (select_unit)

Please move the code that creates the pseudo-key into a separate function.

...

> +
> +int Dep_analysis_context::find_field_in_list(List<Item> &fields_list,
> +                                             Item *field)

Please add a one-line comment describing what function does.

> @@ -1786,6 +1941,18 @@ Dep_module* Dep_value_field::get_next_unbound_module(Dep_analysis_context *dac,
>    }
>    else 
>      di->key_dep= NULL;
> +
> +  Dep_module_synthetic_key *synth_key_dep= di->synth_key_dep;
> +  if (synth_key_dep && !synth_key_dep->is_applicable() &&
> +    std::find(synth_key_dep->derived_table_field_indexes.begin(),
> +      synth_key_dep->derived_table_field_indexes.end(),
> +      field->field_index) != std::end(synth_key_dep->derived_table_field_indexes))

Please introduce a method like 

  bool Dep_module_synthetic_key::covers_field(int field_index);

Will this also allow to make
Dep_module_synthetic_key::derived_table_field_indexes a private member?


BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Follow ups

Review #2 for: MDEV-26278 Elimination: consider GROUP BY as unique key
From: Sergey Petrunia, 2022-06-09