maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #13155
Re: MDEV-26278 Elimination: consider GROUP BY as unique key
Hi Oleg,
Please find the review below.
> commit 20609d374b3a82c46291ee5e87cfadd26de47817
> Author: Oleg Smirnov <olernov@xxxxxxxxx>
> Date: Thu Feb 17 22:53:37 2022 +0700
>
> MDEV-26278 Elimination: consider GROUP BY as unique key
Please provide a more verbose description. We should aim for it to provide
sufficient info for one to understand what the optimization does and how
that is achieved.
Some general comments:
The patch uses this formatting for multi-line comments:
> + /* GROUP BY expression is considered as a synthetic "unique key"
> + for the derived table. Add this key as a dependency */
While the coding style and actual MariaDB code would use this:
/*
GROUP BY expression is considered as a synthetic "unique key"
for the derived table. Add this key as a dependency
*/
Please use this style for all comments.
Also, the patch uses the term "synthetic unique key". I think the term
is not very good: It can be confused with "synthetic primary key", also
I read "synthetic" as something that actually exists, while in our case
the key doesn't exist.
Would using "Pseudo-key" be better?
> diff --git a/sql/table.h b/sql/table.h
> index 497502b2d06..fd598556cc3 100644
> --- a/sql/table.h
> +++ b/sql/table.h
> @@ -2872,6 +2872,9 @@ struct TABLE_LIST
> }
> }
>
> + void mark_derived_as_eliminated() { m_is_derived_eliminated= true; }
> + bool is_derived_eliminated() const { return m_is_derived_eliminated; }
> +
So, TABLE_LIST objects are created non-eliminated and then they are
eliminated, and that is forever. TABLE_LIST object has the same lifetime
as the statement, that is, it will survive multiple executions of a prepared
statement.
On the other hand, Table Elimination optimization is done in eliminate_tables()
call which is invoked for every statement execution.
This looks like a dangerous mismatch. I'll continue below
> private:
> bool prep_check_option(THD *thd, uint8 check_opt_type);
> bool prep_where(THD *thd, Item **conds, bool no_where_clause);
> diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc
> index 3bf6d9abf57..02ef5cd7ad7 100644
> --- a/sql/sql_lex.cc
> +++ b/sql/sql_lex.cc
> @@ -11786,7 +11786,7 @@ bool SELECT_LEX_UNIT::explainable() const
> EXPLAIN/ANALYZE unit, when:
> (1) if it's a subquery - it's not part of eliminated WHERE/ON clause.
> (2) if it's a CTE - it's not hanging (needed for execution)
> - (3) if it's a derived - it's not merged
> + (3) if it's a derived - it's not merged or eliminated
> if it's not 1/2/3 - it's some weird internal thing, ignore it
> */
> return item ?
> @@ -11795,6 +11795,7 @@ bool SELECT_LEX_UNIT::explainable() const
> derived && derived->derived_result &&
> !with_element->is_hanging_recursive(): // (2)
> derived ?
> - derived->is_materialized_derived() : // (3)
> + derived->is_materialized_derived() && // (3)
> + !derived->is_derived_eliminated() :
> false;
> }
I think, there's no need for m_is_derived_eliminated.
You can have this check in is_derived_eliminated():
this->derived->table->map & this->outer_select()->join->eliminated_tables
Please try doing this.
> diff --git a/mysql-test/main/table_elim.result b/mysql-test/main/table_elim.result
> index deff0623370..842330f3713 100644
> --- a/mysql-test/main/table_elim.result
> +++ b/mysql-test/main/table_elim.result
> @@ -704,3 +704,252 @@ LIMIT 1;
> PostID Voted
> 1 NULL
> DROP TABLE t1,t2;
> +#
> +# MDEV-26278: Table elimination does not work across derived tables
> +#
> +create table t1 (a int, b int);
> +insert into t1 select seq, seq+10 from seq_1_to_10;
> +create table t11 (
> +a int not null,
> +b int,
> +key(a)
> +);
> +insert into t11 select A.seq, A.seq+B.seq
> +from
> +seq_1_to_100 A,
> +seq_1_to_1000 B;
Do you really need 100K rows table? I think 10K rows will be just as good.
> diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc
> index 8c4720bdec4..fe2155191ca 100644
> --- a/sql/opt_table_elimination.cc
> +++ b/sql/opt_table_elimination.cc
...
> @@ -278,6 +286,8 @@ class Dep_value_field : public Dep_value
> Dep_module_key *key_dep;
> /* Otherwise, this and advance */
> uint equality_no;
> +
> + Dep_module_synthetic_key *synth_key_dep;
Please add a comment for the above that we should try that one, too.
@@ -447,6 +463,55 @@ const size_t Dep_module::iterator_size=
> MY_MAX(Dep_module_expr::iterator_size, Dep_module_key::iterator_size);
>
>
> +/* A synthetic unique key module for a derived table.
> + For example, a derived table
> + SELECT t11.a, count(*) from t1 LEFT JOIN t2 ON t1.id = t2.fk GROUP BY t11.a
> + has unique values in its first field (t11.a) due to GROUP BY expression
> + so this can be considered as a unique key for this derived table
> +*/
> +
> +class Dep_module_synthetic_key : public Dep_module
> +{
> +public:
> + Dep_module_synthetic_key(Dep_value_table *table_arg,
> + std::vector<field_index_t>&& field_indexes)
> + : table(table_arg), derived_table_field_indexes(field_indexes)
> + {
> + unbound_args= static_cast<uint>(field_indexes.size());
> + }
> +
> + Dep_value_table * table;
Coding style: "*table".
...
> @@ -1603,7 +1674,73 @@ Dep_value_table *Dep_analysis_context::create_table_value(TABLE *table)
> key_list= &(key_dep->next_table_key);
> }
> }
> - return table_deps[table->tablenr]= tbl_dep;
> +
> + auto select_unit= table_list->get_unit();
> + SELECT_LEX *first_select= nullptr;
> + if (select_unit)
Please move the code that creates the pseudo-key into a separate function.
...
> +
> +int Dep_analysis_context::find_field_in_list(List<Item> &fields_list,
> + Item *field)
Please add a one-line comment describing what function does.
> @@ -1786,6 +1941,18 @@ Dep_module* Dep_value_field::get_next_unbound_module(Dep_analysis_context *dac,
> }
> else
> di->key_dep= NULL;
> +
> + Dep_module_synthetic_key *synth_key_dep= di->synth_key_dep;
> + if (synth_key_dep && !synth_key_dep->is_applicable() &&
> + std::find(synth_key_dep->derived_table_field_indexes.begin(),
> + synth_key_dep->derived_table_field_indexes.end(),
> + field->field_index) != std::end(synth_key_dep->derived_table_field_indexes))
Please introduce a method like
bool Dep_module_synthetic_key::covers_field(int field_index);
Will this also allow to make
Dep_module_synthetic_key::derived_table_field_indexes a private member?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net
Follow ups