maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #02710
Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21859 2010-02-28 14:47:02.000000000 +0000
+++ /tmp/wklog.90.new.21859 2010-02-28 14:47:02.000000000 +0000
@@ -1 +1 @@
-Subqueries: cost-based choice between Materialization and IN->EXISTS transformation
+ Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
------------------------------------------------------------
-=-=(View All Progress Notes, 11 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.2 On subquery predicate removal
2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 What is expected of the result of conversion
3. Pre-optimization steps
3.1 Constant detection
3.3 update_ref_and_keys
3.4 JOIN_TAB sorting criteria
4. Optimization
5. Execution
User interface.
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
The criteria for checking whether a subquery can be processed with
jtbm-materialization can be checked at JOIN::prepare stage (like it
happens with semi-join check)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: each jtbm-predicate is converted into a
TABLE_LIST object. This will make it
- uniform with semi-joins (we've stepped on all rakes there)
- allow to process JTBM-subqueries in ON expressions
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
i.e. for
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
we'll print
SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
the XX part is not clear. we don't want to print 'ie' the second time here?
2.2 On subquery predicate removal
---------------------------------
Q: if we remove the subquery predicate permanently, who will run
fix_fields() for it? For semi-joins we don't have the problem as we
inject into ON expression (right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item).
A: ?
2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
2.3 What is expected of the result of conversion
------------------------------------------------
Join [pre]optimization relies on each optimized entity to have a bit in
table_map.
TODO: where do we check if there will be enough bits for everyone? (at the
point where we assign them?)
The bit stored in join_tab->table->map, and the apparent problem is that JTBM
join_tabs do not naturally have TABLE* object.
We could use the the one that will be used for Materialization, but that will
stop working when we will have to include IN->EXISTS in the choice.
Current approach: don't create a table. create a table_map element in JOIN_TAB
instead. Evgen has probably done something like that already.
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). This where the changes will
be needed: for JOIN_TABs that correspond to JTBM-tables:
- don't set tab->table, set tab->jtbm_select (or whatever)
- run subquery's optimizer to get its output cardinality
3.1 Constant detection
----------------------
What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?
3.3 update_ref_and_keys
-----------------------
* Walk through JTBM elements and inject KEYUSE elements for their
IN-equalities.
TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
3.4 JOIN_TAB sorting criteria
-----------------------------
Q: Where do we put JTBM's join_tab when pre-sorting records?
A: it should sort as regular table.
TODO: where do we remove the predicates from the WHERE?
- remove them like SJ-converter does
- remove them with optimizer (like remove_eq_conds does)
4. Optimization
===============
Add a branch in best_access_path to account for
- JTBM-Materialization
- JTBM-Materialization-Scan.
5. Execution
============
* We should be able to reuse item_subselect.cc code for lookups
* But will have to use our own temptable scan code
TODO: is it possible to have any unification with SJ-Materialization?
User interface
--------------
Any @@optimizer_switch flags for all this?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)