maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #02781
Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Mon, 29 Mar 2010, 20:02)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.27113 2010-03-29 20:02:22.000000000 +0000
+++ /tmp/wklog.90.new.27113 2010-03-29 20:02:22.000000000 +0000
@@ -2,16 +2,15 @@
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
-2.2 On subquery predicate removal
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
-2.3 What is expected of the result of conversion
+2.1.1 On subquery predicate removal
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 Conversion process and its result
3. Pre-optimization steps
-3.1 Constant detection
-3.3 update_ref_and_keys
-3.4 JOIN_TAB sorting criteria
-4. Optimization
+3.1 update_ref_and_keys
+3.2 Constant detection
+4. Join optimization
5. Execution
-User interface.
+6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
@@ -19,48 +18,60 @@
1. Applicability check
======================
-The criteria for checking whether a subquery can be processed with
-jtbm-materialization can be checked at JOIN::prepare stage (like it
-happens with semi-join check)
+Applicability criteria of jtbm-materialization can be checked at
+JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
-Make it work like semi-join nests: each jtbm-predicate is converted into a
-TABLE_LIST object. This will make it
+Make it work like semi-join nests: a jtbm-predicate is converted into a
+TABLE_LIST object that is injected into select's join tree. This will make it
- - uniform with semi-joins (we've stepped on all rakes there)
- - allow to process JTBM-subqueries in ON expressions
+ - uniform with semi-join processing (and we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in LEFT JOINs' ON expressions
+ (as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
-for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
-i.e. for
+EXPLAIN EXTENDED will display the conversion result as join, that is, for
+subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
-we'll print
+it will print
- SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+ SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
-the XX part is not clear. we don't want to print 'ie' the second time here?
+After we've added a TABLE_LIST element, we'll need to remove the subquery
+predicate from the WHERE/On clause, in the same way as SJ-converter does it.
-2.2 On subquery predicate removal
----------------------------------
-Q: if we remove the subquery predicate permanently, who will run
-fix_fields() for it? For semi-joins we don't have the problem as we
-inject into ON expression (right? or not? we have sj_on_expr, too...
-(Investigation: we the the same Item* pointer both in WHERE and
-as sj_on_expr. fix_fields() is called for the WHERE part and that's
-how sj_on_expr gets fixed. This works as long as
-Item_func_eq::fix_fields() does not try to substitute itself with
-another item).
+TODO: note: we inject a new kind of base table and then next PS re-execution
+will re-run name resolution/etc on TABLE_LIST tree that includes this table.
+We'll need to make sure that materialized-table doesn't interfere with
+anything.
+
+TODO: Separation of steps that need to be done once and steps that need to be
+re-done for every PS execution.
+
+2.1.1 On subquery predicate removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Q: if we remove the subquery predicate permanently, who will call
+fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
+problem as we inject into ON expression
+
+(right? or not? we have sj_on_expr, too...
+ (Investigation: we the the same Item* pointer both in WHERE and
+ as sj_on_expr. fix_fields() is called for the WHERE part and that's
+ how sj_on_expr gets fixed. This works as long as
+ Item_func_eq::fix_fields() does not try to substitute itself with
+ another item)
+)
A: ?
-2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
@@ -70,71 +81,96 @@
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
-2.3 What is expected of the result of conversion
-------------------------------------------------
-Join [pre]optimization relies on each optimized entity to have a bit in
-table_map.
-
-TODO: where do we check if there will be enough bits for everyone? (at the
- point where we assign them?)
+Note: if we convert that late, we could remove the subquery predicate in the
+same way as remove_eq_conds() removes known-to-be-false AND-parts.
+This operation is undoable.
+
+2.3 Conversion process and its result
+-------------------------------------
+At the moment, the following is being implemented:
+
+* convert_join_subqueries_to_semijoins() handles both conversions to
+ semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
+ convert_subq_to_jtbm()).
+ - This means the process is done in bottom-up way (we walk down into
+ semi-join subselects but not jtbm-selects), with protection against
+ exhaustion of table bits beyond #MAX_TABLES, etc.
+
+* convert_subq_to_jtbm() invokes
+
+ subq_predicate->setup_engine()
+
+ which creates the temporary table, and then we use it as the TABLE* object
+ that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
+ One of the first actions is that we assign it a bit and store it in
+ table->map.
+ (Note that this won't work when we add a choice between Materialization and
+ IN->EXISTS)
-The bit stored in join_tab->table->map, and the apparent problem is that JTBM
-join_tabs do not naturally have TABLE* object.
+ (Another option is do like done in FROM optimization: create a 'preliminary'
+ TABLE* object which has only some of the fields filled in).
-We could use the the one that will be used for Materialization, but that will
-stop working when we will have to include IN->EXISTS in the choice.
-
-Current approach: don't create a table. create a table_map element in JOIN_TAB
-instead. Evgen has probably done something like that already.
+QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
+ where needed...
3. Pre-optimization steps
=========================
-JOIN_TABs are allocated in make_join_statistics(). This where the changes will
-be needed: for JOIN_TABs that correspond to JTBM-tables:
-
-- don't set tab->table, set tab->jtbm_select (or whatever)
+JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
+need:
- run subquery's optimizer to get its output cardinality
+- todo what else?
+
+3.1 update_ref_and_keys
+-----------------------
+Since we've injected the equality into WHERE/ON and created a TABLE* with
+appropriate index no special steps are necessary. The regular procedure will
+add the needed KEYUSE elements.
-3.1 Constant detection
+3.2 Constant detection
----------------------
-What about subqueries that "are constant"?
- const_item IN (SELECT uncorrelated) -> is constant, but not something
- we would want to evaluate.
- something IN (SELECT from_constant_join) -> is constant
+* We should not let const table detection code to look at the materialized
+ table, find it to be empty and mark it as constant.
-Do we need to mark their JOIN_TABs as constant?
+* We also should disallow eq_ref(const)-based detection for materialized table
+ That is, if outer_expr is found to be constant, the code should not attempt
+ to make a eq_ref(const) lookup in the materialized table.
-3.3 update_ref_and_keys
------------------------
-* Walk through JTBM elements and inject KEYUSE elements for their
- IN-equalities.
+TODO {
+ Second: What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate at optimization phase.
+
+ something IN (SELECT from_constant_join) -> is constant
+ Do we need to mark their JOIN_TABs as constant?)
+}
-TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+4. Join optimization
+====================
+Should just work, with exception that we need to make best_access_path not
+to use table->stats.records but use JOIN_TAB::records instead (which already
+exists and is filled)
-3.4 JOIN_TAB sorting criteria
------------------------------
-Q: Where do we put JTBM's join_tab when pre-sorting records?
-A: it should sort as regular table.
-
-TODO: where do we remove the predicates from the WHERE?
- - remove them like SJ-converter does
- - remove them with optimizer (like remove_eq_conds does)
-
-4. Optimization
-===============
-Add a branch in best_access_path to account for
-- JTBM-Materialization
-- JTBM-Materialization-Scan.
+NOTE: FROM-optimization does set table->stats.records. Whose approach is
+ wrong?
5. Execution
============
-* We should be able to reuse item_subselect.cc code for lookups
-* But will have to use our own temptable scan code
+We need that:
+1. Materialization operation (i.e. execute the subquery and fill the
+ materialized temptable) is done at start of execution
+2. Materialized table is accessed during execution (either by doing lookups or
+ full table scan)
+3. IN-equality is checked at the right moment.
+
+#2 is achieved by putting the materialized table into join_tab->table.
+#3 is achieved by injecting IN-equality into the WHERE.
-TODO: is it possible to have any unification with SJ-Materialization?
+for #1, we need to place a call that fills the temptable at the right location.
-User interface
---------------
-Any @@optimizer_switch flags for all this?
+6. User interface
+=================
+No @@optimizer_switch flags are planned at the moment. JTBM-materialization
+will be controlled by existing materialization=on|off switch.
+We could also add a generic materialization_scan=on|off switch.
-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182 2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182 2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+ SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+ SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's
+how sj_on_expr gets fixed. This works as long as
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point
+ where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the
+ point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+ const_item IN (SELECT uncorrelated) -> is constant, but not something
+ we would want to evaluate.
+ something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+ IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records?
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007 2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007 2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
for each record R2 in big_table such that oe=R1
pass R2 to output
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882 2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882 2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned
-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524 2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524 2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033 2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033 2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+ consider two strategies:
+ A. Materialization and making lookups in the materialized table (if applicable)
+ B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN
+-------
+TODO how this will look in EXPLAIN output?
-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903 2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903 2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880 2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880 2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+ WHERE ...
+ GROUP BY group_col) AND ...
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
+ Materialize the subquery;
+ # now run the join:
+ for each record R1 in materialized table
+ for each record R2 in big_table such that oe=R1
+ pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.
------------------------------------------------------------
-=-=(View All Progress Notes, 12 total)=-=-
http://askmonty.org/worklog/index.pl?tid=90&nolimit=1
DESCRIPTION:
Consider the following case:
SELECT * FROM big_table
WHERE oe IN (SELECT ie FROM table_with_few_groups
WHERE ...
GROUP BY group_col) AND ...
Here the best way to execute the query is:
Materialize the subquery;
# now run the join:
for each record R1 in materialized table
for each record R2 in big_table such that oe=R1
pass R2 to output
Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.
Once WL#89 is done, there will be a cost-based choice between
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.
HIGH-LEVEL SPECIFICATION:
Basic idea on how this could be achieved:
Pre-optimization phase
----------------------
The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization
then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.
Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.
Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
consider two strategies:
A. Materialization and making lookups in the materialized table (if applicable)
B. Materialization and then scanning the materialized table.
EXPLAIN
-------
TODO how this will look in EXPLAIN output?
LOW-LEVEL DESIGN:
<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.1.1 On subquery predicate removal
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 Conversion process and its result
3. Pre-optimization steps
3.1 update_ref_and_keys
3.2 Constant detection
4. Join optimization
5. Execution
6. User interface
</contents>
We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.
1. Applicability check
======================
Applicability criteria of jtbm-materialization can be checked at
JOIN::prepare() stage (same as with with semi-join applicability checks)
2. Representation
=================
2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: a jtbm-predicate is converted into a
TABLE_LIST object that is injected into select's join tree. This will make it
- uniform with semi-join processing (and we've stepped on all rakes there)
- allow to process JTBM-subqueries in LEFT JOINs' ON expressions
(as TABLE_LIST hierarchy is what is used to encode outer joins)
simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.
EXPLAIN EXTENDED will display the conversion result as join, that is, for
subquery like
SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
it will print
SELECT ... FROM ot, <materialize>(SELECT ie FROM materialized-non-sj-select)
After we've added a TABLE_LIST element, we'll need to remove the subquery
predicate from the WHERE/On clause, in the same way as SJ-converter does it.
TODO: note: we inject a new kind of base table and then next PS re-execution
will re-run name resolution/etc on TABLE_LIST tree that includes this table.
We'll need to make sure that materialized-table doesn't interfere with
anything.
TODO: Separation of steps that need to be done once and steps that need to be
re-done for every PS execution.
2.1.1 On subquery predicate removal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Q: if we remove the subquery predicate permanently, who will call
fix_fields() for the oe=ie IN-equality? For semi-joins we don't have this
problem as we inject into ON expression
(right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's
how sj_on_expr gets fixed. This works as long as
Item_func_eq::fix_fields() does not try to substitute itself with
another item)
)
A: ?
2.2 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point
where JOIN_TABs are already allocated.
Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then
it will be impossible to handle JTBM queries inside/outside of outer joins.
Note: if we convert that late, we could remove the subquery predicate in the
same way as remove_eq_conds() removes known-to-be-false AND-parts.
This operation is undoable.
2.3 Conversion process and its result
-------------------------------------
At the moment, the following is being implemented:
* convert_join_subqueries_to_semijoins() handles both conversions to
semi-joins (with convert_subq_to_sj()) and to JTBM nests (with
convert_subq_to_jtbm()).
- This means the process is done in bottom-up way (we walk down into
semi-join subselects but not jtbm-selects), with protection against
exhaustion of table bits beyond #MAX_TABLES, etc.
* convert_subq_to_jtbm() invokes
subq_predicate->setup_engine()
which creates the temporary table, and then we use it as the TABLE* object
that we put into TABLE_LIST::table, JOIN_TAB::table, etc.
One of the first actions is that we assign it a bit and store it in
table->map.
(Note that this won't work when we add a choice between Materialization and
IN->EXISTS)
(Another option is do like done in FROM optimization: create a 'preliminary'
TABLE* object which has only some of the fields filled in).
QQ: Do we create/inject IN-equality? why not.. it will be used everywhere
where needed...
3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). For JTBM join_tabs we'll
need:
- run subquery's optimizer to get its output cardinality
- todo what else?
3.1 update_ref_and_keys
-----------------------
Since we've injected the equality into WHERE/ON and created a TABLE* with
appropriate index no special steps are necessary. The regular procedure will
add the needed KEYUSE elements.
3.2 Constant detection
----------------------
* We should not let const table detection code to look at the materialized
table, find it to be empty and mark it as constant.
* We also should disallow eq_ref(const)-based detection for materialized table
That is, if outer_expr is found to be constant, the code should not attempt
to make a eq_ref(const) lookup in the materialized table.
TODO {
Second: What about subqueries that "are constant"?
const_item IN (SELECT uncorrelated) -> is constant, but not something
we would want to evaluate at optimization phase.
something IN (SELECT from_constant_join) -> is constant
Do we need to mark their JOIN_TABs as constant?)
}
4. Join optimization
====================
Should just work, with exception that we need to make best_access_path not
to use table->stats.records but use JOIN_TAB::records instead (which already
exists and is filled)
NOTE: FROM-optimization does set table->stats.records. Whose approach is
wrong?
5. Execution
============
We need that:
1. Materialization operation (i.e. execute the subquery and fill the
materialized temptable) is done at start of execution
2. Materialized table is accessed during execution (either by doing lookups or
full table scan)
3. IN-equality is checked at the right moment.
#2 is achieved by putting the materialized table into join_tab->table.
#3 is achieved by injecting IN-equality into the WHERE.
for #1, we need to place a call that fills the temptable at the right location.
6. User interface
=================
No @@optimizer_switch flags are planned at the moment. JTBM-materialization
will be controlled by existing materialization=on|off switch.
We could also add a generic materialization_scan=on|off switch.
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)