← Back to team overview

maria-developers team mailing list archive

Updated (by Psergey): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
		subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Wed, 24 Mar 2010, 14:42)=-=-
Low Level Design modified.
--- /tmp/wklog.90.old.19182     2010-03-24 14:42:54.000000000 +0000
+++ /tmp/wklog.90.new.19182     2010-03-24 14:42:54.000000000 +0000
@@ -1 +1,140 @@
+<contents>
+1. Applicability check
+2. Representation
+2.1 Option #1: Convert to TABLE_LIST
+2.2 On subquery predicate removal
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+2.3 What is expected of the result of conversion
+3. Pre-optimization steps
+3.1 Constant detection
+3.3 update_ref_and_keys
+3.4 JOIN_TAB sorting criteria
+4. Optimization
+5. Execution
+User interface.
+</contents>
+
+We'll call the new execution strategy "jtbm-materialization", for the lack of
+better name.
+
+1. Applicability check
+======================
+The criteria for checking whether a subquery can be processed with
+jtbm-materialization can be checked at JOIN::prepare stage (like it
+happens with semi-join check)
+
+2. Representation
+=================
+
+2.1 Option #1: Convert to TABLE_LIST
+------------------------------------
+Make it work like semi-join nests: each jtbm-predicate is converted into a
+TABLE_LIST object. This will make it
+
+ - uniform with semi-joins (we've stepped on all rakes there)
+ - allow to process JTBM-subqueries in ON expressions
+
+simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
+tables.
+
+for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
+i.e. for
+
+  SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)
+
+we'll print
+
+  SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX
+
+the XX part is not clear. we don't want to print 'ie' the second time here?
+
+2.2 On subquery predicate removal
+---------------------------------
+Q: if we remove the subquery predicate permanently, who will run
+fix_fields() for it? For semi-joins we don't have the problem as we
+inject into ON expression (right? or not? we have sj_on_expr, too...
+(Investigation: we the the same Item* pointer both in WHERE and
+as sj_on_expr. fix_fields() is called for the WHERE part and that's 
+how sj_on_expr gets fixed. This works as long as 
+Item_func_eq::fix_fields() does not try to substitute itself with
+another item).
+A: ?
+
+2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
+------------------------------------------------------------
+JOIN_TABs only live for the duration of one PS re-execution, so we'll have to 
+- make conversion fully undoable
+- perform it sufficiently late in the optimization process, at the point 
+  where JOIN_TABs are already allocated.
+
+Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then 
+it will be impossible to handle JTBM queries inside/outside of outer joins.
+
+2.3 What is expected of the result of conversion
+------------------------------------------------
+Join [pre]optimization relies on each optimized entity to have a bit in 
+table_map.
+
+TODO: where do we check if there will be enough bits for everyone? (at the 
+      point where we assign them?)
+
+The bit stored in join_tab->table->map, and the apparent problem is that JTBM
+join_tabs do not naturally have TABLE* object.
+
+We could use the the one that will be used for Materialization, but that will
+stop working when we will have to include IN->EXISTS in the choice.
+
+Current approach: don't create a table. create a table_map element in JOIN_TAB 
+instead. Evgen has probably done something like that already.
+
+3. Pre-optimization steps
+=========================
+JOIN_TABs are allocated in make_join_statistics(). This where the changes will
+be needed: for JOIN_TABs that correspond to JTBM-tables:
+
+- don't set tab->table, set tab->jtbm_select (or whatever)
+- run subquery's optimizer to get its output cardinality
+
+3.1 Constant detection
+----------------------
+What about subqueries that "are constant"?
+  const_item IN (SELECT uncorrelated) ->  is constant, but not something 
+    we would want to evaluate.
+  something IN (SELECT from_constant_join) -> is constant
+
+Do we need to mark their JOIN_TABs as constant?
+
+3.3 update_ref_and_keys
+-----------------------
+* Walk through JTBM elements and inject KEYUSE elements for their
+  IN-equalities.
+
+TODO: KEYUSE elements imply presense of KEYs! Which we don't have!
+
+3.4 JOIN_TAB sorting criteria
+-----------------------------
+Q: Where do we put JTBM's join_tab when pre-sorting records? 
+A: it should sort as regular table.
+
+TODO: where do we remove the predicates from the WHERE?
+ - remove them like SJ-converter does 
+ - remove them with optimizer (like remove_eq_conds does)
+
+4. Optimization
+===============
+Add a branch in best_access_path to account for
+- JTBM-Materialization 
+- JTBM-Materialization-Scan.
+
+5. Execution
+============
+* We should be able to reuse item_subselect.cc code for lookups
+* But will have to use our own temptable scan code 
+
+TODO: is it possible to have any unification with SJ-Materialization?
+
+User interface
+--------------
+Any @@optimizer_switch flags for all this?
+
 

-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007      2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007      2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
     for each record R2 in big_table such that oe=R1
       pass R2 to output
 
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
 
 
 Once WL#89 is done, there will be a cost-based choice between 

-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882       2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882       2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned

-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524     2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524     2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
 
 Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
 entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between 
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.

-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033     2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033     2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is 
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+  consider two strategies:
+  A. Materialization and making lookups in the materialized table (if applicable)
+  B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN 
+-------
+TODO how this will look in EXPLAIN output? 
 

-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90

-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903     2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903     2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880     2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880     2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is 
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
 
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table 
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+             WHERE ...
+             GROUP BY group_col) AND ...
 
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
 
+  Materialize the subquery;
+  # now run the join:
+  for each record R1 in materialized table
+    for each record R2 in big_table such that oe=R1
+      pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21859     2010-02-28 14:47:02.000000000 +0000
+++ /tmp/wklog.90.new.21859     2010-02-28 14:47:02.000000000 +0000
@@ -1 +1 @@
-Subqueries: cost-based choice between Materialization and IN->EXISTS transformation
+ Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE

	------------------------------------------------------------

		-=-=(View All Progress Notes, 11 total)=-=-
	http://askmonty.org/worklog/index.pl?tid=90&nolimit=1


DESCRIPTION:

Consider the following case:

SELECT * FROM big_table 
WHERE oe IN (SELECT ie FROM table_with_few_groups
             WHERE ...
             GROUP BY group_col) AND ...

Here the best way to execute the query is:
  
  Materialize the subquery;
  # now run the join:
  for each record R1 in materialized table
    for each record R2 in big_table such that oe=R1
      pass R2 to output

Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.


Once WL#89 is done, there will be a cost-based choice between 
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.


HIGH-LEVEL SPECIFICATION:



Basic idea on how this could be achieved:

Pre-optimization phase
----------------------

The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is 
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization

then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.

Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.

Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
  consider two strategies:
  A. Materialization and making lookups in the materialized table (if applicable)
  B. Materialization and then scanning the materialized table.


EXPLAIN 
-------
TODO how this will look in EXPLAIN output? 


LOW-LEVEL DESIGN:



<contents>
1. Applicability check
2. Representation
2.1 Option #1: Convert to TABLE_LIST
2.2 On subquery predicate removal
2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
2.3 What is expected of the result of conversion
3. Pre-optimization steps
3.1 Constant detection
3.3 update_ref_and_keys
3.4 JOIN_TAB sorting criteria
4. Optimization
5. Execution
User interface.
</contents>

We'll call the new execution strategy "jtbm-materialization", for the lack of
better name.

1. Applicability check
======================
The criteria for checking whether a subquery can be processed with
jtbm-materialization can be checked at JOIN::prepare stage (like it
happens with semi-join check)

2. Representation
=================

2.1 Option #1: Convert to TABLE_LIST
------------------------------------
Make it work like semi-join nests: each jtbm-predicate is converted into a
TABLE_LIST object. This will make it

 - uniform with semi-joins (we've stepped on all rakes there)
 - allow to process JTBM-subqueries in ON expressions

simplify_joins() will handle jtbm TABLE_LISTs as some kinds of opaque base
tables.

for EXPLAIN EXTENDED, it would be natural to print something semi-join like,
i.e. for

  SELECT ... FROM ot WHERE oe IN (SELECT ie FROM materialized-non-sj-select)

we'll print

  SELECT ... FROM ot SJ (SELECT ie FROM materialized-non-sj-select) ON oe=XX

the XX part is not clear. we don't want to print 'ie' the second time here?

2.2 On subquery predicate removal
---------------------------------
Q: if we remove the subquery predicate permanently, who will run
fix_fields() for it? For semi-joins we don't have the problem as we
inject into ON expression (right? or not? we have sj_on_expr, too...
(Investigation: we the the same Item* pointer both in WHERE and
as sj_on_expr. fix_fields() is called for the WHERE part and that's 
how sj_on_expr gets fixed. This works as long as 
Item_func_eq::fix_fields() does not try to substitute itself with
another item).
A: ?

2.3 Option #2: No TABLE_LIST, convert to JOIN_TAB right away
------------------------------------------------------------
JOIN_TABs only live for the duration of one PS re-execution, so we'll have to 
- make conversion fully undoable
- perform it sufficiently late in the optimization process, at the point 
  where JOIN_TABs are already allocated.

Note that if we don't position JTBM predicates in join's TABLE_LIST tree, then 
it will be impossible to handle JTBM queries inside/outside of outer joins.

2.3 What is expected of the result of conversion
------------------------------------------------
Join [pre]optimization relies on each optimized entity to have a bit in 
table_map.

TODO: where do we check if there will be enough bits for everyone? (at the 
      point where we assign them?)

The bit stored in join_tab->table->map, and the apparent problem is that JTBM
join_tabs do not naturally have TABLE* object.

We could use the the one that will be used for Materialization, but that will
stop working when we will have to include IN->EXISTS in the choice.

Current approach: don't create a table. create a table_map element in JOIN_TAB 
instead. Evgen has probably done something like that already.

3. Pre-optimization steps
=========================
JOIN_TABs are allocated in make_join_statistics(). This where the changes will
be needed: for JOIN_TABs that correspond to JTBM-tables:

- don't set tab->table, set tab->jtbm_select (or whatever)
- run subquery's optimizer to get its output cardinality

3.1 Constant detection
----------------------
What about subqueries that "are constant"?
  const_item IN (SELECT uncorrelated) ->  is constant, but not something 
    we would want to evaluate.
  something IN (SELECT from_constant_join) -> is constant

Do we need to mark their JOIN_TABs as constant?

3.3 update_ref_and_keys
-----------------------
* Walk through JTBM elements and inject KEYUSE elements for their
  IN-equalities.

TODO: KEYUSE elements imply presense of KEYs! Which we don't have!

3.4 JOIN_TAB sorting criteria
-----------------------------
Q: Where do we put JTBM's join_tab when pre-sorting records? 
A: it should sort as regular table.

TODO: where do we remove the predicates from the WHERE?
 - remove them like SJ-converter does 
 - remove them with optimizer (like remove_eq_conds does)

4. Optimization
===============
Add a branch in best_access_path to account for
- JTBM-Materialization 
- JTBM-Materialization-Scan.

5. Execution
============
* We should be able to reuse item_subselect.cc code for lookups
* But will have to use our own temptable scan code 

TODO: is it possible to have any unification with SJ-Materialization?

User interface
--------------
Any @@optimizer_switch flags for all this?



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)