← Back to team overview

maria-developers team mailing list archive

Updated (by Igor): Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE (90)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: Inside-out execution for non-semijoin materialized
		subqueries that are AND-parts of the WHERE
CREATION DATE..: Sun, 28 Feb 2010, 13:45
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 90 (http://askmonty.org/worklog/?tid=90)
VERSION........: Server-5.3
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: -1 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Igor - Wed, 10 Mar 2010, 22:02)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.2007      2010-03-10 22:02:23.000000000 +0000
+++ /tmp/wklog.90.new.2007      2010-03-10 22:02:23.000000000 +0000
@@ -13,8 +13,8 @@
     for each record R2 in big_table such that oe=R1
       pass R2 to output
 
-Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
-entry is about adding support for such strategies for non-semijoin subqueries.
+Semi-join materialization supports the inside-out strategy. This WL entry is
+about adding support for such strategies for non-semijoin subqueries.
 
 
 Once WL#89 is done, there will be a cost-based choice between 

-=-=(Igor - Wed, 10 Mar 2010, 21:52)=-=-
Status updated.
--- /tmp/wklog.90.old.882       2010-03-10 21:52:02.000000000 +0000
+++ /tmp/wklog.90.new.882       2010-03-10 21:52:02.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned

-=-=(Psergey - Sun, 28 Feb 2010, 15:37)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.23524     2010-02-28 15:37:47.000000000 +0000
+++ /tmp/wklog.90.new.23524     2010-02-28 15:37:47.000000000 +0000
@@ -15,3 +15,7 @@
 
 Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
 entry is about adding support for such strategies for non-semijoin subqueries.
+
+
+Once WL#89 is done, there will be a cost-based choice between 
+Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.

-=-=(Psergey - Sun, 28 Feb 2010, 15:22)=-=-
High-Level Specification modified.
--- /tmp/wklog.90.old.23033     2010-02-28 15:22:09.000000000 +0000
+++ /tmp/wklog.90.new.23033     2010-02-28 15:22:09.000000000 +0000
@@ -1 +1,33 @@
+Basic idea on how this could be achieved:
+
+Pre-optimization phase
+----------------------
+
+The rewrite
+~~~~~~~~~~~
+If we find a subquery predicate that is 
+- not processed by current semi-join optimizations
+- is an AND-part of the WHERE/ON clause
+- can be executed with Materialization
+
+then
+- Remove the predicate from WHERE/ON clause
+- Add a special JOIN_TAB object instead.
+
+Plan options
+~~~~~~~~~~~~
+- Use the IN-equality to create KEYUSE elements.
+
+Optimization
+------------
+- Pre-optimize the subquery so we know materialization cost
+- Whenever best_access_path() encounters the "special JOIN_TAB" it should
+  consider two strategies:
+  A. Materialization and making lookups in the materialized table (if applicable)
+  B. Materialization and then scanning the materialized table.
+
+
+EXPLAIN 
+-------
+TODO how this will look in EXPLAIN output? 
 

-=-=(Psergey - Sun, 28 Feb 2010, 14:56)=-=-
Dependency created: 91 now depends on 90

-=-=(Psergey - Sun, 28 Feb 2010, 14:54)=-=-
Dependency deleted: 94 no longer depends on 90

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21903     2010-02-28 14:47:54.000000000 +0000
+++ /tmp/wklog.90.new.21903     2010-02-28 14:47:54.000000000 +0000
@@ -1 +1 @@
- Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE
+Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
High Level Description modified.
--- /tmp/wklog.90.old.21880     2010-02-28 14:47:28.000000000 +0000
+++ /tmp/wklog.90.new.21880     2010-02-28 14:47:28.000000000 +0000
@@ -1,10 +1,17 @@
-For uncorrelated IN subqueries that can't be converted to semi-joins it is 
-necessary to make a cost-based choice between IN->EXISTS and Materialization
-strategies.
+Consider the following case:
 
-Both strategies handle two cases:
-1. A simple case w/o NULLs handling
-2. Handling NULLs.
+SELECT * FROM big_table 
+WHERE oe IN (SELECT ie FROM table_with_few_groups
+             WHERE ...
+             GROUP BY group_col) AND ...
 
-This WL is about making cost-based decision for #1.
+Here the best way to execute the query is:
 
+  Materialize the subquery;
+  # now run the join:
+  for each record R1 in materialized table
+    for each record R2 in big_table such that oe=R1
+      pass R2 to output
+
+Semi-join materialization supports such strategy with SJM-Scan strategy. This WL
+entry is about adding support for such strategies for non-semijoin subqueries.

-=-=(Psergey - Sun, 28 Feb 2010, 14:47)=-=-
Title modified.
--- /tmp/wklog.90.old.21859     2010-02-28 14:47:02.000000000 +0000
+++ /tmp/wklog.90.new.21859     2010-02-28 14:47:02.000000000 +0000
@@ -1 +1 @@
-Subqueries: cost-based choice between Materialization and IN->EXISTS transformation
+ Subqueries: Inside-out execution for non-semijoin materialized subqueries that are AND-parts of the WHERE

-=-=(Psergey - Sun, 28 Feb 2010, 14:08)=-=-
Dependency created: 94 now depends on 90



DESCRIPTION:

Consider the following case:

SELECT * FROM big_table 
WHERE oe IN (SELECT ie FROM table_with_few_groups
             WHERE ...
             GROUP BY group_col) AND ...

Here the best way to execute the query is:
  
  Materialize the subquery;
  # now run the join:
  for each record R1 in materialized table
    for each record R2 in big_table such that oe=R1
      pass R2 to output

Semi-join materialization supports the inside-out strategy. This WL entry is
about adding support for such strategies for non-semijoin subqueries.


Once WL#89 is done, there will be a cost-based choice between 
Materialization+lookup, Materialization+scan, and IN->EXISTS+lookup strategies.


HIGH-LEVEL SPECIFICATION:



Basic idea on how this could be achieved:

Pre-optimization phase
----------------------

The rewrite
~~~~~~~~~~~
If we find a subquery predicate that is 
- not processed by current semi-join optimizations
- is an AND-part of the WHERE/ON clause
- can be executed with Materialization

then
- Remove the predicate from WHERE/ON clause
- Add a special JOIN_TAB object instead.

Plan options
~~~~~~~~~~~~
- Use the IN-equality to create KEYUSE elements.

Optimization
------------
- Pre-optimize the subquery so we know materialization cost
- Whenever best_access_path() encounters the "special JOIN_TAB" it should
  consider two strategies:
  A. Materialization and making lookups in the materialized table (if applicable)
  B. Materialization and then scanning the materialized table.


EXPLAIN 
-------
TODO how this will look in EXPLAIN output? 


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)