← Back to team overview

maria-developers team mailing list archive

Updated (by Timour): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
		transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: Timour
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-Sprint
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: In-Progress
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......:  (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Timour - Fri, 12 Mar 2010, 09:17)=-=-
Status updated.
--- /tmp/wklog.89.old.13018     2010-03-12 09:17:25.000000000 +0000
+++ /tmp/wklog.89.new.13018     2010-03-12 09:17:25.000000000 +0000
@@ -1 +1 @@
-Assigned
+In-Progress

-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Category updated.
--- /tmp/wklog.89.old.778       2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778       2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Server-RawIdeaBin
+Server-Sprint

-=-=(Igor - Wed, 10 Mar 2010, 21:48)=-=-
Status updated.
--- /tmp/wklog.89.old.778       2010-03-10 21:48:08.000000000 +0000
+++ /tmp/wklog.89.new.778       2010-03-10 21:48:08.000000000 +0000
@@ -1 +1 @@
-Un-Assigned
+Assigned

-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497     2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497     2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
 
 So, we'll need to compute both exists_select_cost and materialization_cost.
 
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
 The problem is in this scenario:
 1. We compute materialization_cost by running optimization for the original
    subquery select.
@@ -46,4 +46,10 @@
 3. Then we find that cost #1 is less and want to execute the materialization
    strategy.
 
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1:  outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
 

-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098     2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098     2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
 
 So, we'll need to compute both exists_select_cost and  materialization_cost.
 
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+   subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+   select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+   strategy.
+
+

-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045     2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045     2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+  SELECT 
+    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+  FROM outer_tbl
+  WHERE outer_cond
+
+If we use Materialization strategy, the costs will be 
+
+  cost of accessing outer_tbl + 
+  materialization_cost + 
+  #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where 
+
+  materialization_cost= 
+    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+  SELECT 
+    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+  FROM outer_tbl
+  WHERE outer_cond
+
+and the costs will be 
+
+  cost of accessing outer_tbl + 
+  #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+  exists_select_cost= 
+    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and  materialization_cost.
 

-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89



DESCRIPTION:

For uncorrelated IN subqueries that can't be converted to semi-joins it is 
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.

Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.

This WL is about making cost-based decision for #1.


HIGH-LEVEL SPECIFICATION:



Why need two optimizations
--------------------------
Consider a query with subquery:

  SELECT 
    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
  FROM outer_tbl
  WHERE outer_cond

If we use Materialization strategy, the costs will be 

  cost of accessing outer_tbl + 
  materialization_cost + 
  #records(outer_tbl w/o outer_cond) * lookup_cost

where 

  materialization_cost= 
    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)

On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into

  SELECT 
    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
  FROM outer_tbl
  WHERE outer_cond

and the costs will be 

  cost of accessing outer_tbl + 
  #records(outer_tbl w/o outer_cond) * exists_select_cost

where
  exists_select_cost= 
    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)

So, we'll need to compute both exists_select_cost and materialization_cost.

Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
   subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
   select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
   strategy.

The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1:  outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)