← Back to team overview

maria-developers team mailing list archive

Updated (by Psergey): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
		transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: 
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......:  (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098     2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098     2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
 
 So, we'll need to compute both exists_select_cost and  materialization_cost.
 
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+   subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+   select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+   strategy.
+
+

-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045     2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045     2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+  SELECT 
+    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+  FROM outer_tbl
+  WHERE outer_cond
+
+If we use Materialization strategy, the costs will be 
+
+  cost of accessing outer_tbl + 
+  materialization_cost + 
+  #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where 
+
+  materialization_cost= 
+    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+  SELECT 
+    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+  FROM outer_tbl
+  WHERE outer_cond
+
+and the costs will be 
+
+  cost of accessing outer_tbl + 
+  #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+  exists_select_cost= 
+    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and  materialization_cost.
 

-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89



DESCRIPTION:

For uncorrelated IN subqueries that can't be converted to semi-joins it is 
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.

Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.

This WL is about making cost-based decision for #1.


HIGH-LEVEL SPECIFICATION:



Why need two optimizations
--------------------------
Consider a query with subquery:

  SELECT 
    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
  FROM outer_tbl
  WHERE outer_cond

If we use Materialization strategy, the costs will be 

  cost of accessing outer_tbl + 
  materialization_cost + 
  #records(outer_tbl w/o outer_cond) * lookup_cost

where 

  materialization_cost= 
    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)

On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into

  SELECT 
    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
  FROM outer_tbl
  WHERE outer_cond

and the costs will be 

  cost of accessing outer_tbl + 
  #records(outer_tbl w/o outer_cond) * exists_select_cost

where
  exists_select_cost= 
    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)

So, we'll need to compute both exists_select_cost and materialization_cost.

Difficulty with computing the two costs
---------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
   subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
   select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
   strategy.



ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)