← Back to team overview

maria-developers team mailing list archive

Updated (by Psergey): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
		transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....: 
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......:  (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045     2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045     2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+  SELECT 
+    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+  FROM outer_tbl
+  WHERE outer_cond
+
+If we use Materialization strategy, the costs will be 
+
+  cost of accessing outer_tbl + 
+  materialization_cost + 
+  #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where 
+
+  materialization_cost= 
+    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+  SELECT 
+    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+  FROM outer_tbl
+  WHERE outer_cond
+
+and the costs will be 
+
+  cost of accessing outer_tbl + 
+  #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+  exists_select_cost= 
+    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and  materialization_cost.
 

-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89



DESCRIPTION:

For uncorrelated IN subqueries that can't be converted to semi-joins it is 
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.

Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.

This WL is about making cost-based decision for #1.


HIGH-LEVEL SPECIFICATION:



Why need two optimizations
--------------------------
Consider a query with subquery:

  SELECT 
    oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
  FROM outer_tbl
  WHERE outer_cond

If we use Materialization strategy, the costs will be 

  cost of accessing outer_tbl + 
  materialization_cost + 
  #records(outer_tbl w/o outer_cond) * lookup_cost

where 

  materialization_cost= 
    cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)

On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into

  SELECT 
    EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
  FROM outer_tbl
  WHERE outer_cond

and the costs will be 

  cost of accessing outer_tbl + 
  #records(outer_tbl w/o outer_cond) * exists_select_cost

where
  exists_select_cost= 
    cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)

So, we'll need to compute both exists_select_cost and  materialization_cost.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)