maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #02376
Updated (by Psergey): Subqueries: cost-based choice between Materialization and IN->EXISTS transformation (89)
-----------------------------------------------------------------------
WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Subqueries: cost-based choice between Materialization and IN->EXISTS
transformation
CREATION DATE..: Sun, 28 Feb 2010, 13:39
SUPERVISOR.....: Monty
IMPLEMENTOR....:
COPIES TO......: Igor, Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 89 (http://askmonty.org/worklog/?tid=89)
VERSION........: Server-5.3
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: (hours remain)
ORIG. ESTIMATE.: 0
PROGRESS NOTES:
-=-=(Psergey - Sun, 28 Feb 2010, 16:34)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24497 2010-02-28 16:34:05.000000000 +0000
+++ /tmp/wklog.89.new.24497 2010-02-28 16:34:05.000000000 +0000
@@ -36,8 +36,8 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
-Difficulty with computing the two costs
----------------------------------------
+Difficulty with the need to run select optimization two times
+-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
@@ -46,4 +46,10 @@
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
+The problem is that once one injects "oe=ie", it can trigger some optimization
+steps that are not possible to undo.
+- Example1: outer->inner join conversion
+- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
+- ... what else ?
+
-=-=(Psergey - Sun, 28 Feb 2010, 16:08)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24098 2010-02-28 16:08:56.000000000 +0000
+++ /tmp/wklog.89.new.24098 2010-02-28 16:08:56.000000000 +0000
@@ -36,3 +36,14 @@
So, we'll need to compute both exists_select_cost and materialization_cost.
+Difficulty with computing the two costs
+---------------------------------------
+The problem is in this scenario:
+1. We compute materialization_cost by running optimization for the original
+ subquery select.
+2. We compute exists_select_cost by running optimization for the subquery's
+ select with "oe=ie" injected into WHERE
+3. Then we find that cost #1 is less and want to execute the materialization
+ strategy.
+
+
-=-=(Psergey - Sun, 28 Feb 2010, 15:57)=-=-
High-Level Specification modified.
--- /tmp/wklog.89.old.24045 2010-02-28 15:57:49.000000000 +0000
+++ /tmp/wklog.89.new.24045 2010-02-28 15:57:49.000000000 +0000
@@ -1 +1,38 @@
+Why need two optimizations
+--------------------------
+Consider a query with subquery:
+
+ SELECT
+ oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
+ FROM outer_tbl
+ WHERE outer_cond
+
+If we use Materialization strategy, the costs will be
+
+ cost of accessing outer_tbl +
+ materialization_cost +
+ #records(outer_tbl w/o outer_cond) * lookup_cost
+
+where
+
+ materialization_cost=
+ cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
+
+On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
+
+ SELECT
+ EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+ FROM outer_tbl
+ WHERE outer_cond
+
+and the costs will be
+
+ cost of accessing outer_tbl +
+ #records(outer_tbl w/o outer_cond) * exists_select_cost
+
+where
+ exists_select_cost=
+ cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
+
+So, we'll need to compute both exists_select_cost and materialization_cost.
-=-=(Psergey - Sun, 28 Feb 2010, 15:07)=-=-
Dependency created: 91 now depends on 89
DESCRIPTION:
For uncorrelated IN subqueries that can't be converted to semi-joins it is
necessary to make a cost-based choice between IN->EXISTS and Materialization
strategies.
Both strategies handle two cases:
1. A simple case w/o NULLs handling
2. Handling NULLs.
This WL is about making cost-based decision for #1.
HIGH-LEVEL SPECIFICATION:
Why need two optimizations
--------------------------
Consider a query with subquery:
SELECT
oe IN (SELECT ie FROM inner_tbl WHERE inner_cond)
FROM outer_tbl
WHERE outer_cond
If we use Materialization strategy, the costs will be
cost of accessing outer_tbl +
materialization_cost +
#records(outer_tbl w/o outer_cond) * lookup_cost
where
materialization_cost=
cost of executing the (SELECT ie FROM inner_tbl WHERE inner_cond)
On the other hand, for IN->EXISTS strategy, the subquery will be rewritten into
SELECT
EXISTS (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
FROM outer_tbl
WHERE outer_cond
and the costs will be
cost of accessing outer_tbl +
#records(outer_tbl w/o outer_cond) * exists_select_cost
where
exists_select_cost=
cost of executing the (SELECT 1 FROM inner_tbl WHERE inner_cond AND oe=ie)
So, we'll need to compute both exists_select_cost and materialization_cost.
Difficulty with the need to run select optimization two times
-------------------------------------------------------------
The problem is in this scenario:
1. We compute materialization_cost by running optimization for the original
subquery select.
2. We compute exists_select_cost by running optimization for the subquery's
select with "oe=ie" injected into WHERE
3. Then we find that cost #1 is less and want to execute the materialization
strategy.
The problem is that once one injects "oe=ie", it can trigger some optimization
steps that are not possible to undo.
- Example1: outer->inner join conversion
- non-Example: according to Igor, "oe=ie" won't participate in equality propagation.
- ... what else ?
ESTIMATED WORK TIME
ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)