← Back to team overview

maria-developers team mailing list archive

Updated (by Psergey): Push conditions down into non-mergeable VIEWs when possible (119)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Push conditions down into non-mergeable VIEWs when possible
CREATION DATE..: Mon, 24 May 2010, 20:52
SUPERVISOR.....: Igor
IMPLEMENTOR....: 
COPIES TO......: Psergey, Timour
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 119 (http://askmonty.org/worklog/?tid=119)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Psergey - Mon, 24 May 2010, 20:59)=-=-
High-Level Specification modified.
--- /tmp/wklog.119.old.25116    2010-05-24 20:59:40.000000000 +0000
+++ /tmp/wklog.119.new.25116    2010-05-24 20:59:40.000000000 +0000
@@ -1 +1,113 @@
+<contents>
+HLS
+1. Problems to be addressed in this WL
+2. Pushdown of conditions into non-mergeable VIEWs
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+2.2 What condition can be pushed 
+3. Pushdown from HAVING into WHERE
+4. When to do the pushdown
+5. Other things to take care of
+
+
+</contents>
+
+1. Problems to be addressed in this WL
+======================================
+The problem actually consists of two parts:
+1. Condition on VIEW columns are not pushed down into VIEWs.
+2. Even if conditions were pushed, they would have been put into VIEW's 
+HAVING clause, which would not give enough of speedup. In order to get a
+real speedup, the optimizer must be able to move relevant part of HAVING
+into WHERE (and then use it for further optimizations) in order to provide
+the desired speedup. Note that HAVING->WHERE condition move is orthogonal
+to VIEW processing.
+
+2. Pushdown of conditions into non-mergeable VIEWs
+==================================================
+We can push a condition into non-mergeable VIEW when VIEW's top-most operation
+is selection (i.e., filtering). This is true, for example, when the VIEW is
+defined as 
+
+  SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]
+
+and not true when the VIEW is defined as
+
+  SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n
+
+Generalizing the above, we arrive at the following rule:
+
+  For non-mergeable VIEWs, 
+   - pushdown must not be done if VIEW uses ORDER BY .. LIMIT
+   - when pushdown is done, the pushed condition should be added to the WHERE
+     clause.
+
+Note: In scope of this WL, we will not hande VIEWs that have UNION [ALL] as
+top operation.
+
+(TODO: what about SELECT DISTINCT?)
+(TODO: pushdown down into IN subqueries?)
+
+2.1 A note about VIEWs with ORDER BY ... LIMIT
+----------------------------------------------
+Although it is not possible to push a condition below the ORDER BY ... LIMIT
+operation, there is still some benefit from checking the condition early as
+that would allow to avoid writing non-matching rows into temporary table.
+
+We could do that if we introduced a post-ORDERBY selection operation. That
+operation would also allow to support ORDER BY ... LIMIT inside subqueries
+(we don't currently support those because default subquery strategy,
+IN->EXISTS rewrite, also needs to push a condition into subquery).
+
+2.2 What condition can be pushed 
+--------------------------------
+Assuming simplify_joins() operation has done normalization:
+* If the VIEW is in top-level join list, or inside a semi-join that's in
+  top-level join list, we can push parts of WHERE condition.
+* If the VIEW is inside an outer join, we can push it's ON expression.
+
+We can reuse make_cond_for_index()/make_remainder_cond() code to extract part
+of condition that can be pushed, and the remainder, respectively.
+
+Pushability criteria for an atomic (i.e. not AND/OR) condition is that
+   
+  the condition only uses VIEW's fields.
+
+(TODO: what about fields of const tables? Do we have const tables already
+retrived by the time VIEW is materialized? If yes, we could push down 
+expressions that refer to const tables, too)
+
+3. Pushdown from HAVING into WHERE
+==================================
+The idea is:
+
+  Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
+  can be put into WHERE.
+
+(TODO: do we need to handle case of grouping over expressions?)
+
+(TODO: when moving expression for HAVING to WHERE, do we need 
+to do something with it? Replace all Item_ref objects with items that
+they refer to? 
+  - In case of referring to expression, do we get 
+    Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))? 
+)
+
+4. When to do the pushdown
+==========================
+In order to do pushdown, we must have prepare phase finished 
+for both parent (so that we can make sense of its WHERE condition) and 
+child (so that we know what it has in its select list).
+
+We can do pushdown before we've done join optimization (i.e. choose_plan()
+call) of the parent.
+
+We must do pushdown before we've done JOIN::optimize() of the child
+(in particular, it must be done before we do update_ref_and_keys() and 
+range analysis in the child).
+
+
+5. Other things to take care of
+===============================
+* Pushing down fulltext predicates (it seems one needs to "register" a 
+  fulltext predicate when it is moved from one select from another? Ask Serg)
 



DESCRIPTION:

There are complaints (see links below) about cases with non-mergeable 
VIEW (because it has a GROUP BY), a query that has restrictions on 
the grouped column, and poor performance that is caused by VIEW 
processing code ignoring the restriction.

This WL is about addressing this issue.

links to complaints:
http://code.openark.org/blog/mysql/views-better-performance-with-condition-pushdown
http://www.mysqlperformanceblog.com/2010/05/19/a-workaround-for-the-performance-problems-of-temptable-views/

The target version is MariaDB 5.3+, because it has late optimization/execution
for FROM-subqueries/non mergeable VIEWs, which makes it much more feasible to
inject something into VIEW before it is optimized/executed.


HIGH-LEVEL SPECIFICATION:



<contents>
HLS
1. Problems to be addressed in this WL
2. Pushdown of conditions into non-mergeable VIEWs
2.1 A note about VIEWs with ORDER BY ... LIMIT
2.2 What condition can be pushed 
3. Pushdown from HAVING into WHERE
4. When to do the pushdown
5. Other things to take care of


</contents>

1. Problems to be addressed in this WL
======================================
The problem actually consists of two parts:
1. Condition on VIEW columns are not pushed down into VIEWs.
2. Even if conditions were pushed, they would have been put into VIEW's 
HAVING clause, which would not give enough of speedup. In order to get a
real speedup, the optimizer must be able to move relevant part of HAVING
into WHERE (and then use it for further optimizations) in order to provide
the desired speedup. Note that HAVING->WHERE condition move is orthogonal
to VIEW processing.

2. Pushdown of conditions into non-mergeable VIEWs
==================================================
We can push a condition into non-mergeable VIEW when VIEW's top-most operation
is selection (i.e., filtering). This is true, for example, when the VIEW is
defined as 

  SELECT select_list FROM from_clause [WHERE where_cond] [HAVING having_cond]

and not true when the VIEW is defined as

  SELECT select_list FROM from_clause [WHERE where_cond] ORDER BY expr LIMIT n

Generalizing the above, we arrive at the following rule:

  For non-mergeable VIEWs, 
   - pushdown must not be done if VIEW uses ORDER BY .. LIMIT
   - when pushdown is done, the pushed condition should be added to the WHERE
     clause.

Note: In scope of this WL, we will not hande VIEWs that have UNION [ALL] as
top operation.

(TODO: what about SELECT DISTINCT?)
(TODO: pushdown down into IN subqueries?)

2.1 A note about VIEWs with ORDER BY ... LIMIT
----------------------------------------------
Although it is not possible to push a condition below the ORDER BY ... LIMIT
operation, there is still some benefit from checking the condition early as
that would allow to avoid writing non-matching rows into temporary table.

We could do that if we introduced a post-ORDERBY selection operation. That
operation would also allow to support ORDER BY ... LIMIT inside subqueries
(we don't currently support those because default subquery strategy,
IN->EXISTS rewrite, also needs to push a condition into subquery).

2.2 What condition can be pushed 
--------------------------------
Assuming simplify_joins() operation has done normalization:
* If the VIEW is in top-level join list, or inside a semi-join that's in
  top-level join list, we can push parts of WHERE condition.
* If the VIEW is inside an outer join, we can push it's ON expression.

We can reuse make_cond_for_index()/make_remainder_cond() code to extract part
of condition that can be pushed, and the remainder, respectively.

Pushability criteria for an atomic (i.e. not AND/OR) condition is that
   
  the condition only uses VIEW's fields.

(TODO: what about fields of const tables? Do we have const tables already
retrived by the time VIEW is materialized? If yes, we could push down 
expressions that refer to const tables, too)

3. Pushdown from HAVING into WHERE
==================================
The idea is:

  Parts of HAVING that refer to columns/expressions we're doing GROUP BY on
  can be put into WHERE.

(TODO: do we need to handle case of grouping over expressions?)

(TODO: when moving expression for HAVING to WHERE, do we need 
to do something with it? Replace all Item_ref objects with items that
they refer to? 
  - In case of referring to expression, do we get 
    Item_ref(where_clause_expr) or expr( Item_ref(..), .., Item_ref(..))? 
)

4. When to do the pushdown
==========================
In order to do pushdown, we must have prepare phase finished 
for both parent (so that we can make sense of its WHERE condition) and 
child (so that we know what it has in its select list).

We can do pushdown before we've done join optimization (i.e. choose_plan()
call) of the parent.

We must do pushdown before we've done JOIN::optimize() of the child
(in particular, it must be done before we do update_ref_and_keys() and 
range analysis in the child).


5. Other things to take care of
===============================
* Pushing down fulltext predicates (it seems one needs to "register" a 
  fulltext predicate when it is moved from one select from another? Ask Serg)


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)






Follow ups