← Back to team overview

maria-developers team mailing list archive

Updated (by Guest): Table elimination (17)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Table elimination
CREATION DATE..: Sun, 10 May 2009, 19:57
SUPERVISOR.....: Monty
IMPLEMENTOR....: Psergey
COPIES TO......: 
CATEGORY.......: Server-Sprint
TASK ID........: 17 (http://askmonty.org/worklog/?tid=17)
VERSION........: Server-5.1
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30176     2009-05-22 17:00:35.000000000 +0300
+++ /tmp/wklog.17.new.30176     2009-05-22 17:00:35.000000000 +0300
@@ -1 +1 @@
-Maria-2.0
+Server-5.1

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30162     2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162     2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-Maria-Sprint
+Server-Sprint

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162     2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162     2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-2.0
+9.x

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30162     2009-05-22 17:00:28.000000000 +0300
+++ /tmp/wklog.17.new.30162     2009-05-22 17:00:28.000000000 +0300
@@ -1 +1 @@
-9.x
+Maria-2.0

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Category updated.
--- /tmp/wklog.17.old.30122     2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122     2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-RawIdeaBin
+Maria-Sprint

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Version updated.
--- /tmp/wklog.17.old.30122     2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122     2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Server-5.2
+2.0

-=-=(Guest - Fri, 22 May 2009, 17:00)=-=-
Status updated.
--- /tmp/wklog.17.old.30122     2009-05-22 17:00:02.000000000 +0300
+++ /tmp/wklog.17.new.30122     2009-05-22 17:00:02.000000000 +0300
@@ -1 +1 @@
-Un-Assigned
+Assigned

-=-=(Guest - Thu, 21 May 2009, 22:54)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.28468     2009-05-21 22:54:28.000000000 +0300
+++ /tmp/wklog.17.new.28468     2009-05-21 22:54:28.000000000 +0300
@@ -77,7 +77,7 @@
 - Relationship with prepared statements. 
   On one hand, it's natural to desire to make table elimination a
   once-per-statement operation, like outer->inner join conversion. We'll have
-  to limit the applicablity by removing [MARK1] as that can change during
+  to limit the applicability by removing [MARK1] as that can change during
   lifetime of the statement.
 
   The other option is to do table elimination every time. This will require to

-=-=(Psergey - Tue, 19 May 2009, 20:20)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27126     2009-05-19 20:20:06.000000000 +0300
+++ /tmp/wklog.17.new.27126     2009-05-19 20:20:06.000000000 +0300
@@ -48,6 +48,10 @@
   another outer join nest, that shouldn't prevent us from moving away the
   eliminated tables. 
 
+* Update join->table_count and all-join-tables bitmap.
+
+* That's it. Nothing else?
+
 4. User interface
 -----------------
 * We'll add an @@optimizer switch flag for table elimination. Tentative

-=-=(Psergey - Tue, 19 May 2009, 20:18)=-=-
Low Level Design modified.
--- /tmp/wklog.17.old.27035     2009-05-19 20:18:27.000000000 +0300
+++ /tmp/wklog.17.new.27035     2009-05-19 20:18:27.000000000 +0300
@@ -1 +1,92 @@
+<contents>
+1. Conditions for removal
+2. Removal operation properties
+3. Removal operation
+4. User interface
+5. Todo, issues to resolve
+5.1 To resolve
+5.2 Resolved
+</contents>
+
+It's not really about elimination of tables, it's about elimination of inner
+sides of outer joins.
+
+1. Conditions for removal
+-------------------------
+We can eliminate an inner side of outer join if:
+1. For each record combination of outer tables, it will always produce
+   exactly one record.
+2. There are no references to columns of the inner tables anywhere else in
+   the query.
+
+#1 means that every table inside the outer join nest is:
+ - is a constant table:
+    = because it can be accessed via eq_ref(const) access, or
+    = it is a zero-rows or one-row MyISAM-like table  [MARK1]
+ - has an eq_ref access method candidate.
+
+#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
+  GROUP BY and HAVING do not refer to the inner tables of the outer join
+  nest.
+
+2. Removal operation properties
+-------------------------------
+* There is always one way to remove (no choice to remove either this or that)
+* It is always better to remove as much tables as possible (at least within 
+  our cost model).
+Thus, no need for any cost calculations/etc. It's an unconditional rewrite.
+
+3. Removal operation
+--------------------
+* Remove the outer join nest's nested join structure (i.e. get the
+  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
+  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
+  joins).  [MARK2]
+
+* Move the tables and their JOIN_TABs to front like it is done with const
+  tables, with exception that if eliminated outer join nest was within
+  another outer join nest, that shouldn't prevent us from moving away the
+  eliminated tables. 
+
+4. User interface
+-----------------
+* We'll add an @@optimizer switch flag for table elimination. Tentative
+  name: 'table_elimination'.
+
+* With EXPLAIN, there are two options: 
+  - Show removed tables in a way similar to const tables, with some
+    indication that they are removed.
+  - Do not show them altogether.
+(the second one seems to be better? We're targeting a situation with VIEWs,
+where the user would not care about what tables were added into his query
+and then discarded from it?)
+
+5. Todo, issues to resolve
+--------------------------
+
+5.1 To resolve
+~~~~~~~~~~~~~~
+- See EXPLAIN question in section #4.
+
+- Re-check how this works with equality propagation.
+
+- Relationship with prepared statements. 
+  On one hand, it's natural to desire to make table elimination a
+  once-per-statement operation, like outer->inner join conversion. We'll have
+  to limit the applicablity by removing [MARK1] as that can change during
+  lifetime of the statement.
+
+  The other option is to do table elimination every time. This will require to
+  rework operation [MARK2] to be undoable.
+  
+  I'm leaning towards doing the former. With anchor modeling, it is unlikely
+  that we'll meet outer joins which have N inner tables of which some are 1-row
+  MyISAM tables that do not have primary key.
+
+5.2 Resolved
+~~~~~~~~~~~~
+- outer->inner join conversion is not a problem for table elimination. 
+  We make outer->inner conversions based on predicates in WHERE. If the WHERE
+  referred to an inner table (requirement for OJ->IJ conversion) then table
+  elimination would not be applicable anyway.
 

	------------------------------------------------------------

		-=-=(View All Progress Notes, 12 total)=-=-
	http://askmonty.org/worklog/index.pl?tid=17&nolimit=1


DESCRIPTION:

Eliminate not needed tables from SELECT queries..
This will speed up some views and automatically generated queries.

Example:

CREATE TABLE B (id int primary key);

select
 A.colA
from
 tableA A
left outer join
 tableB B
on
 B.id = A.id;

In this case we can remove table B and the join from the query.


HIGH-LEVEL SPECIFICATION:



Here is an extended explanation of table elimination.

Table elimination is a feature found in some modern query optimizers, of
which Microsoft SQL Server 2005/2008 seems to have the most advanced
implementation. Oracle 11g has also been confirmed to use table
elimination but not to the same extent.

Basically, what table elimination does, is to remove tables from the
execution plan when it is unneccessary to include them. This can, of
course, only happen if the right circumstances arise. Let us for example
look at the following query:

select
 A.colA
from
 tableA A
left outer join
 tableB B
on
 B.id = A.id;

When using A as the left table we ensure that the query will return at
least as many rows as there are in that table. For rows where the join
condition (B.id = A.id) is not met the selected column (A.colA) will
contain a NULL value.

However, the result set could actually contain more rows than what is
found in tableA if there are duplicates of the column B.id in tableB. If
A
contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"]
then two rows will match in the join condition. The only way to know
what
the result will look like is to actually touch both tables during
execution.

Instead, let's say that tableB contains rows that make it possible to
place a unique constraint on the column B.id, for example and often the
case a primary key. In this situation we know that we will get exactly
as
many rows as there are in tableA, since joining with tableB cannot
introduce any duplicates. If further, as in the example query, we do not
select any columns from tableB, touching that table during execution is
unneccessary. We can remove the whole join operation from the execution
plan.

Both SQL Server 2005/2008 and Oracle 11g will deploy table elimination
in
the case described above. Let us look at a more advanced query, where
Oracle fails.

select
 A.colA
from
 tableA A
left outer join
 tableB B
on
 B.id = A.id
and
 B.fromDate = (
   select
     max(sub.fromDate)
   from
     tableB sub
   where
     sub.id = A.id
 );

In this example we have added another join condition, which ensures
that we only pick the matching row from tableB having the latest
fromDate. In this case tableB will contain duplicates of the column
B.id, so in order to ensure uniqueness the primary key has to contain
the fromDate column as well. In other words the primary key of tableB
is (B.id, B.fromDate).

Furthermore, since the subselect ensures that we only pick the latest
B.fromDate for a given B.id we know that at most one row will match
the join condition. We will again have the situation where joining
with tableB cannot affect the number of rows in the result set. Since
we do not select any columns from tableB, the whole join operation can
be eliminated from the execution plan.

SQL Server 2005/2008 will deploy table elimination in this situation as
well. We have not found a way to make Oracle 11g use it for this type of
query. Queries like these arise in two situations. Either when you have
denormalized model consisting of a fact table with several related
dimension tables, or when you have a highly normalized model where each
attribute is stored in its own table. The example with the subselect is
common whenever you store historized/versioned data.


LOW-LEVEL DESIGN:



<contents>
1. Conditions for removal
2. Removal operation properties
3. Removal operation
4. User interface
5. Todo, issues to resolve
5.1 To resolve
5.2 Resolved
</contents>

It's not really about elimination of tables, it's about elimination of inner
sides of outer joins.

1. Conditions for removal
-------------------------
We can eliminate an inner side of outer join if:
1. For each record combination of outer tables, it will always produce
   exactly one record.
2. There are no references to columns of the inner tables anywhere else in
   the query.

#1 means that every table inside the outer join nest is:
 - is a constant table:
    = because it can be accessed via eq_ref(const) access, or
    = it is a zero-rows or one-row MyISAM-like table  [MARK1]
 - has an eq_ref access method candidate.

#2 means that WHERE clause, ON clauses of embedding outer joins, ORDER BY,
  GROUP BY and HAVING do not refer to the inner tables of the outer join
  nest.

2. Removal operation properties
-------------------------------
* There is always one way to remove (no choice to remove either this or that)
* It is always better to remove as much tables as possible (at least within 
  our cost model).
Thus, no need for any cost calculations/etc. It's an unconditional rewrite.

3. Removal operation
--------------------
* Remove the outer join nest's nested join structure (i.e. get the
  outer join's TABLE_LIST object $OJ and remove it from $OJ->embedding,
  $OJ->embedding->nested_join. Update table_map's of all ancestor nested
  joins).  [MARK2]

* Move the tables and their JOIN_TABs to front like it is done with const
  tables, with exception that if eliminated outer join nest was within
  another outer join nest, that shouldn't prevent us from moving away the
  eliminated tables. 

* Update join->table_count and all-join-tables bitmap.

* That's it. Nothing else?

4. User interface
-----------------
* We'll add an @@optimizer switch flag for table elimination. Tentative
  name: 'table_elimination'.

* With EXPLAIN, there are two options: 
  - Show removed tables in a way similar to const tables, with some
    indication that they are removed.
  - Do not show them altogether.
(the second one seems to be better? We're targeting a situation with VIEWs,
where the user would not care about what tables were added into his query
and then discarded from it?)

5. Todo, issues to resolve
--------------------------

5.1 To resolve
~~~~~~~~~~~~~~
- See EXPLAIN question in section #4.

- Re-check how this works with equality propagation.

- Relationship with prepared statements. 
  On one hand, it's natural to desire to make table elimination a
  once-per-statement operation, like outer->inner join conversion. We'll have
  to limit the applicability by removing [MARK1] as that can change during
  lifetime of the statement.

  The other option is to do table elimination every time. This will require to
  rework operation [MARK2] to be undoable.
  
  I'm leaning towards doing the former. With anchor modeling, it is unlikely
  that we'll meet outer joins which have N inner tables of which some are 1-row
  MyISAM tables that do not have primary key.

5.2 Resolved
~~~~~~~~~~~~
- outer->inner join conversion is not a problem for table elimination. 
  We make outer->inner conversions based on predicates in WHERE. If the WHERE
  referred to an inner table (requirement for OJ->IJ conversion) then table
  elimination would not be applicable anyway.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)