← Back to team overview

maria-developers team mailing list archive

WL#253 New (by Igor): Take into account the selectivity of single-table range predicates on non-indexed columns

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Take into account the selectivity of single-table range predicates on
		non-indexed columns
CREATION DATE..: Wed, 06 Mar 2013, 17:25
SUPERVISOR.....: 
IMPLEMENTOR....: Igor
COPIES TO......: Psergey
CATEGORY.......: Server-BackLog
TASK ID........: 253 (http://askmonty.org/worklog/?tid=253)
VERSION........: Server-9.x
STATUS.........: Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:



DESCRIPTION:

Description of the problem
==========================

Currently the optimizer completely ignores any conditions on non-indexed columns
when searching for the best execution plan. As a result in many cases it chooses
a suboptimal plan.

Let's consider the query:
SELECT * FROM t1,t2 WHERE t1.a=t2.a and t2.b BETWEEN (1,3)
assuming that:
-table t1 contains 100 records 
-table t2 contains 1000 records
-there is a primary index on t1(a)
-there is a secondary index on t2(a)
-there is no index defined on column t2.b
-the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Under these conditions currently the optimizer will choose the plan that
accesses t1 using a table scan
accesses t2 using index t2(a)
checks the condition t2.b BETWEEN (1,3)
This plan examines all rows of both tables and it performs 100 index look-ups.

The alternative plan that:
accesses table t2 in a table scan
checks the condition t2.b BETWEEN (1,3)
accesses t1 using index t1(a)
also would examine all rows of t2, but it performs only ~10 look-ups
to access ~10 rows of table t1.
The second plan is expected to be more efficient and it is so.

The plan to resolve the problem
===============================

1. Build single-table range conditions over different columns of all tables of
the join query.
2. Calculate the selectivity of each such condition using persistent statistics
on table columns.
3. Take this statistics into account when calculated the cardinality of each
partial join that are evaluated by the optimizer.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v4.0.0)