maria-developers team mailing list archive

Thread
Date
MDEV-83: pushdown of expensive predicates

To: timour@xxxxxxxxxxxx
From: Sergei Petrunia <psergey@xxxxxxxxxxxx>
Date: Mon, 21 May 2012 13:33:24 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
Hi Timour,

Some thoughts about MDEV-83. The jira entry says:

> The main difference in these two plans is that with the first execution plan
> the dependent subquery: ... is evaluated before the table part is accessed 
> while many rows where (p_name like 'forest%') is not true are filtered out
> and evaluation of the dependent subquery is not needed for them.

The way I read this, the only way one could figure out that the subquery
should be evaluated at later phase (after table part) is to figure out that 
selectivity of table `part`, together with its condition of 

   p_name like 'forest%'

is less than one. That way, subquery is evaluated fewer times, and query
execution will be cheaper.

I don't see any index that covers p_name column. This means that current 
optimizer has no clue about selectivity of conditions over p_name column, 
and hence is unable to make the correct decision.

If I add an index on p_name, then I get a different query plan (I didn't check
if is slower or faster than the original).

I think, one of the first things this MDEV needs is to figure out how exactly
we will get selectivities for non-indexed columns.

If we get them, and start to take them into account in join optimizer(*), the
example provided in this MDEV/ bug may just go away :-).  The general problem
will remain, though.

(*) { 
  Currently, the optimizer has "quick_condition_rows", which may cause join fanout
  to be less than 1 (check out examples with EXPLAIN EXTENDED and filtered != 100% 
  in the testsuite). The problem is that quick_condition_rows is not used
  consistently for all ways to access the table,  for example it is not taken
  into account when considering ref access. 
  I do not know whether Igor's work in best_access_path() includes making it
  consistently make use of quick_condition_rows estimates.
}


== Subquery predicate attachment considerations ==

Some thoughts about the problem. There are three factors we may want/need to
take into account when attaching subquery predicate to some join_tab:

1. Fanout of join tabs
2. Cost of the subquery predicate evaluation
3. Selectivity of the subquery predicate.

The example used in this MDEV item can be resolved by taking into account #1
and #2. For the first step, we only need to know

- accurate join tab fanout (#1),
- that subquery predicate evaluation has some cost ( #2 != 0)

these two allow us to set/resolve a task of attaching the subquery predicate
to the join_tab where total cost of its evaluations is minimized.

The exact value of subquery evaluation cost is irrelevant.  It will only
become relevant when/if we start to make a choice between evaluating subquery
predicate, or preforming join with the next join_tab.  In order for that
choice to be meaningful, we will need to have an idea about subquery
predicate's selectivity. That way, we will be able to make a choice between 

1. evaluate subquery, then make join with next join tab

  cost(subquery_pred)  +  selectivity(subquery_pred) * cost(join_tab)

and
2. make join with next join_tab, then evaluate the subquery:

  cost(join_tab)  +  selectivity(join_tab) * cost(join_tab)

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog