maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04845
Predictions re whether cost-based predicate pushdown at optimizer phase will cause QEP to change
On Fri, Aug 10, 2012 at 01:00:52AM +0400, Sergei Petrunia wrote:
> Subject: Re: Cost analysis: checking if join optimization costs match the
> reality.
> Hi!
>
> I've ran this experiment: I took the scalar subquery out of the query and then
> tried to compare optimizer costs with execution times.
>
> == The query ==
> explain extended
> select sql_calc_found_rows
> s_name, s_address
> from nation straight_join supplier
> where s_suppkey in (select ps_suppkey from partsupp
> where ps_partkey in (select p_partkey from part
> where p_name like 'forest%')
> )
> and s_nationkey = n_nationkey
> and n_name = 'CANADA'
> order by s_name
> limit 10;
> +--+-----------+--------+------++-------------+-------+-------------------+----+---------------------------------------------------------+
> |id|select_type|table |type ||key |key_len|ref |rows|Extra |
> +--+-----------+--------+------++-------------+-------+-------------------+----+---------------------------------------------------------+
> | 1|PRIMARY |nation |ref ||n_name |26 |const | 1|Using where; Using index; Using temporary; Using filesort|
> | 1|PRIMARY |supplier|ref ||i_s_nationkey|5 |nation.n_nationkey | 207| |
> | 1|PRIMARY |partsupp|ref ||i_ps_suppkey |4 |supplier.s_suppkey | 50|Using index |
> | 1|PRIMARY |part |eq_ref||PRIMARY |4 |partsupp.ps_partkey| 1|Using where; FirstMatch(supplier) |
> +--+-----------+--------+------++-------------+-------+-------------------+----+---------------------------------------------------------+
>
> +--+------------+-----------+------++-------------+-------+------------------+----+---------------------------------------------------------+
> |id|select_type |table |type ||key |key_len|ref |rows|Extra |
> +--+------------+-----------+------++-------------+-------+------------------+----+---------------------------------------------------------+
> | 1|PRIMARY |nation |ref ||n_name |26 |const | 1|Using where; Using index; Using temporary; Using filesort|
> | 1|PRIMARY |supplier |ref ||i_s_nationkey|5 |nation.n_nationkey| 207| |
> | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| |
> | 2|MATERIALIZED|part |range ||p_name |58 |NULL |2387|Using where; Using index |
> | 2|MATERIALIZED|partsupp |ref ||i_ps_partkey |4 |part.p_partkey | 2|Using index |
> +--+------------+-----------+------++-------------+-------+------------------+----+---------------------------------------------------------+
>
>
> === Execution times ===
> FirstMatch: 0.20 - 0.30 sec.
> Materialization: 0.20 sec - 0.30 sec
>
> run times for both fluctuate between 0.20 and 0.30 sec, FirstMatch seems to be
> somewhat slower (dunno if the difference is statistically meaningful)
>
> == Optimizer costs ===
> FirstMatch: 11029.4
> Materialization: 3594.5
Besides that, we know:
Cost of scalar subquery execution (in choose_plan()):
(gdb) p join->best_read
$144 = 4.5989999999999993 = 4.6
=== Scalar subquery costs for FirstMatch ===
Number of evaluations includes the selectivity of partsupp that's not show in
EXPLAIN:
fm_n_evaluations= 1 * 207 * 50 * 1 * 0.0118 = 122
fm_total_subq_cost = 4.6 * 122 = 562
fm_adjusted_total_cost= 11029.4 + 562= 11591.4
=== Scalar subquery costs for Materialization ===
sjm_n_evaluations = 2387*1 = 2387
sjm_total_subq_cost = 2387 * 4.6 = 10980.
sjm_adjusted_total_cost= 3594.5 + 10980= 14574.5
That is,
sjm_adjusted_total_cost > fm_total_subq_cost
which means it should choose FirstMatch!
== Difference is not very satisfying, though ==
The result is not fully satisfying though, because cost numbers are close,
while we have this data on execution:
<spetrunia> Mine:
<spetrunia> Materialization: 5.30 sec (3x)
<spetrunia> First-Match, with 'fixed' scalar predicate: 1.72 sec (1x)
<spetrunia> FirstMatch, original: ~15 sec (8x)
<spetrunia> Timour:
<spetrunia> FirstMatch, mdev-83: 1x
<spetrunia> Materialization: 5x
<spetrunia> FirstMatch, original: 8.3x
that is "fixed FirstMatch" is 3x-5x faster than Materialization, while the costs differ by a factor of
14574.5/11591.4 = 1.25
On the other hand, see email "Cost analysis: optimizer statistics vs real dataset properties":
- supplier.rows =207 ,while in reality it's 412, precise stats gives 400.
- for Materialization plan, partsupp.rows=2, precise stats gives 4.
- for FirstMatch partsupp.rows=50, while precise stats (and data we will hit)
shows it should be 80.
Last but not the least: that email shows that the scalar-context subquery will
scan 3 rows.
select
(select count(*) from lineitem) / (select count(distinct l_partkey, l_suppkey) from lineitem);
gives 7.5, which means subquery is 2x more expensive than the optimizer thinks
it is.
=== Conclusion ===
- Precise stats will cause different cost numbers. I'm not sure what they will
be, but there is some hope FirstMatch may still win.
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
References