← Back to team overview

maria-developers team mailing list archive

Cost analysis: FirstMatch plan+scalar subquery cost, prediction

 

On Wed, Aug 08, 2012 at 07:04:39PM +0400, Sergei Petrunia wrote:
> Please find below cost analysis for the plan with FirstMatch strategy.
> 
> Debugging with optimizer_switch='semijoin=off', query:
> 
> explain extended
> select sql_calc_found_rows
>        s_name, s_address
> from nation straight_join supplier
> where s_suppkey in (select ps_suppkey from partsupp
>                     where ps_partkey in (select p_partkey from part
>                                          where p_name like 'forest%')
>                           and ps_availqty >
>                               (select 0.5 * sum(l_quantity)
>                                from lineitem
>                                where l_partkey = ps_partkey
>                                      and l_suppkey = ps_suppkey
>                                      and l_shipdate >= date('1994-01-01')
>                                      and l_shipdate < date('1994-01-01') +
>                                          interval '1' year ))
> and s_nationkey = n_nationkey
> and n_name = 'CANADA'
> order by s_name
> limit 10;
> 
> +----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+
> |id  |select_type       |table   |type  ||key                |key_len|ef                                     |rows|Extra                                       |
> +----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+
> |   1|PRIMARY           |nation  |ALL   ||NULL               |NULL   |NULL                                   |  25|Using where; Using temporary; Using filesort|
> |   1|PRIMARY           |supplier|ref   ||i_s_nationkey      |5      |nation.n_nationkey                     | 180|                                            |
> |   1|PRIMARY           |partsupp|ref   ||i_ps_suppkey       |4      |supplier.s_suppkey                     |  39|Using where                                 |
> |   1|PRIMARY           |part    |eq_ref||PRIMARY            |4      |partsupp.ps_partkey                    |   1|Using where; FirstMatch(supplier)           |
> |   4|DEPENDENT SUBQUERY|lineitem|ref   ||i_l_suppkey_partkey|10     |partsupp.ps_partkey,partsupp.ps_suppkey|   4|Using where                                 |
> +----+------------------+--------+------++-------------------+-------+---------------------------------------+----+--------------------------------------------+


Cost of scalar subquery execution (in choose_plan()):
(gdb) p join->best_read
  $144 = 4.5989999999999993

 From the email "check if optimizer is able to produce a correct
 E(#scalar_subquery executions)", we know:

>>Let's take estimate numbers:
>>1 * 251 * 34 * 1 * 0.0118 = 100.7012 evaluations.

which gives: $OVERALL_SCALAR_SUBQUERY_COST= 100.7 * 4.6= 463.2

> ===== 1. nation === 
> (gdb) p position->records_read 
>   $367 = 25
> (gdb) p position->read_time
>   $368 = 1
> (gdb) p current_record_count / TIME_FOR_COMPARE 
>   $369 = 5
> (gdb) p current_read_time
>   $370 = 6
> 
> ===== 2. supplier === 
> 
> (gdb) p position->records_read
>   $377 = 180
> (gdb) p position->read_time
>   $378 = 4525
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $379 = 900
> (gdb) p current_read_time
>   $380 = 5431
> 
> ## here it considers "part, partsupp" extension, with cost 
> ## 12,984,927. I skip that, because that's not the optimal plan
> 
> ===== 3. partsupp === 
> 
> (gdb) p position->records_read
>   $411 = 39
> (gdb) p position->read_time
>   $412 = 180000
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $413 = 35100
> (gdb) p current_read_time
>   $414 = 220531
>  
> ===== 4. part === 
> (gdb) p position->records_read
>   $421 = 1
> (gdb) p position->read_time
>   $422 = 175500
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $423 = 35100
> (gdb) p current_read_time
>   $424 = 431131
> 
> > advance_sj_state()
>   > Firstmatch_picker::check_qep
>     > optimize_wo_join_buffering
> 
>      ### It doesn't make any best_access_path() calls here, because no table is
>      ###  using join buffering.
> 
>      ### It forgets to add (current_record_count / TIME_FOR_COMPARE), and ends
>      ###   up with a slightly lower cost of
>      (gdb) print cost
>      $425 = 360931
>     > optimize_wo_join_buffering
>   > Firstmatch_picker::check_qep
> 
>   ## FirstMatch is chosen as the way we will use to eliminate duplicates..
>   
>   ## At the end of advance_sj_state, we have:
> 
>   (gdb) p *current_read_time
>     $426 = 360931
>   (gdb) p *current_record_count
>     $427 = 4500
> > advance_sj_state().
Adding $OVERALL_SCALAR_SUBQUERY_COST: 

 360931 + $OVERALL_SCALAR_SUBQUERY_COST = 361394.20.

which is still much greater than the cost of Materialization+$SCALAR_SUBQ_COST,
which is 24101.8  (see prev. email).

We've got to think what else is wrong here.



BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


References