← Back to team overview

maria-developers team mailing list archive

Cost analysis: Materialization plan+scalar subquery cost, prediction

 

On Wed, Aug 08, 2012 at 07:06:39PM +0400, Sergei Petrunia wrote:
> Debugging with optimizer_switch='materialization=on', query:
> 
> explain extended
> select sql_calc_found_rows
>        s_name, s_address
> from nation straight_join supplier
> where s_suppkey in (select ps_suppkey from partsupp
>                     where ps_partkey in (select p_partkey from part
>                                          where p_name like 'forest%')
>                           and ps_availqty >
>                               (select 0.5 * sum(l_quantity)
>                                from lineitem
>                                where l_partkey = ps_partkey
>                                      and l_suppkey = ps_suppkey
>                                      and l_shipdate >= date('1994-01-01')
>                                      and l_shipdate < date('1994-01-01') +
>                                          interval '1' year ))
> and s_nationkey = n_nationkey
> and n_name = 'canada'
> order by s_name
> limit 10;
> +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> |id  |select_type       |table      |type  ||key                |key_len|ref                                    |rows|filtered|Extra                                       |
> +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> |   1|PRIMARY           |nation     |ALL   ||NULL               |NULL   |NULL                                   |  25|  100.00|Using where; Using temporary; Using filesort|
> |   1|PRIMARY           |supplier   |ref   ||i_s_nationkey      |5      |nation.n_nationkey                     | 180|  100.00|                                            |
> |   1|PRIMARY           |<subquery2>|eq_ref||distinct_key       |4      |func                                   |   1|  100.00|                                            |
> |   2|MATERIALIZED      |part       |range ||p_name             |58     |NULL                                   |2387|  100.00|Using where; Using index                    |
> |   2|MATERIALIZED      |partsupp   |ref   ||PRIMARY            |4      |part.p_partkey                         |   1|  100.00|Using where                                 |
> |   4|DEPENDENT SUBQUERY|lineitem   |ref   ||i_l_suppkey_partkey|10     |partsupp.ps_partkey,partsupp.ps_suppkey|   4|  100.00|Using where                                 |
> +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+

- The join order in SJ-Materialization nest is (part, partsupp).
- scalar subquery is correlated with partsupp, so it can only be attached to
  partsupp
- join order of (partsupp, part) is much more expensive so we can be certain it
  won't be used when we start taking scalar subquery costs into account.

Cost of scalar subquery execution (in choose_plan()):
(gdb) p join->best_read
  $144 = 4.5989999999999993

We estimate that it is run 2378*1=2378 times
(NOTE: in reality, it is 9555 times, see "optimizer statistics vs real dataset"
email, which also explains the reason of discrepansy: ref access on table
partsupp has estimate rows=1, while it is actually 4)

Added subquery cost is: $SCALAR_SUBQ_COST= 4.6 * 2378 =10398.8

Now, let's walk down and add it, where appropriate:

>
> ********* SEMI-JOIN pre-optimization **************
> 
> ===== PRE-1. part === 
> (gdb) p position->records_read
>   $564 = 2387
> (gdb) p position->read_time
>   $565 = 497.33246183668518
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $566 = 477.39999999999998
> (gdb) p current_read_time
>   $567 = 974.73246183668516
> 
> ===== PRE-2. partsupp === 
> 
> (gdb) p position->records_read
>   $570 = 1
> (gdb) p position->read_time
>   $571 = 2390.4263969377266
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $572 = 477.39999999999998
> (gdb) p current_read_time
>   $573 = 3842.5588587744119
> 
> ## (part, partsupp) is the best plan to materialize the subquery.
>   (gdb) p join->best_read
>     $575 = 6229.5578587744112
>   (gdb) p join->record_count
>     $576 = 2387
> 
> ## Extra expenses for semi-join
> (gdb) print lookup_cost
>   $582 = 0.050000000000000003
> (gdb) print write_cost
>   $584 = 0.050000000000000003

#### Here $SCALAR_SUBQ_COST will be added.

> 
> ********* MAIN join optimization **************
> ===== 1. nation === 
> (gdb) p position->records_read
>   $587 = 25
> (gdb) p position->read_time
>   $588 = 1
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $589 = 5
> (gdb) p current_read_time
>   $590 = 6
> 
> ===== 2. supplier === 
> (gdb) p position->records_read
>   $593 = 180
> (gdb) p position->read_time
>   $594 = 4525
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $595 = 900
> (gdb) p current_read_time
>   $596 = 5431
> 
> ## here it considers "part, partsupp" extension, with huge cost: 
> ## I don't skip it:
> 
> === 3.A part ===
> (gdb) p position->records_read
>   $599 = 2387
> (gdb) p position->read_time
>   $600 = 2237996.0782650835
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $601 = 2148300
> (gdb) p current_read_time
>   $602 = 4391727.078265084
> 
> === 4.A partsupp ==
> (gdb) p position->records_read
>   $605 = 1
> (gdb) p position->read_time
>   $606 = 10741500
> (gdb) p current_record_count / TIME_FOR_COMPARE
>   $607 = 2148300
> (gdb) p current_read_time
>   $608 = 17281527.078265086
> 
> > advance_sj_state
>   > Firstmatch_picker::check_qep
>     ### Picks first match with 
>     (gdb) print read_time
>       $615 = 12984927.078265084
>     (gdb) print rec_count
>       $616 = 4500
>   < Firstmatch_picker::check_qep
> 
>   > Sj_materialization_picker::check_qep
>     (gdb) p prefix_cost.total_cost()
>       $622 = 5431   # OK, same as $596
> 
>     (gdb) p mat_info->materialization_cost.total_cost()
>       $623 = 3007.1088587744121  
>     ## This one is weird.  It should be $575 + $582 + $584= 6229.6578587744098
>     ## I will investigate.
>     ##  The difference is not millions, but a ~3K. 
> 
>     (gdb) print prefix_rec_count
>       $625 = 4500
>     (gdb) p mat_info->lookup_cost.total_cost()
>       $626 = 0.050000000000000003
>   < Sj_materialization_picker::check_qep
> 
> ## Materialization is chosen as the best strategy with
> 
>   (gdb) print current_record_count
>     $634 = 4500
>   (gdb) print current_read_time
>     $635 = 8663.1088587744125

Here $SCALAR_SUBQ_COST will be added.

> < advance_sj_state
>  
> === 5. Finally ===
> We hit these lines:
>         if (join->sort_by_table &&
>             join->sort_by_table !=
>             join->positions[join->const_tables].table->table)
>           /* We have to make a temp table */
>           current_read_time+= current_record_count;
> and get the final query plan cost of
> 
> (gdb) p join->record_count
>   $643 = 4500
> (gdb) p join->best_read
>   $644 = 13163.107858774412
> 
Here we will see the added $SCALAR_SUBQ_COST, which will give the final cost of:

13163.1 + 10938.7 = 24101.8

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


Follow ups

References