← Back to team overview

maria-developers team mailing list archive

Re: Cost analysis: Materialization plan+scalar subquery cost, prediction

 

I've found an error: emails with subjects

- Cost analysis: Materialization plan+scalar subquery cost, prediction (this one)
- Cost analysis: FirstMatch plan+scalar subquery cost, prediction

were made as replies to the out-of-date analysis, where we didn't have index on
nation.n_name, and thus had #rows == 25 for table nation.

We'll need to re-do the prediction.

On Thu, Aug 09, 2012 at 03:27:24PM +0400, Sergei Petrunia wrote:
> On Wed, Aug 08, 2012 at 07:06:39PM +0400, Sergei Petrunia wrote:
> > Debugging with optimizer_switch='materialization=on', query:
> > 
> > explain extended
> > select sql_calc_found_rows
> >        s_name, s_address
> > from nation straight_join supplier
> > where s_suppkey in (select ps_suppkey from partsupp
> >                     where ps_partkey in (select p_partkey from part
> >                                          where p_name like 'forest%')
> >                           and ps_availqty >
> >                               (select 0.5 * sum(l_quantity)
> >                                from lineitem
> >                                where l_partkey = ps_partkey
> >                                      and l_suppkey = ps_suppkey
> >                                      and l_shipdate >= date('1994-01-01')
> >                                      and l_shipdate < date('1994-01-01') +
> >                                          interval '1' year ))
> > and s_nationkey = n_nationkey
> > and n_name = 'canada'
> > order by s_name
> > limit 10;
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> > |id  |select_type       |table      |type  ||key                |key_len|ref                                    |rows|filtered|Extra                                       |
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> > |   1|PRIMARY           |nation     |ALL   ||NULL               |NULL   |NULL                                   |  25|  100.00|Using where; Using temporary; Using filesort|
> > |   1|PRIMARY           |supplier   |ref   ||i_s_nationkey      |5      |nation.n_nationkey                     | 180|  100.00|                                            |
> > |   1|PRIMARY           |<subquery2>|eq_ref||distinct_key       |4      |func                                   |   1|  100.00|                                            |
> > |   2|MATERIALIZED      |part       |range ||p_name             |58     |NULL                                   |2387|  100.00|Using where; Using index                    |
> > |   2|MATERIALIZED      |partsupp   |ref   ||PRIMARY            |4      |part.p_partkey                         |   1|  100.00|Using where                                 |
> > |   4|DEPENDENT SUBQUERY|lineitem   |ref   ||i_l_suppkey_partkey|10     |partsupp.ps_partkey,partsupp.ps_suppkey|   4|  100.00|Using where                                 |
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> 
> - The join order in SJ-Materialization nest is (part, partsupp).
> - scalar subquery is correlated with partsupp, so it can only be attached to
>   partsupp
> - join order of (partsupp, part) is much more expensive so we can be certain it
>   won't be used when we start taking scalar subquery costs into account.
> 
> Cost of scalar subquery execution (in choose_plan()):
> (gdb) p join->best_read
>   $144 = 4.5989999999999993
> 
> We estimate that it is run 2378*1=2378 times
> (NOTE: in reality, it is 9555 times, see "optimizer statistics vs real dataset"
> email, which also explains the reason of discrepansy: ref access on table
> partsupp has estimate rows=1, while it is actually 4)
> 
> Added subquery cost is: $SCALAR_SUBQ_COST= 4.6 * 2378 =10398.8
> 
> Now, let's walk down and add it, where appropriate:
> 
> >
> > ********* SEMI-JOIN pre-optimization **************
> > 
> > ===== PRE-1. part === 
> > (gdb) p position->records_read
> >   $564 = 2387
> > (gdb) p position->read_time
> >   $565 = 497.33246183668518
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $566 = 477.39999999999998
> > (gdb) p current_read_time
> >   $567 = 974.73246183668516
> > 
> > ===== PRE-2. partsupp === 
> > 
> > (gdb) p position->records_read
> >   $570 = 1
> > (gdb) p position->read_time
> >   $571 = 2390.4263969377266
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $572 = 477.39999999999998
> > (gdb) p current_read_time
> >   $573 = 3842.5588587744119
> > 
> > ## (part, partsupp) is the best plan to materialize the subquery.
> >   (gdb) p join->best_read
> >     $575 = 6229.5578587744112
> >   (gdb) p join->record_count
> >     $576 = 2387
> > 
> > ## Extra expenses for semi-join
> > (gdb) print lookup_cost
> >   $582 = 0.050000000000000003
> > (gdb) print write_cost
> >   $584 = 0.050000000000000003
> 
> #### Here $SCALAR_SUBQ_COST will be added.
> 
> > 
> > ********* MAIN join optimization **************
> > ===== 1. nation === 
> > (gdb) p position->records_read
> >   $587 = 25
> > (gdb) p position->read_time
> >   $588 = 1
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $589 = 5
> > (gdb) p current_read_time
> >   $590 = 6
> > 
> > ===== 2. supplier === 
> > (gdb) p position->records_read
> >   $593 = 180
> > (gdb) p position->read_time
> >   $594 = 4525
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $595 = 900
> > (gdb) p current_read_time
> >   $596 = 5431
> > 
> > ## here it considers "part, partsupp" extension, with huge cost: 
> > ## I don't skip it:
> > 
> > === 3.A part ===
> > (gdb) p position->records_read
> >   $599 = 2387
> > (gdb) p position->read_time
> >   $600 = 2237996.0782650835
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $601 = 2148300
> > (gdb) p current_read_time
> >   $602 = 4391727.078265084
> > 
> > === 4.A partsupp ==
> > (gdb) p position->records_read
> >   $605 = 1
> > (gdb) p position->read_time
> >   $606 = 10741500
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> >   $607 = 2148300
> > (gdb) p current_read_time
> >   $608 = 17281527.078265086
> > 
> > > advance_sj_state
> >   > Firstmatch_picker::check_qep
> >     ### Picks first match with 
> >     (gdb) print read_time
> >       $615 = 12984927.078265084
> >     (gdb) print rec_count
> >       $616 = 4500
> >   < Firstmatch_picker::check_qep
> > 
> >   > Sj_materialization_picker::check_qep
> >     (gdb) p prefix_cost.total_cost()
> >       $622 = 5431   # OK, same as $596
> > 
> >     (gdb) p mat_info->materialization_cost.total_cost()
> >       $623 = 3007.1088587744121  
> >     ## This one is weird.  It should be $575 + $582 + $584= 6229.6578587744098
> >     ## I will investigate.
> >     ##  The difference is not millions, but a ~3K. 
> > 
> >     (gdb) print prefix_rec_count
> >       $625 = 4500
> >     (gdb) p mat_info->lookup_cost.total_cost()
> >       $626 = 0.050000000000000003
> >   < Sj_materialization_picker::check_qep
> > 
> > ## Materialization is chosen as the best strategy with
> > 
> >   (gdb) print current_record_count
> >     $634 = 4500
> >   (gdb) print current_read_time
> >     $635 = 8663.1088587744125
> 
> Here $SCALAR_SUBQ_COST will be added.
> 
> > < advance_sj_state
> >  
> > === 5. Finally ===
> > We hit these lines:
> >         if (join->sort_by_table &&
> >             join->sort_by_table !=
> >             join->positions[join->const_tables].table->table)
> >           /* We have to make a temp table */
> >           current_read_time+= current_record_count;
> > and get the final query plan cost of
> > 
> > (gdb) p join->record_count
> >   $643 = 4500
> > (gdb) p join->best_read
> >   $644 = 13163.107858774412
> > 
> Here we will see the added $SCALAR_SUBQ_COST, which will give the final cost of:
> 
> 13163.1 + 10938.7 = 24101.8
> 
> BR
>  Sergei
> -- 
> Sergei Petrunia, Software Developer
> Monty Program AB, http://askmonty.org
> Blog: http://s.petrunia.net/blog

-- 
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


References