maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04844
Re: Cost analysis: Materialization plan+scalar subquery cost, prediction
I've found an error: emails with subjects
- Cost analysis: Materialization plan+scalar subquery cost, prediction (this one)
- Cost analysis: FirstMatch plan+scalar subquery cost, prediction
were made as replies to the out-of-date analysis, where we didn't have index on
nation.n_name, and thus had #rows == 25 for table nation.
We'll need to re-do the prediction.
On Thu, Aug 09, 2012 at 03:27:24PM +0400, Sergei Petrunia wrote:
> On Wed, Aug 08, 2012 at 07:06:39PM +0400, Sergei Petrunia wrote:
> > Debugging with optimizer_switch='materialization=on', query:
> >
> > explain extended
> > select sql_calc_found_rows
> > s_name, s_address
> > from nation straight_join supplier
> > where s_suppkey in (select ps_suppkey from partsupp
> > where ps_partkey in (select p_partkey from part
> > where p_name like 'forest%')
> > and ps_availqty >
> > (select 0.5 * sum(l_quantity)
> > from lineitem
> > where l_partkey = ps_partkey
> > and l_suppkey = ps_suppkey
> > and l_shipdate >= date('1994-01-01')
> > and l_shipdate < date('1994-01-01') +
> > interval '1' year ))
> > and s_nationkey = n_nationkey
> > and n_name = 'canada'
> > order by s_name
> > limit 10;
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> > |id |select_type |table |type ||key |key_len|ref |rows|filtered|Extra |
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
> > | 1|PRIMARY |nation |ALL ||NULL |NULL |NULL | 25| 100.00|Using where; Using temporary; Using filesort|
> > | 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 180| 100.00| |
> > | 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| 100.00| |
> > | 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387| 100.00|Using where; Using index |
> > | 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1| 100.00|Using where |
> > | 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 4| 100.00|Using where |
> > +----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
>
> - The join order in SJ-Materialization nest is (part, partsupp).
> - scalar subquery is correlated with partsupp, so it can only be attached to
> partsupp
> - join order of (partsupp, part) is much more expensive so we can be certain it
> won't be used when we start taking scalar subquery costs into account.
>
> Cost of scalar subquery execution (in choose_plan()):
> (gdb) p join->best_read
> $144 = 4.5989999999999993
>
> We estimate that it is run 2378*1=2378 times
> (NOTE: in reality, it is 9555 times, see "optimizer statistics vs real dataset"
> email, which also explains the reason of discrepansy: ref access on table
> partsupp has estimate rows=1, while it is actually 4)
>
> Added subquery cost is: $SCALAR_SUBQ_COST= 4.6 * 2378 =10398.8
>
> Now, let's walk down and add it, where appropriate:
>
> >
> > ********* SEMI-JOIN pre-optimization **************
> >
> > ===== PRE-1. part ===
> > (gdb) p position->records_read
> > $564 = 2387
> > (gdb) p position->read_time
> > $565 = 497.33246183668518
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $566 = 477.39999999999998
> > (gdb) p current_read_time
> > $567 = 974.73246183668516
> >
> > ===== PRE-2. partsupp ===
> >
> > (gdb) p position->records_read
> > $570 = 1
> > (gdb) p position->read_time
> > $571 = 2390.4263969377266
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $572 = 477.39999999999998
> > (gdb) p current_read_time
> > $573 = 3842.5588587744119
> >
> > ## (part, partsupp) is the best plan to materialize the subquery.
> > (gdb) p join->best_read
> > $575 = 6229.5578587744112
> > (gdb) p join->record_count
> > $576 = 2387
> >
> > ## Extra expenses for semi-join
> > (gdb) print lookup_cost
> > $582 = 0.050000000000000003
> > (gdb) print write_cost
> > $584 = 0.050000000000000003
>
> #### Here $SCALAR_SUBQ_COST will be added.
>
> >
> > ********* MAIN join optimization **************
> > ===== 1. nation ===
> > (gdb) p position->records_read
> > $587 = 25
> > (gdb) p position->read_time
> > $588 = 1
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $589 = 5
> > (gdb) p current_read_time
> > $590 = 6
> >
> > ===== 2. supplier ===
> > (gdb) p position->records_read
> > $593 = 180
> > (gdb) p position->read_time
> > $594 = 4525
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $595 = 900
> > (gdb) p current_read_time
> > $596 = 5431
> >
> > ## here it considers "part, partsupp" extension, with huge cost:
> > ## I don't skip it:
> >
> > === 3.A part ===
> > (gdb) p position->records_read
> > $599 = 2387
> > (gdb) p position->read_time
> > $600 = 2237996.0782650835
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $601 = 2148300
> > (gdb) p current_read_time
> > $602 = 4391727.078265084
> >
> > === 4.A partsupp ==
> > (gdb) p position->records_read
> > $605 = 1
> > (gdb) p position->read_time
> > $606 = 10741500
> > (gdb) p current_record_count / TIME_FOR_COMPARE
> > $607 = 2148300
> > (gdb) p current_read_time
> > $608 = 17281527.078265086
> >
> > > advance_sj_state
> > > Firstmatch_picker::check_qep
> > ### Picks first match with
> > (gdb) print read_time
> > $615 = 12984927.078265084
> > (gdb) print rec_count
> > $616 = 4500
> > < Firstmatch_picker::check_qep
> >
> > > Sj_materialization_picker::check_qep
> > (gdb) p prefix_cost.total_cost()
> > $622 = 5431 # OK, same as $596
> >
> > (gdb) p mat_info->materialization_cost.total_cost()
> > $623 = 3007.1088587744121
> > ## This one is weird. It should be $575 + $582 + $584= 6229.6578587744098
> > ## I will investigate.
> > ## The difference is not millions, but a ~3K.
> >
> > (gdb) print prefix_rec_count
> > $625 = 4500
> > (gdb) p mat_info->lookup_cost.total_cost()
> > $626 = 0.050000000000000003
> > < Sj_materialization_picker::check_qep
> >
> > ## Materialization is chosen as the best strategy with
> >
> > (gdb) print current_record_count
> > $634 = 4500
> > (gdb) print current_read_time
> > $635 = 8663.1088587744125
>
> Here $SCALAR_SUBQ_COST will be added.
>
> > < advance_sj_state
> >
> > === 5. Finally ===
> > We hit these lines:
> > if (join->sort_by_table &&
> > join->sort_by_table !=
> > join->positions[join->const_tables].table->table)
> > /* We have to make a temp table */
> > current_read_time+= current_record_count;
> > and get the final query plan cost of
> >
> > (gdb) p join->record_count
> > $643 = 4500
> > (gdb) p join->best_read
> > $644 = 13163.107858774412
> >
> Here we will see the added $SCALAR_SUBQ_COST, which will give the final cost of:
>
> 13163.1 + 10938.7 = 24101.8
>
> BR
> Sergei
> --
> Sergei Petrunia, Software Developer
> Monty Program AB, http://askmonty.org
> Blog: http://s.petrunia.net/blog
--
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog
References