← Back to team overview

maria-developers team mailing list archive

Cost analysis: Materialization plan

 

Debugging with optimizer_switch='materialization=on', query:

explain extended
select sql_calc_found_rows
       s_name, s_address
from nation straight_join supplier
where s_suppkey in (select ps_suppkey from partsupp
                    where ps_partkey in (select p_partkey from part
                                         where p_name like 'forest%')
                          and ps_availqty >
                              (select 0.5 * sum(l_quantity)
                               from lineitem
                               where l_partkey = ps_partkey
                                     and l_suppkey = ps_suppkey
                                     and l_shipdate >= date('1994-01-01')
                                     and l_shipdate < date('1994-01-01') +
                                         interval '1' year ))
and s_nationkey = n_nationkey
and n_name = 'canada'
order by s_name
limit 10;
+----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
|id  |select_type       |table      |type  ||key                |key_len|ref                                    |rows|filtered|Extra                                       |
+----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+
|   1|PRIMARY           |nation     |ALL   ||NULL               |NULL   |NULL                                   |  25|  100.00|Using where; Using temporary; Using filesort|
|   1|PRIMARY           |supplier   |ref   ||i_s_nationkey      |5      |nation.n_nationkey                     | 180|  100.00|                                            |
|   1|PRIMARY           |<subquery2>|eq_ref||distinct_key       |4      |func                                   |   1|  100.00|                                            |
|   2|MATERIALIZED      |part       |range ||p_name             |58     |NULL                                   |2387|  100.00|Using where; Using index                    |
|   2|MATERIALIZED      |partsupp   |ref   ||PRIMARY            |4      |part.p_partkey                         |   1|  100.00|Using where                                 |
|   4|DEPENDENT SUBQUERY|lineitem   |ref   ||i_l_suppkey_partkey|10     |partsupp.ps_partkey,partsupp.ps_suppkey|   4|  100.00|Using where                                 |
+----+------------------+-----------+------++-------------------+-------+---------------------------------------+----+--------+--------------------------------------------+

********* SEMI-JOIN pre-optimization **************

===== PRE-1. part === 
(gdb) p position->records_read
  $564 = 2387
(gdb) p position->read_time
  $565 = 497.33246183668518
(gdb) p current_record_count / TIME_FOR_COMPARE
  $566 = 477.39999999999998
(gdb) p current_read_time
  $567 = 974.73246183668516

===== PRE-2. partsupp === 

(gdb) p position->records_read
  $570 = 1
(gdb) p position->read_time
  $571 = 2390.4263969377266
(gdb) p current_record_count / TIME_FOR_COMPARE
  $572 = 477.39999999999998
(gdb) p current_read_time
  $573 = 3842.5588587744119

## (part, partsupp) is the best plan to materialize the subquery.
  (gdb) p join->best_read
    $575 = 6229.5578587744112
  (gdb) p join->record_count
    $576 = 2387

## Extra expenses for semi-join
(gdb) print lookup_cost
  $582 = 0.050000000000000003
(gdb) print write_cost
  $584 = 0.050000000000000003

********* MAIN join optimization **************
===== 1. nation === 
(gdb) p position->records_read
  $587 = 25
(gdb) p position->read_time
  $588 = 1
(gdb) p current_record_count / TIME_FOR_COMPARE
  $589 = 5
(gdb) p current_read_time
  $590 = 6

===== 2. supplier === 
(gdb) p position->records_read
  $593 = 180
(gdb) p position->read_time
  $594 = 4525
(gdb) p current_record_count / TIME_FOR_COMPARE
  $595 = 900
(gdb) p current_read_time
  $596 = 5431

## here it considers "part, partsupp" extension, with huge cost: 
## I don't skip it:

=== 3.A part ===
(gdb) p position->records_read
  $599 = 2387
(gdb) p position->read_time
  $600 = 2237996.0782650835
(gdb) p current_record_count / TIME_FOR_COMPARE
  $601 = 2148300
(gdb) p current_read_time
  $602 = 4391727.078265084

=== 4.A partsupp ==
(gdb) p position->records_read
  $605 = 1
(gdb) p position->read_time
  $606 = 10741500
(gdb) p current_record_count / TIME_FOR_COMPARE
  $607 = 2148300
(gdb) p current_read_time
  $608 = 17281527.078265086

> advance_sj_state
  > Firstmatch_picker::check_qep
    ### Picks first match with 
    (gdb) print read_time
      $615 = 12984927.078265084
    (gdb) print rec_count
      $616 = 4500
  < Firstmatch_picker::check_qep

  > Sj_materialization_picker::check_qep
    (gdb) p prefix_cost.total_cost()
      $622 = 5431   # OK, same as $596

    (gdb) p mat_info->materialization_cost.total_cost()
      $623 = 3007.1088587744121  
    ## This one is weird.  It should be $575 + $582 + $584= 6229.6578587744098
    ## I will investigate.
    ##  The difference is not millions, but a ~3K. 

    (gdb) print prefix_rec_count
      $625 = 4500
    (gdb) p mat_info->lookup_cost.total_cost()
      $626 = 0.050000000000000003
  < Sj_materialization_picker::check_qep

## Materialization is chosen as the best strategy with

  (gdb) print current_record_count
    $634 = 4500
  (gdb) print current_read_time
    $635 = 8663.1088587744125

< advance_sj_state
 
=== 5. Finally ===
We hit these lines:
        if (join->sort_by_table &&
            join->sort_by_table !=
            join->positions[join->const_tables].table->table)
          /* We have to make a temp table */
          current_read_time+= current_record_count;
and get the final query plan cost of

(gdb) p join->record_count
  $643 = 4500
(gdb) p join->best_read
  $644 = 13163.107858774412

#records is the same for plan in this email and in the previous. 

The cost is very different, however, it is 13K vs 360K. Now, when we have the
breakdown of the numbers, we can evaluate what did the optimizer get wrong.

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


Follow ups