maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04834
Cost analysis: optimizer statistics vs real dataset properties
So, we have done:
alter table name add index(n_name);
and now we have:
explain extended
select sql_calc_found_rows
s_name, s_address
from nation straight_join supplier
where s_suppkey in (select ps_suppkey from partsupp
where ps_partkey in (select p_partkey from part
where p_name like 'forest%')
and ps_availqty >
(select 0.5 * sum(l_quantity)
from lineitem
where l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date('1994-01-01')
and l_shipdate < date('1994-01-01') +
interval '1' year ))
and s_nationkey = n_nationkey
and n_name = 'canada'
order by s_name
limit 10;
+--+------------------+-----------+------++-------------------+-------+---------------------------------------+----+---------------------------------------------------------+
|id|select_type |table |type ||key |key_len|ref |rows|Extra |
+--+------------------+-----------+------++-------------------+-------+---------------------------------------+----+---------------------------------------------------------+
| 1|PRIMARY |nation |ref ||n_name |26 |const | 1|Using where; Using index; Using temporary; Using filesort|
| 1|PRIMARY |supplier |ref ||i_s_nationkey |5 |nation.n_nationkey | 251| |
| 1|PRIMARY |<subquery2>|eq_ref||distinct_key |4 |func | 1| |
| 2|MATERIALIZED |part |range ||p_name |58 |NULL |2387|Using where; Using index |
| 2|MATERIALIZED |partsupp |ref ||PRIMARY |4 |part.p_partkey | 1|Using where |
| 4|DEPENDENT SUBQUERY|lineitem |ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 3|Using where |
+--+------------------+-----------+------++-------------------+-------+---------------------------------------+----+---------------------------------------------------------+
=== nation: ===
1 row scanned, for real OK.
- "using where" doesn't filter anything out.
=== supplier ===
- dataset has 412 rows for CANADA
- rec_per_key: 10L suppliers / 25 countries=400
=== <subquery2> ==
- eq_ref, we know that there's always match
- "using where" won't filter anything out.
=== part ===
- dataset/where match 2378 rows for (p_name like 'forest%') OK.
- "using where" won't filter anything out.
=== partsupp ===
- rec_per_key: partsupp has 800K rows and 200K distinct ps_partkey values, which
gives rec_per_key=4 (while EXPLAIN shows 1!)
- select count(*) from partsupp
where ps_partkey in (select p_partkey from part where p_name like 'forest%');
gives 9552 rows.
(rec_per_key=4) * 2378 = 9512 , close to 9552.
- "using where" won't filter anything out (NOT TAKING scalar-subquery into account)
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
|id|select_type |table |type ||key |key_len|ref |rows|filtered|Extra |
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
| 1|PRIMARY |nation |ref ||n_name |26 |const | 1| 100.00|Using where; Using index; Using temporary; Using filesort|
| 1|PRIMARY |supplier|ref ||i_s_nationkey |5 |nation.n_nationkey | 251| 100.00| |
| 1|PRIMARY |partsupp|ref ||i_ps_suppkey |4 |supplier.s_suppkey | 34| 100.00|Using where |
| 1|PRIMARY |part |eq_ref||PRIMARY |4 |partsupp.ps_partkey | 1| 100.00|Using where; FirstMatch(supplier) |
| 4|DEP. SUBQUERY|lineitem|ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 3| 100.00|Using where |
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
5 rows in set, 3 warnings (0.01 sec)
=== nation: ===
like above, 1 row. OK.
- "using where" won't filter anything out.
=== supplier: ===
see above, 251 rows, ~ok 412 real...
=== partsupp: ===
- 800K rows and 10K distinct ps_suppkey, which gives rec_per_key=80 (EXPLAIN shows 34)
- as for data that we will hit:
select count(*) from nation, supplier, partsupp
where s_suppkey=ps_suppkey and s_nationkey=n_nationkey and n_name='canada';
gives 32690 rows. 32690 / 412 = 80, matches rec_per_key.
- "using where" won't filter anything out (NOT TAKING scalar-subquery into account)
=== part: ===
- eq_ref, so exactly 1 match. We know DBT-3 dataset is such that it always has one.
- "using where" is "p_name like forest"! it will filter stuff out!
^^^^^^^^ but does it matter.. subquery is attached to ps?
********* Some benchmarking *************
Execution times:
- FirstMatch is ~15 sec
- Materialization: ~5 sec.
With scalar-subquery predicate replaced with TRUE:
- Materialization: ~0.20 sec.
- FirstMatch: 0.19 sec.
Number of times the scalar subquery was executed:
- DEP. SUBQUERY: ... 24055 times
- MATERIALIZATION: 9555 times
If we take the cost of scalar-subquery predicate evaluation into account,
the difference should be around:
(24055 - 9555) * 5 = 72K
which is sufficient.
The problem is, Materialization executes the scalar-subquery fewer times than
FirstMatch. And you expect that taking into account scalar-subquery execution
cost will cause Materialization not to be used??
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog