maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04838
Cost analysis: check if optimizer is able to produce a correct E(#scalar_subquery executions)
Let's see if the optimizer is theoretically able to attach the subquery to the
right table for the FirstMatch plan:
On Thu, Aug 09, 2012 at 12:39:00PM +0400, Sergei Petrunia wrote:
> == Modified query ==
...
>
> == Benchmarking ==
>
> Number of times scalar-subquery was executed:
> Materialization: 9552 times (same as before)
> First-Match: 299 times (A LOT LESS!)
>
> Execution times:
> Materialization: 5.30 sec (same as before)
> First-Match: 1.72 sec (a lot less, used to be 15 sec! Now it beats
> materialization)
>
Let's check whether the optimizer has sufficient input data to conclude that
moving the scalar-subquery from table partsupp to table part is advantageous.
The data about extra selectivity is basically tab->quick_condition_rows.
Let's take the EXPLAIN and its analysis from the "Cost analysis: optimizer
statistics vs real dataset properties" email. Relevant lines will start with
'NOTE>'.
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
|id|select_type |table |type ||key |key_len|ref |rows|filtered|Extra |
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
| 1|PRIMARY |nation |ref ||n_name |26 |const | 1| 100.00|Using where; Using index; Using temporary; Using filesort|
| 1|PRIMARY |supplier|ref ||i_s_nationkey |5 |nation.n_nationkey | 251| 100.00| |
| 1|PRIMARY |partsupp|ref ||i_ps_suppkey |4 |supplier.s_suppkey | 34| 100.00|Using where |
| 1|PRIMARY |part |eq_ref||PRIMARY |4 |partsupp.ps_partkey | 1| 100.00|Using where; FirstMatch(supplier) |
| 4|DEP. SUBQUERY|lineitem|ref ||i_l_suppkey_partkey|10 |partsupp.ps_partkey,partsupp.ps_suppkey| 3| 100.00|Using where |
+--+-------------+--------+------++-------------------+-------+---------------------------------------+----+--------+---------------------------------------------------------+
5 rows in set, 3 warnings (0.01 sec)
=== nation: ===
like above, 1 row. OK.
NOTE> - "using where" won't filter anything out.
=== supplier: ===
see above, 251 rows, ~ok 412 real...
NOTE> - "using where" won't filter anything out.
=== partsupp: ===
- 800K rows and 10K distinct ps_suppkey, which gives rec_per_key=80 (EXPLAIN shows 34)
- as for data that we will hit:
select count(*) from nation, supplier, partsupp
where s_suppkey=ps_suppkey and s_nationkey=n_nationkey and n_name='canada';
gives 32690 rows. 32690 / 412 = 80, matches rec_per_key.
NOTE> "using where" won't filter anything out (NOT TAKING the scalar-subquery into account)
=== part: ===
- eq_ref, so exactly 1 match. We know DBT-3 dataset is such that it always has one.
NOTE> "using where" is "p_name like forest"! it will filter stuff out!
I haven't checked, but I suppose that condition "p_name like forest%'(*) is not
correlated with the table access condition, 'p_partkey=partsupp_ps_partkey'.
The email 'Cost analysis: Materialization plan' shows a range access on (*),
with the estimate rows=2378.
Total number of rows in table part is is 200K, InnoDB's estimate is 200999.
This gives selectivity of 0.0118.
=== How many times scalar-subquery will be evaluated ===
Let's take estimate numbers:
1 * 251 * 34 * 1 * 0.0118 = 100.7012 evaluations.
If the estimates were perfect:
1 * 400 * 80 * 1 * 0.118 = 377.6 evaluations
As mentioned at the top of this email, in reality subquery is evaluated 299
times.
=== Conclusion#1 ===
quick_condition estimates provide information that will allow the optimizer to
make a correct conclusion about # of times that scalar-subquery will be
evaluated after table part.
What remains to be checked: check that our cost formulas will make the plan of
{ FirstMatch, scalar-subquery attached to table part}
cheaper than
{ FirstMatch, scalar-subquery attached to table partsupp}
and cheaper than the SJ-Materialization plan.
BR
Sergei
--
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog