maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #13290
MDEV-30218: On what should be passed as 'rows' to filtering code.
Hello Igor,
So I'm looking at the bb-10.4-release-igor tree with your latest patch:
commit c4fa6c3c4eaacefd9bf50b5c88c2c72474a15bc5 (HEAD -> bb-10.4-release-igor, origin/bb-10.4-release-igor)
Author: Igor Babaev <igor@xxxxxxxxxxxx>
Date: Tue Jan 31 13:14:53 2023 -0800
MDEV-30218 Incorrect optimization for rowid_filtering
Correction over the last patch for this MDEV.
I modify the code to add a printout:
diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 4bdfb659513..2cd4495bc03 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -7965,6 +7965,8 @@ best_access_path(JOIN *join,
table->best_range_rowid_filter_for_partial_join(start_key->key,
rows,
access_cost_factor);
+ trace_access_idx.add("ROWS_FOR_FILTER", rows);
+ trace_access_idx.add("REAL_ROWS_FOR_FILTER", tmp);
if (filter)
{
tmp-= filter->get_adjusted_gain(rows) - filter->get_cmp_gain(rows);
then, I'm trying this example:
create table t1 (a int, b int);
insert into t1 select seq, seq from seq_1_to_10000;
create table t2 (a int, key(a));
insert into t2 select seq from seq_1_to_10000;
insert into t2 select * from t2;
insert into t2 select * from t2;
analyze table t2 persistent for all;
create table t3 (
pk int primary key,
a int,
b int,
unique key(a),
key(b)
);
insert into t3 select
seq, seq, seq from seq_1_to_100000;
set optimizer_trace=1;
explain
select straight_join * from t1, t2, t3
where
t2.a=t1.a and
t3.a=t1.b and
t3.b < 5000;
+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 10157 | Using where |
| 1 | SIMPLE | t2 | ref | a | a | 5 | j1.t1.a | 4 | Using index |
| 1 | SIMPLE | t3 | eq_ref|filter | a,b | a|b | 5|5 | j1.t1.b | 1 (5%) | Using where; Using rowid filter |
+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+
good so far.
select * from information_schema.optimizer_trace;
shows this (note the added printouts in caps):
{
"plan_prefix": ["t1", "t2"],
"table": "t3",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "eq_ref",
"index": "a",
"ROWS_FOR_FILTER": 40628,
"REAL_ROWS_FOR_FILTER": 10157,
"rowid_filter_key": "b",
"rows": 1,
"cost": 9498,
"chosen": true
},
Let's run the query and check what really happens:
ANALYZE FORMAT=JSON
select straight_join * from t1, t2, t3
where
t2.a=t1.a and
t3.a=t1.b and
t3.b < 5000;
shows:
...
"table": {
"table_name": "t3",
"access_type": "eq_ref",
"key": "a",
...
"rowid_filter": {
...
"r_lookups": 10000,
...
},
"r_loops": 40000,
...
So,
ROWS_FOR_FILTER=40K
REAL_ROWS_FOR_FILTER=10K
which of the numbers is a closer match here for the observed value of
r_lookups= 10K
?
BR
Sergei
--
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net