← Back to team overview

maria-developers team mailing list archive

MDEV-30218: On what should be passed as 'rows' to filtering code.

 

Hello Igor,

So I'm looking at the bb-10.4-release-igor tree with your latest patch:

commit c4fa6c3c4eaacefd9bf50b5c88c2c72474a15bc5 (HEAD -> bb-10.4-release-igor, origin/bb-10.4-release-igor)
Author: Igor Babaev <igor@xxxxxxxxxxxx>
Date:   Tue Jan 31 13:14:53 2023 -0800

    MDEV-30218 Incorrect optimization for rowid_filtering
    
    Correction over the last patch for this MDEV.

I modify the code to add a printout:

diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 4bdfb659513..2cd4495bc03 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -7965,6 +7965,8 @@ best_access_path(JOIN      *join,
           table->best_range_rowid_filter_for_partial_join(start_key->key,
                                                           rows,
                                                           access_cost_factor);
+        trace_access_idx.add("ROWS_FOR_FILTER", rows);
+        trace_access_idx.add("REAL_ROWS_FOR_FILTER", tmp);
         if (filter)
         {
           tmp-= filter->get_adjusted_gain(rows) - filter->get_cmp_gain(rows);

then, I'm trying this example:

create table t1 (a int, b int);
insert into t1 select seq, seq from seq_1_to_10000;

create table t2 (a int, key(a));
insert into t2 select seq from seq_1_to_10000;
insert into t2 select * from t2;
insert into t2 select * from t2;
analyze table t2 persistent for all;

create table t3 (
  pk int primary key,
  a int,
  b int,
  unique key(a),
  key(b)
);

insert into t3 select
  seq, seq, seq from seq_1_to_100000;

set optimizer_trace=1;
explain
select straight_join * from t1, t2, t3
where 
 t2.a=t1.a and
 t3.a=t1.b and
 t3.b < 5000;

+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+
| id   | select_type | table | type          | possible_keys | key  | key_len | ref     | rows   | Extra                           |
+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+
|    1 | SIMPLE      | t1    | ALL           | NULL          | NULL | NULL    | NULL    | 10157  | Using where                     |
|    1 | SIMPLE      | t2    | ref           | a             | a    | 5       | j1.t1.a | 4      | Using index                     |
|    1 | SIMPLE      | t3    | eq_ref|filter | a,b           | a|b  | 5|5     | j1.t1.b | 1 (5%) | Using where; Using rowid filter |
+------+-------------+-------+---------------+---------------+------+---------+---------+--------+---------------------------------+

good so far.

select * from information_schema.optimizer_trace;

shows this (note the added printouts in caps):

      {
        "plan_prefix": ["t1", "t2"],
        "table": "t3",
        "best_access_path": {
          "considered_access_paths": [
            {
              "access_type": "eq_ref",
              "index": "a",
              "ROWS_FOR_FILTER": 40628,
              "REAL_ROWS_FOR_FILTER": 10157,
              "rowid_filter_key": "b",
              "rows": 1,
              "cost": 9498,
              "chosen": true
            },

Let's run the query and check what really happens:

ANALYZE FORMAT=JSON
select straight_join * from t1, t2, t3
where 
 t2.a=t1.a and
 t3.a=t1.b and
 t3.b < 5000;

shows:
...
    "table": {
      "table_name": "t3",
      "access_type": "eq_ref",
      "key": "a",
      ...
      "rowid_filter": {
        ...
        "r_lookups": 10000,
        ...
      },
      "r_loops": 40000,
      ...

So, 
  ROWS_FOR_FILTER=40K 
  REAL_ROWS_FOR_FILTER=10K

which of the numbers is a closer match here for the observed value of 
  r_lookups= 10K

?
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://petrunia.net