← Back to team overview

maria-developers team mailing list archive

Re: Review request for: [Commits] Rev 3403: Fix for bug lp:944706, task MDEV-193 in file:///home/tsk/mprog/src/5.5-lpb944706/

 

On Mon, May 14, 2012 at 11:22:52PM +0300, Timour Katchaounov wrote:
> > >  /**
> > > +  Estimate the number of rows that query execution will read.
> > > +
> > > +  @todo This is a very pessimistic upper bound. Use join selectivity
> > > +  when available to produce a more realistic number.
> > > +*/
> > > +
> > > +double JOIN::get_examined_rows()
> > > +{
> > > +  /* Each constant table examines one row, and the result is at most one row. */
> > > +  ha_rows examined_rows= const_tables;
> > > +  uint i= const_tables;
> > > +  double prev_fanout;
> > > +
> > > +  if (table_count == const_tables)
> > > +    return examined_rows;
> > > +
> > > +  examined_rows+= join_tab[i++].get_examined_rows();
> > > +  for (; i < table_count ; i++)
> > > +  {
> > > +    if (join_tab[i].type == JT_EQ_REF)
> > > +      prev_fanout= 1;
> > > +    else
> > > +      prev_fanout= best_positions[i-1].records_read;
>
> This looks wrong. Declaration of POSITION::records_read has this comment:
>
>   /*
>     The "fanout": number of output rows that will be produced (after
>     pushed down selection condition is applied) per each row combination of
>     previous tables.
>   */
>
> note the "PER EACH ROW COMBINATION .." part.  I would expect that this function
> would calculate a product of records_read values.
>
> timour:
>
> In this function we want to estimate how many rows will be *examined*, not
> produced by each JOIN operator. For the estimate I assume a simple nested loops
> model, where the JOIN read every row of its right table as many times as many
> rows there are in the left operand. The partial join that serves as left
> operand, contains records_read rows. This is the multiplier of the number of
> rows that will be examined in the right table.
>
> Of course, for each partial join the join condition will filter a subset of
> these rows.
>
> I agree that blocking algorithms may examine a lot less rows, but it doesn't
> make sense to have a very tight bound here. We use an upper bound, because it
> is better to miss some constant optimizations, rather than execute very
> expensive subqueries here.

I was talking about something much more simpler than that. Consider this
example:

create table ten (a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k (a int);
insert into one_k select A.a + 10*B.a + 100*C.a from ten A, ten B, ten C;

create table ti1 ( a int);
insert into ti1 select a from ten limit 4;

alter table ti1 add b int;

create table ti2 (a int primary key, b int);
create table ti3 (a int primary key, b int);

insert into ti2 select a, a from one_k;
insert into ti3 select a, a from one_k;

MariaDB [test]> explain select * from ten where 3 in (select ti2.b + ti3.b from ti1, ti2, ti3 where ti2.a=ti1.a and ti3.a=ti1.a) or a <5;
+------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+
| id   | select_type        | table | type   | possible_keys | key     | key_len | ref        | rows | Extra       |
+------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+
|    1 | PRIMARY            | ten   | ALL    | NULL          | NULL    | NULL    | NULL       |   10 | Using where |
|    2 | DEPENDENT SUBQUERY | ti1   | ALL    | NULL          | NULL    | NULL    | NULL       |    4 | Using where |
|    2 | DEPENDENT SUBQUERY | ti3   | eq_ref | PRIMARY       | PRIMARY | 4       | test.ti1.a |    1 |             |
|    2 | DEPENDENT SUBQUERY | ti2   | eq_ref | PRIMARY       | PRIMARY | 4       | test.ti1.a |    1 | Using where |
+------+--------------------+-------+--------+---------------+---------+---------+------------+------+-------------+

For this example, JOIN::get_examined_rows() produces 6. The number comes from

4 rows examined in table ti1 (correct)
1 row examined in table ti2 (incorrect, should be 4)
1 row examined in table ti3 (incorrect, should be 4)

Do you agree that the number of 6 is incorrect and needs to be fixed?

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog


References