maria-developers team mailing list archive

Thread
Date
Re: [Commits] Rev 2802: few small MySQL bugs/issues that impact the engines, as discussed in the SE summit in http://bazaar.launchpad.net/~maria-captains/maria/5.2/

To: monty@xxxxxxxxxxxx
From: Sergei Golubchik <serg@xxxxxxxxxxxx>
Date: Thu, 3 Jun 2010 10:31:23 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <19462.65528.831297.789598@gargle.gargle.HOWL>
User-agent: Mutt/1.5.16 (2007-06-09)
Hi, Monty!

Thanks for the review!
See my replies below.

On Jun 03, Michael Widenius wrote:
> 
> > At http://bazaar.launchpad.net/~maria-captains/maria/5.2/
> > ------------------------------------------------------------
> > revno: 2802
> 
> >   few small MySQL bugs/issues that impact the engines, as discussed in the SE summit
> >   * remove handler::index_read_last()
> >   * create handler::keyread_read_time() (was get_index_only_read_time() in opt_range.cc)
> >   * ha_show_status() allows engine's show_status() to fail
> >   * remove HTON_FLUSH_AFTER_RENAME
> >   * fix key_cmp_if_same() to work for floats and doubles
> >   * set table->status in the server, don't force engines to do it
> >   * increment status vars in the server, don't force engines to do it
> 
> > +++ b/mysql-test/r/status_user.result	2010-06-01 22:39:29 +0000
> > @@ -100,8 +100,8 @@ Handler_commit	19
> >  Handler_delete  1
> >  Handler_discover        0
> >  Handler_prepare 18
> > -Handler_read_first      1
> > -Handler_read_key        8
> > +Handler_read_first      0
> > +Handler_read_key        3
> 
> Any explanation why this change happened (as the test didn't change
> and I can't understand how the values could suddently be less now).

This change is correct. Before my commit, calls were counted in the
handler, say, in the index_first() and index_next().

And ha_innobase::rnd_next() is implemented by calling
index_first/index_next.

So, innodb was incrementing Handler_read_first and Handler_read_next for
table scans (and of course it was incrementing Handler_read_rnd_next too
- double counting).

It was wrong - first, it was double counting. Second, Handler_*
should count handler calls as done by mysql, not expose internal
implementation of the engine. For example, mi_rfirst() calls mi_rnext()
internally, but we don't count it as Handler_read_next. The same should
be true for any engine, even if it mixes implementation levels.

> By the way, it would be nice if the file comments would be part of the
> commit email (as I assume you documented this issue there).

I have not :(
But I will, when I recommit.
 
> > +++ b/mysql-test/r/partition_pruning.result	2010-06-01 22:39:29 +0000
> > @@ -2373,7 +2373,7 @@ flush status;
> >  update t1 set a=100 where a+1=5+1;
> >  show status like 'Handler_read_rnd_next';
> >  Variable_name   Value
> > -Handler_read_rnd_next   10
> > +Handler_read_rnd_next   19
> 
> Any explanation why this change happened (as the test didn't change)
> Is it because we don't anymore count rows read in 'show' commands?

This is questionable change, I wanted to discuss it.

ha_partition::index_next (for example) calls underlying engine's
file->ha_index_next(), not file->index_next().

After my change Handler_read_key_next is incremented for both
ha_partition::index_next and file->index_next(). Double counting.

Before my change when a partition was pruned, Handler_read_key* counters
were not incremented at all (as ha_partition::index_read did not call
file->index_read() at all). Now it is incremented - that's why the
numbers are increased.

Possible solutions:

* do not increment Handler_read* counters for ha_partition methods,
only count calls to the underlying engines.
* do not increment Handler_read* counters for underlying engines - only
count calls from the upper layer into the handler, this is logical but
counters won't show partition pruning or handler call overhead caused by
many partitions. this can be solved by adding special set of counters
Handler_partition_read_* (or something).

> > === modified file 'sql/handler.cc'
> > --- a/sql/handler.cc	2010-06-01 19:52:20 +0000
> > +++ b/sql/handler.cc	2010-06-01 22:39:29 +0000
> > @@ -2131,8 +2125,6 @@ int handler::read_first_row(uchar * buf,
> >    register int error;
> >    DBUG_ENTER("handler::read_first_row");
> >  
> > -  ha_statistic_increment(&SSV::ha_read_first_count);
> 
> The above is wrong; We are later calling 'index_first()' in this
> function, not ha_index_first(), so we miss one increment (which was
> shown in the test cases).  Note that we do also call rnd_next() in
> this function, without any counting of rows so we need to fix other
> things in this function too!

That's fine, the counter is incremented in ha_read_first_row() wrapper.
If anything, the old code was wrong as it was incrementing
ha_read_first_count twice (once here and once in index_first).
 
> Simplest solution is to change to call ha_index_first / ha_rnd_next()
> in this function. This will also fix the 'table->status' variable that
> your are not counting anymore.
> This should be ok as we very seldom use 'handler::read_first_row()'

see above. I think it's ok to just increment ha_read_first_count in the
wrapper. Especially because read_first_row() is rarely used.
 
> Note that we should do same change in other functions that are calling
> handler functions directly:
> 
> handler::read_range_first
>   - This calls index_first() and index_read_map()
> get_auto_increment()
>   - This calls index_last() and index_read_map()
> index_read_idx_map()
>   - This calls index_read_map()
>   - Note that we can't trivially change this to call ha_index_read_map()
>     as we increment things statistics in ha_index_read_idx_map()
>   - We need to update table->status in this function!

Yes. But read_range_first() for example has no dedicated counter, so
either it increments Handler_read_key* counters in the default
implementation, or it increments nothing at all when any engine provides
its own implementation :(
 
> > +/*
> > +  Calculate cost of 'index only' scan for given index and number of records.
> > +
> > +  SYNOPSIS
> > +  handler->keyread_read_time()
> > +      param    parameters structure
> > +      records  #of records to read
> > +      keynr    key to read
> > +
> > +  NOTES
> > +    It is assumed that we will read trough the whole key range and that all
> > +    key blocks are half full (normally things are much better). It is also
> > +    assumed that each time we read the next key from the index, the handler
> > +    performs a random seek, thus the cost is proportional to the number of
> > +    blocks read.
> > +*/
> > +
> > +double handler::keyread_read_time(uint index, uint ranges, ha_rows rows)
> > +{
> > +  double read_time;
> > +  uint keys_per_block= (stats.block_size/2/
> > +                        (table->key_info[index].key_length + ref_length) + 1);
> > +  read_time=((double) (rows+keys_per_block-1)/ (double) keys_per_block);
> > +  return read_time;
> > +}
> 
> Do we really need the 'ranges' argument ?
> (It's always '1' in the current code and you are not using it)

I don't know :)

I've copied it from the handler::read_time(), just to have the
interface the same for consistency. After all - logically - if the
read_time() may depend on the number of ranges, keyread_read_time()
certainly can do too.
 
> > === modified file 'sql/key.cc'
> > --- a/sql/key.cc	2008-10-10 10:01:01 +0000
> > +++ b/sql/key.cc	2010-06-01 22:39:29 +0000
> > @@ -278,8 +278,10 @@ bool key_cmp_if_same(TABLE *table,const 
> >        key++;
> >        store_length--;
> >      }
> > -    if (key_part->key_part_flag & (HA_BLOB_PART | HA_VAR_LENGTH_PART |
> > -                                   HA_BIT_PART))
> > +    if ((key_part->key_part_flag & (HA_BLOB_PART | HA_VAR_LENGTH_PART |
> > +                                   HA_BIT_PART)) ||
> > +         key_part->type == HA_KEYTYPE_FLOAT ||
> > +         key_part->type == HA_KEYTYPE_DOUBLE)
> >      {
> >        if (key_part->field->key_cmp(key, key_part->length))
> >          return 1;
> 
> I understand that for float and double there is some extraordinary
> cases where memcmp() is not same as =, but who has had a problem with
> this?

there was a bug report in mysql bugdb.
http://bugs.mysql.com/bug.php?id=44372
 
> As a separate note, I think it would be better to add to key_part_flag
> HA_NO_CMP_WITH_MEMCMP for key_parts of type FLOAT or DOUBLE
> when we open the table. This would simplify this test a bit.

I'll try to
 
> > === modified file 'sql/table.h'
> > --- a/sql/table.h	2010-06-01 19:52:20 +0000
> > +++ b/sql/table.h	2010-06-01 22:39:29 +0000
> > @@ -13,6 +13,8 @@
> >     along with this program; if not, write to the Free Software
> >     Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA */
> >  
> > +#ifndef SQL_TABLE_INCLUDED
> > +#define SQL_TABLE_INCLUDED
> 
> Do we really need this one as it's automaticly included by mysql_priv.h ?
> Anyway, it should probably be MYSQL_TABLE_H to be similar our other defines.

This is actually unrelated change, I tried to include table.h to handler.h
(to solve the problem of inline handler methods needing TABLE) and had
to add include guards. later I solved the problem differently but kept
the guards as they're a good thing anyway.

As for the name of the guard, it's new (~1 yr old) MySQL style. As I
personally don't care about the name of the guards, as long as they all
use a consistent style, I use the MySQL naming style here.

> > === modified file 'storage/myisam/ha_myisam.cc'
> 
> <cut>
> 
> >  int ha_myisam::index_next(uchar *buf)
> >  {
> >    DBUG_ASSERT(inited==INDEX);
> > -  ha_statistic_increment(&SSV::ha_read_next_count);
> >    int error=mi_rnext(file,buf,active_index);
> >    table->status=error ? STATUS_NOT_FOUND: 0;
> 
> you should probably remove the setting of table->status here

Neither updating table->status not ha_statistic_increment() can
hurt here, and as you have seen I've not updated any other engine at
all. I've only did it in MyISAM to check that the change works, the code
compiles, test results don't change, and so on.

But I'll remove table->status updates from MyISAM.

> The whole function can the be changed to:
> 
>   return mi_rnext(file,buf,active_index);
> 
> Same goes for all other instances of setting table->status in this file
> 
Regards,
Sergei