← Back to team overview

maria-developers team mailing list archive

Re: MWL#116: Efficient group commit, ready for code review


Hi Monty,

Thanks for your review!

I generally followed your suggestions, except for this:

>> +  if (entry->begin_event->write(&log_file))
>> +    return ER_ERROR_ON_WRITE;
>> +
>> +  DBUG_EXECUTE_IF("crash_before_writing_xid",
>> +                  {
>> +                    if ((write_cache(cache)))
>> +                      DBUG_PRINT("info", ("error writing binlog cache"));
>> +                    else
>> +                      flush_and_sync();
>> +
>> +                    DBUG_PRINT("info", ("crashing before writing xid"));
>> +                    abort();
>> +                  });
>> +
>> +  if (write_cache(cache))
>> +    return ER_ERROR_ON_WRITE;
>> +
>> +  if (entry->end_event->write(&log_file))
>> +    return ER_ERROR_ON_WRITE;
>> +
>> +  if (entry->incident_event && entry->incident_event->write(&log_file))
>> +    return ER_ERROR_ON_WRITE;
>> +
>> +  if (cache->error)				// Error on read
>> +    return ER_ERROR_ON_READ;
> It would be better to test this after 'write_cache'.
> There is no point in writing anymore if this fails....

Are you sure? Couldn't this result in a half-written event group (missing
COMMIT or incident event) in cases where it wouldn't happen before? Anyway,
this is unchanged from the existing code, I am reluctant to change it as part
of the group commit patch ...


Otherwise I fixed everything you pointed out. Below a few clarifying comments
in case you are interested.

>> +  trx_data->using_xa= TRUE;
>> +  if (xid)
> The above looks strange.
> How can we have xa without an xid ?

After some investigation, I found this can happen in XA COMMIT ... ONE PHASE.
I added a comment. It was actually a bug, in this case of XA COMMIT ... ONE
PHASE it should use a COMMIT query event (not NULL). Fixed; this also removes
the case of end_event==NULL that you commented on in other parts of the code.

>>        /*
>> +        We only bother to write to the binary log if there is anything
>> +        to write.
>>        */
>> +      if (my_b_tell(cache) > 0)
>> +      {
> Can't we ignore transactions with cache == 0 on the upper level ?
> There is no reason I can come up with why we put these in the queue.

It can happen for example for this:

    SET sql_log_bin = 0;
    INSERT INTO innodb_table VALUES (3);
    INSERT INTO pbxt_table VALUES (4);

In this case, there is nothing to write to the binlog. However, the code is
still invoked, as binlog needs to act as a transaction coordinator for 2-phase
commit between the two storage engines.

(This is btw. unchanged in the patch compared to the original code).

> The above is better written as:
>   status_var_increment(thd->status_var.ha_prepare_count);
>   if ((err= ht->prepare(ht, thd, all)))
>   {
>     my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
>     goto err;
>   }

I did it like this, to avoid swapping status_var_increment() and ht->prepare():

  err= ht->prepare(ht, thd, all)
  if (err)
    my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
    goto err;

>> +# Check that the binlog position reported by InnoDB is the correct one
>> +# for the end of the second transaction (as can be checked with
>> +# mysqlbinlog).
>> +let $MYSQLD_DATADIR= `SELECT @@datadir`;
>> +--exec grep 'InnoDB: Last MySQL binlog file position' $MYSQLD_DATADIR/../../log/mysqld.1.err | tail -1
> Is there another way to get the position without using grep ?

> Another option would be to store the startup position in some user
> variables (assuming this would be useful for anyone else).

Yes, I think this is a good idea. I added a worklog for it: MWL#191.

> (as grep probably don't work on windows)
> As we already have regexp in mysqltest, it should be trivial to do our
> own simple version of grep there.
> Let's see if we can get someone to implement the grep.

Actually there should be several ways to get `grep` and `tail` on Windows,
including at least one made by Microsoft and several that are Free Software.

I would personally prefer to not re-invent the wheel, but rely on existing
packages for this. We could have a `--source include/have_posix_tools' so we
do not fail the test if the posix commands are not available.

Anyway, in this case, the MWL#191 is the best solution I think. I added to the
worklog to fix the test to not use grep, and until then I will just put
`--source include/not_windows.inc' with a reference to MWL#191.

>> +# Actually the output from this currently shows a bug.
>> +# The injected IO error leaves partially written transactions in the binlog in
>> +# the form of stray "BEGIN" events.
>> +# These should disappear from the output if binlog error handling is improved.
> Do you have a worklog number for the above. If yes, please add it to
> the source too.

Done (it is MySQL Bug#37148 and WL#1790).

>> +int
>> +MYSQL_BIN_LOG::write_transaction(group_commit_entry *entry)
>> +{
>> +  binlog_trx_data *trx_data= entry->trx_data;
>> +  IO_CACHE *cache= &trx_data->trans_log;
>> +  /*
>> +    Log "BEGIN" at the beginning of every transaction.  Here, a transaction is
>> +    either a BEGIN..COMMIT block or a single statement in autocommit mode. The
>> +    event was constructed in write_transaction_to_binlog(), in the thread
>> +    running the transaction.
>> +
>> +    Now this Query_log_event has artificial log_pos 0. It must be
>> +    adjusted to reflect the real position in the log. Not doing it
>> +    would confuse the slave: it would prevent this one from
>> +    knowing where he is in the master's binlog, which would result
>> +    in wrong positions being shown to the user, MASTER_POS_WAIT
>> +    undue waiting etc.
>> +  */
> The above old comment doesn't make sense anymore in this place of the
> code. Please remove or move it to a better place.
> (I tried to find a better place, but couldn't find it)

Moved the first part to write_transaction_to_binlog(); the last part I deleted
as there is a similar better comment in write_cache().

 - Kristian.