maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #04013
Re: MWL#116: Efficient group commit, ready for code review
Hi Monty,
Thanks for your review!
I generally followed your suggestions, except for this:
>> + if (entry->begin_event->write(&log_file))
>> + return ER_ERROR_ON_WRITE;
>> +
>> + DBUG_EXECUTE_IF("crash_before_writing_xid",
>> + {
>> + if ((write_cache(cache)))
>> + DBUG_PRINT("info", ("error writing binlog cache"));
>> + else
>> + flush_and_sync();
>> +
>> + DBUG_PRINT("info", ("crashing before writing xid"));
>> + abort();
>> + });
>> +
>> + if (write_cache(cache))
>> + return ER_ERROR_ON_WRITE;
>> +
>> + if (entry->end_event->write(&log_file))
>> + return ER_ERROR_ON_WRITE;
>> +
>> + if (entry->incident_event && entry->incident_event->write(&log_file))
>> + return ER_ERROR_ON_WRITE;
>> +
>> + if (cache->error) // Error on read
>> + return ER_ERROR_ON_READ;
>
> It would be better to test this after 'write_cache'.
> There is no point in writing anymore if this fails....
Are you sure? Couldn't this result in a half-written event group (missing
COMMIT or incident event) in cases where it wouldn't happen before? Anyway,
this is unchanged from the existing code, I am reluctant to change it as part
of the group commit patch ...
----
Otherwise I fixed everything you pointed out. Below a few clarifying comments
in case you are interested.
>> + trx_data->using_xa= TRUE;
>> + if (xid)
>
> The above looks strange.
> How can we have xa without an xid ?
After some investigation, I found this can happen in XA COMMIT ... ONE PHASE.
I added a comment. It was actually a bug, in this case of XA COMMIT ... ONE
PHASE it should use a COMMIT query event (not NULL). Fixed; this also removes
the case of end_event==NULL that you commented on in other parts of the code.
>> /*
>> + We only bother to write to the binary log if there is anything
>> + to write.
>> */
>> + if (my_b_tell(cache) > 0)
>> + {
>
> Can't we ignore transactions with cache == 0 on the upper level ?
> There is no reason I can come up with why we put these in the queue.
It can happen for example for this:
SET sql_log_bin = 0;
BEGIN;
INSERT INTO innodb_table VALUES (3);
INSERT INTO pbxt_table VALUES (4);
COMMIT;
In this case, there is nothing to write to the binlog. However, the code is
still invoked, as binlog needs to act as a transaction coordinator for 2-phase
commit between the two storage engines.
(This is btw. unchanged in the patch compared to the original code).
> The above is better written as:
>
> status_var_increment(thd->status_var.ha_prepare_count);
> if ((err= ht->prepare(ht, thd, all)))
> {
> my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> goto err;
> }
I did it like this, to avoid swapping status_var_increment() and ht->prepare():
err= ht->prepare(ht, thd, all)
status_var_increment(thd->status_var.ha_prepare_count);
if (err)
{
my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
goto err;
}
>> +# Check that the binlog position reported by InnoDB is the correct one
>> +# for the end of the second transaction (as can be checked with
>> +# mysqlbinlog).
>> +let $MYSQLD_DATADIR= `SELECT @@datadir`;
>> +--exec grep 'InnoDB: Last MySQL binlog file position' $MYSQLD_DATADIR/../../log/mysqld.1.err | tail -1
>
> Is there another way to get the position without using grep ?
> Another option would be to store the startup position in some user
> variables (assuming this would be useful for anyone else).
Yes, I think this is a good idea. I added a worklog for it: MWL#191.
> (as grep probably don't work on windows)
>
> As we already have regexp in mysqltest, it should be trivial to do our
> own simple version of grep there.
> Let's see if we can get someone to implement the grep.
Actually there should be several ways to get `grep` and `tail` on Windows,
including at least one made by Microsoft and several that are Free Software.
I would personally prefer to not re-invent the wheel, but rely on existing
packages for this. We could have a `--source include/have_posix_tools' so we
do not fail the test if the posix commands are not available.
Anyway, in this case, the MWL#191 is the best solution I think. I added to the
worklog to fix the test to not use grep, and until then I will just put
`--source include/not_windows.inc' with a reference to MWL#191.
>> +# Actually the output from this currently shows a bug.
>> +# The injected IO error leaves partially written transactions in the binlog in
>> +# the form of stray "BEGIN" events.
>> +# These should disappear from the output if binlog error handling is improved.
>
> Do you have a worklog number for the above. If yes, please add it to
> the source too.
Done (it is MySQL Bug#37148 and WL#1790).
>> +int
>> +MYSQL_BIN_LOG::write_transaction(group_commit_entry *entry)
>> +{
>> + binlog_trx_data *trx_data= entry->trx_data;
>> + IO_CACHE *cache= &trx_data->trans_log;
>> + /*
>> + Log "BEGIN" at the beginning of every transaction. Here, a transaction is
>> + either a BEGIN..COMMIT block or a single statement in autocommit mode. The
>> + event was constructed in write_transaction_to_binlog(), in the thread
>> + running the transaction.
>> +
>> + Now this Query_log_event has artificial log_pos 0. It must be
>> + adjusted to reflect the real position in the log. Not doing it
>> + would confuse the slave: it would prevent this one from
>> + knowing where he is in the master's binlog, which would result
>> + in wrong positions being shown to the user, MASTER_POS_WAIT
>> + undue waiting etc.
>> + */
>
> The above old comment doesn't make sense anymore in this place of the
> code. Please remove or move it to a better place.
> (I tried to find a better place, but couldn't find it)
Moved the first part to write_transaction_to_binlog(); the last part I deleted
as there is a similar better comment in write_cache().
- Kristian.
References