← Back to team overview

maria-developers team mailing list archive

Re: Galera-10.0 still does not work with rollbacks

 

Ok, so I analysed things and found something.

The direct cause of the assert is that reinit_io_cache() fails, as seen in the
logs:

reinit_io_cache: enter: cache: 0x3cee63b0 type: 2  seek_offset: 0  clear_cache: 0
my_b_flush_io_cache: enter: cache: 0x3cee63b0
my_seek: my: fd: 58 Pos: 11936128518282651045  Whence: 0  MyFlags: 16
my_error: my: nr: 33  MyFlags: 0  errno: 22
my_message_sql: error: error: 33  message: 'Can't seek in file '/home/jan/mysql/galera-test/node0/tmp/ML7E3rnl' (Errcode: 22 "Invalid argument")'  Flag: 0
my_seek: error: lseek: 18446744073709551615  errno: 22

Error handling is missing when the binlog code calls reinit_io_cache(), so the
error is ignored, and it proceeds to trigger the assertion in reset() as the
reinit_io_cache() did not complete.

The invalid seek offset is 0xa5a5a5a5a5a5a5a5, uninitialised memory.

The root cause of this seems to occur much earlier. The bad offset appears a
number of times previous in the log, until a ROLLBACK statement happens to
trigger the assertion.

It appears first at the execution of a ROLLBACK TO SAVEPOINT A:

dispatch_command: query: ROLLBACK TO SAVEPOINT A

reset_current_stmt_binlog_format_row: debug: temporary_tables: no, in_sub_stmt: no, system_thread: NON_SYSTEM_THREAD
mysql_reset_thd_for_next_command: debug: is_current_stmt_binlog_format_row(): 1
alloc_root: enter: root: 0x3d064100
alloc_root: exit: ptr: 0x3d2870d8
binlog_trans_log_truncate: enter: pos: 11936128518282651045
binlog_trans_log_truncate: info: truncating to position 11936128518282651045
reinit_io_cache: enter: cache: 0x3cee63b0 type: 2  seek_offset: 11936128518282651045  clear_cache: 0

The wrong offset comes into the binlog code from the upper layer in this
handlerton method:

static int binlog_savepoint_rollback(handlerton *hton, THD *thd, void *sv)
  ...
  binlog_trans_log_truncate(thd, *(my_off_t*)sv);

The parameter void *sv is supposed to have been initialised in a prior call to
binlog_savepoint_set(), when executing the SAVEPOINT A method.

However, as far as I could determine, there is no such matching call, which
would explain why *sv is uninitialised.

This would be easier to see in the log with --debug=d,*:t:i where all the
lines are prefixed with thread ID, but it seems clear enough, there is no
prior logged binlog_savepoint_set() using the same io_cache 0x3cee63b0.

So it appears the problem is that the test does a ROLLBACK TO SAVEPOINT A
without a prior SAVEPOINT A statement. This should have given an error
(right?), but instead it seems to end up calling into the binlog with
uninitialised data, which corrupts the internal binlog state and eventually
leads to the assertion.

Does this help you proceed? Maybe you can verify this scenario from the
general log. You may also be able to produce a stand-alone test case (by
taking stuff from the general log, or writing your own). One transaction does
a ROLLBACK TO SAVEPOINT without a matching prior SAVEPOINT. And then later a
ROLLBACK (I think in a different transaction, but you can try same transaction
also) triggers the actual assertion.

Or if not, send the full log with --debug=d,*:t:i, and I can look again. Or
maybe you can do so yourself, find the occurence of the invalid seek position
and single out the stuff for that thread id only.

Hope this helps,

 - Kristian.


Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

> Jan Lindström <jplindst@xxxxxxxxxxx> writes:
>
>> Full unedited log using --valgrind and --mysqld=--debug=+d at location: https:/
>> /www.dropbox.com/s/dcufp7l1cch8dfs/mysql.err.gz
>
> Ok, thanks.
>
> Hm, this log looks different from what I am used to seeing when I use
> mysql-test-run.pl --debug. From looking at the mysql-test-run.pl code it seems
> I should have asked for
>
>     --debug=d,*:t:i
>
> which will show function nesting and thread IDs, missing from the current log.


Follow ups

References