← Back to team overview

maria-developers team mailing list archive

Re: Galera-10.0 still does not work with rollbacks

 

Jan Lindström <jplindst@xxxxxxxxxxx> writes:

> Correct log file attached.

> ==00:00:02:33.708 25289== Use of uninitialised value of size 8
> ==00:00:02:33.708 25289==    at 0x5EF865B: _itoa_word (_itoa.c:179)
> ==00:00:02:33.708 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
> ==00:00:02:33.708 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
> ==00:00:02:33.708 25289==    by 0x5F03141: snprintf (snprintf.c:34)
> ==00:00:02:33.708 25289==    by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054)
> ==00:00:02:33.708 25289==    by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516)
> ==00:00:02:33.708 25289==    by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436)
> ==00:00:02:33.709 25289==    by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363)
> ==00:00:02:33.709 25289==    by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373)
> ==00:00:02:33.709 25289==    by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820)
> ==00:00:02:33.709 25289==    by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651)
> ==00:00:02:33.709 25289==    by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
> ==00:00:02:33.709 25289== 
> ==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s)
> ==00:00:02:33.709 25289==    at 0x5EF8665: _itoa_word (_itoa.c:179)
> ==00:00:02:33.709 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
> ==00:00:02:33.709 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
> ==00:00:02:33.709 25289==    by 0x5F03141: snprintf (snprintf.c:34)
> ==00:00:02:33.709 25289==    by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054)
> ==00:00:02:33.709 25289==    by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516)
> ==00:00:02:33.709 25289==    by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436)
> ==00:00:02:33.709 25289==    by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363)
> ==00:00:02:33.709 25289==    by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373)
> ==00:00:02:33.709 25289==    by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820)
> ==00:00:02:33.709 25289==    by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651)
> ==00:00:02:33.709 25289==    by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
> ==00:00:02:33.709 25289== 
> 130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896

So this is the explanation for the strange numbers in your printouts, like
892097568. Those values are uninitialised memory (probably from uninitialised
stack variables?).

Since the assertion you get is exactly about the values of those IO_CACHE
values being wrong, this looks like something to investigate.

I am however curious about why there is only a single Valgrind error about
this? As there are many of those unexpected numbers in your output.

However, most of the Valgrind errors seem to be in Galera code. It is hard to
say for me if they are related. Seppo, can you look into that from your side?
Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...

> It did produce the output, not yet found the reason, some of the output just
> before corruption
>
> MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event:
> 0x7f2a200bc800
> MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr->
> pending(): 0x0
> Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248  length: 10
> Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248  Bytes: (10)
> F8 01 00 00 00 01 74 02 77 65
> my_realloc: my: ptr: 0x0  size: 1024  my_flags: 80
> my_malloc: my: size: 1024  my_flags: 80
> my_malloc: exit: ptr: 0x7f2a200895e0

I need to see the *complete* log to be able to help. Yes, I know it can be
hundreds of megabyte maybe, but it should compress well. Upload it somewhere
I can access it and I can take a look.

Using this log file, we can deduce exactly which paths the code was taking as
it got to that assert. I'm wondering if the ROLLBACK happens early and ends up
executing some binlog code at a point where transaction start has not properly
initialised the necessary data structures. So we need the dbug output from the
entire execution since transaction start.

(If you can get the error log/Valgrind output in the same log file as the
dbug, that would make things much easier. But if not, I can still try to
decipher it...)

 - Kristian.


Follow ups

References