← Back to team overview

maria-developers team mailing list archive

Re: Galera-10.0 still does not work with rollbacks

 

On 2013-09-20 15:51, Kristian Nielsen wrote:
Jan Lindström <jplindst@xxxxxxxxxxx> writes:

Correct log file attached.

==00:00:02:33.708 25289== Use of uninitialised value of size 8
==00:00:02:33.708 25289==    at 0x5EF865B: _itoa_word (_itoa.c:179)
==00:00:02:33.708 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
==00:00:02:33.708 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
==00:00:02:33.708 25289==    by 0x5F03141: snprintf (snprintf.c:34)
==00:00:02:33.708 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.708 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.708 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
==00:00:02:33.709 25289==
==00:00:02:33.709 25289== Conditional jump or move depends on uninitialised value(s)
==00:00:02:33.709 25289==    at 0x5EF8665: _itoa_word (_itoa.c:179)
==00:00:02:33.709 25289==    by 0x5EFCB91: vfprintf (vfprintf.c:1654)
==00:00:02:33.709 25289==    by 0x5F22654: vsnprintf (vsnprintf.c:119)
==00:00:02:33.709 25289==    by 0x5F03141: snprintf (snprintf.c:34)
==00:00:02:33.709 25289== by 0x936453: binlog_commit(handlerton*, THD*, bool) (log.cc:2054) ==00:00:02:33.709 25289== by 0x86B3DF: commit_one_phase_2(THD*, bool, THD_TRANS*, bool) (handler.cc:1516) ==00:00:02:33.709 25289== by 0x86B13B: ha_commit_trans(THD*, bool) (handler.cc:1436) ==00:00:02:33.709 25289== by 0x7A0DD9: trans_commit_stmt(THD*) (transaction.cc:363) ==00:00:02:33.709 25289== by 0x673DEB: mysql_execute_command(THD*) (sql_parse.cc:5373) ==00:00:02:33.709 25289== by 0x677D27: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6820) ==00:00:02:33.709 25289== by 0x6774A8: wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:6651) ==00:00:02:33.709 25289== by 0x6688A3: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1454)
==00:00:02:33.709 25289==
130920 13:53:04 [ERROR] WSREP: ::jan_BINLOG_COMMIT:: TRX: Pending (nil) pos 0, pos_in_file 892097568 current_pos 892132560 request_pos 892132664 write_pos 303074896

So this is the explanation for the strange numbers in your printouts, like 892097568. Those values are uninitialised memory (probably from uninitialised
stack variables?).

Since the assertion you get is exactly about the values of those IO_CACHE
values being wrong, this looks like something to investigate.

I am however curious about why there is only a single Valgrind error about
this? As there are many of those unexpected numbers in your output.

However, most of the Valgrind errors seem to be in Galera code. It is hard to say for me if they are related. Seppo, can you look into that from your side? Normally, we strive _very_ hard to make MariaDB code clean in Valgrind...

Kristian, so do we.

Please make sure you're using debug Galera build and have GLIBCXX_FORCE_NEW environment variable exported (http://valgrind.org/docs/manual/faq.html#faq.reports)

Regards,
Alex

It did produce the output, not yet found the reason, some of the output just
before corruption

MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): enter: event:
0x7f2a200bc800
MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event): info: cache_mngr->
pending(): 0x0
Rows_log_event::do_add_row_data: enter: row_data: 0x7f2a200cb248  length: 10 Rows_log_event::do_add_row_data: row_data: Memory: 0x7f2a200cb248  Bytes: (10)
F8 01 00 00 00 01 74 02 77 65
my_realloc: my: ptr: 0x0  size: 1024  my_flags: 80
my_malloc: my: size: 1024  my_flags: 80
my_malloc: exit: ptr: 0x7f2a200895e0

I need to see the *complete* log to be able to help. Yes, I know it can be hundreds of megabyte maybe, but it should compress well. Upload it somewhere
I can access it and I can take a look.

Using this log file, we can deduce exactly which paths the code was taking as it got to that assert. I'm wondering if the ROLLBACK happens early and ends up executing some binlog code at a point where transaction start has not properly initialised the necessary data structures. So we need the dbug output from the
entire execution since transaction start.

(If you can get the error log/Valgrind output in the same log file as the dbug, that would make things much easier. But if not, I can still try to
decipher it...)

 - Kristian.

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011



References