maria-developers team mailing list archive

Re: Rollback causes assertion with Galera 10.0

 

Jan Lindström <jplindst@xxxxxxxxxxx> writes:

> #3  0x00007f1ff9877eb2 in __GI___assert_fail (assertion=0x109b4ff "empty()", file=0x109b4d8 "/home/jan/mysql/galera-10.0/sql/log.cc", line=294, function=0x10a01e0 <binlog_cache_data::reset()::__PRETTY_FUNCTION__> "void binlog_cache_data::reset()") at assert.c:101
> #4  0x00000000009497d4 in binlog_cache_data::reset (this=0x7f1f7403ad60) at /home/jan/mysql/galera-10.0/sql/log.cc:294
> #5  0x0000000000949b6a in binlog_cache_mngr::reset (this=0x7f1f7403ab90, do_stmt=false, do_trx=true) at /home/jan/mysql/galera-10.0/sql/log.cc:462
> #6  0x0000000000935a4c in binlog_truncate_trx_cache (thd=0x45b7d40, cache_mngr=0x7f1f7403ab90, all=true) at /home/jan/mysql/galera-10.0/sql/log.cc:1991
> #7  0x0000000000936123 in binlog_rollback (hton=0x3891fb0, thd=0x45b7d40, all=true) at /home/jan/mysql/galera-10.0/sql/log.cc:2180

> At empty() this is what does not hold : my_b_tell(&cache_log) == 0; // not sure
> what this means

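The cache_log is used first in WRITE_CACHE mode to store events for DML done
in a transaction, and then in READ_CACHE mode to read the events back out and
copy them into the actual binlog file. my_b_tell() returns the current
position in that cache, so the assertion checks that the cache is at position
0, i.e. that nothing is left in it.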
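
To illustrate the two phases, here is a toy model in C. It is only a sketch of
the write-then-read pattern; the struct and the sketch_* functions are
hypothetical and are not the real IO_CACHE API or the code in sql/log.cc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two cache modes mentioned above. */
enum sketch_mode { SKETCH_WRITE, SKETCH_READ };

/* Toy transaction cache: a fixed buffer plus a mode flag. */
struct sketch_trx_cache {
  enum sketch_mode mode;
  unsigned char buf[256];
  size_t used; /* bytes stored while in write mode */
  size_t pos;  /* read cursor while in read mode */
};

/* Start a new transaction: empty cache, write mode. */
void sketch_init(struct sketch_trx_cache *c)
{
  assert(c != NULL);
  c->mode = SKETCH_WRITE;
  c->used = 0;
  c->pos = 0;
}

/* Phase 1: append one event. Returns 0 on success, or -1 if the cache is
   not in write mode or the event does not fit. */
int sketch_write_event(struct sketch_trx_cache *c, const void *ev, size_t len)
{
  assert(c != NULL && ev != NULL);
  if (c->mode != SKETCH_WRITE || len > sizeof(c->buf) - c->used)
    return -1;
  memcpy(c->buf + c->used, ev, len);
  c->used += len;
  return 0;
}

/* Phase 2: switch to read mode and copy the cached events into `out`,
   which stands in for the binlog file. Returns the number of bytes copied. */
size_t sketch_copy_to_binlog(struct sketch_trx_cache *c,
                             unsigned char *out, size_t out_size)
{
  assert(c != NULL && out != NULL);
  c->mode = SKETCH_READ;
  c->pos = 0;
  size_t n = c->used < out_size ? c->used : out_size;
  memcpy(out, c->buf, n);
  c->pos = n;
  return n;
}
```

In this model a rollback would simply call sketch_init() again, after which
the cache must be empty; that is the kind of invariant the failing assert
is checking.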

I suspect this is some problem with error handling. The binlog code is known
to be extremely complicated wrt. error handling, and there are many
possibilities for problems when errors turn up in unexpected places.

It sounds like you are able to inspect the state at the time of the assert
with gdb (running mysqld in the debugger or inspecting a core file). Can you
help me by obtaining some additional values that will help me understand how
this could have occurred?

Try in GDB, after selecting frame #4 (binlog_cache_data::reset) so that
cache_log is in scope:

  frame 4
  p cache_log

I would like to see what the state of the cache_log is. Of particular interest
is the value of cache_log.type (WRITE_CACHE or READ_CACHE?) and the fields
pos_in_file, current_pos, and request_pos used in my_b_tell, but better to get
all the values.
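
For reference, my understanding is that my_b_tell combines those three fields
roughly as in the sketch below. This is a simplified model written from
memory, not the definition from the mysys headers; the struct and the
sketch_* names are hypothetical, and only the three field names come from
the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the fields of interest. */
struct sketch_io_cache {
  unsigned long long pos_in_file; /* file offset of the start of the buffer */
  unsigned char *request_pos;     /* start of the in-memory buffer */
  unsigned char *write_pos;       /* cursor used in write mode */
  unsigned char *read_pos;        /* cursor used in read mode */
  unsigned char **current_pos;    /* points at whichever cursor is active */
};

/* Logical position: the file offset of the buffer start plus how far the
   active cursor has advanced inside the buffer. */
unsigned long long sketch_tell(const struct sketch_io_cache *c)
{
  assert(c != NULL && c->current_pos != NULL);
  return c->pos_in_file +
         (unsigned long long)(*c->current_pos - c->request_pos);
}

/* The condition that fails in the report: the position must be 0. */
int sketch_is_empty(const struct sketch_io_cache *c)
{
  return sketch_tell(c) == 0;
}
```

So a non-zero result after the reset means either pos_in_file is non-zero or
the active cursor is not at the start of the buffer, which is why I would
like to see all of these values.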

With that info, I will try to analyse further...

I suspect this is related to a failure that occurs in an unusual place in
galera (a conflict causing an error during commit, or something like that)
that either hits a bug in the binlog error handling or is handled incorrectly
in the galera patch. But I need more information about what led up to this to
be sure ...

 - Kristian.

