maria-developers team mailing list archive

Re: slave_ddl_exec_mode and incompatible change in MariaDB 10.0.8

To: Michael Widenius <monty@xxxxxxxxxxxx>
From: Pavel Ivanov <pivanof@xxxxxxxxxx>
Date: Tue, 25 Feb 2014 19:31:04 +0000
Cc: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>, maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <21260.52611.529956.483149@narttu.askmonty.org>
Here is a reproduction test case. I took the vanilla 10.0.8 tarball
and applied the following patch to it:

@@ -131,6 +131,11 @@ bool trans_begin(THD *thd, uint flags)

   DBUG_ASSERT(!thd->locked_tables_mode);

+#ifdef HAVE_REPLICATION
+  if (thd->slave_thread && (thd->variables.option_bits & OPTION_BEGIN))
+    abort();
+#endif
+
   if (thd->in_multi_stmt_transaction_mode() ||
       (thd->variables.option_bits & OPTION_TABLE_LOCK))
   {


Then I compiled and ran the following test:

--source include/master-slave.inc
connection master;
create table t (n int);
insert into t values (1);
show binlog events;
sync_slave_with_master;


That test had this output:

include/master-slave.inc
[connection master]
create table t (n int);
insert into t values (1);
show binlog events;
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000001 4 Format_desc 1 248 Server ver: 10.0.8-MariaDB-debug-log, Binlog ver: 4
master-bin.000001 248 Gtid_list 1 273 []
master-bin.000001 273 Binlog_checkpoint 1 313 master-bin.000001
master-bin.000001 313 Gtid 1 351 GTID 0-1-1
master-bin.000001 351 Query 1 436 use `test`; create table t (n int)
master-bin.000001 436 Gtid 1 474 BEGIN GTID 0-1-2
master-bin.000001 474 Query 1 561 use `test`; insert into t values (1)
master-bin.000001 561 Query 1 630 COMMIT


The test then reported that the slave died with this stack trace:

sql/transaction.cc:139(trans_begin(THD*, unsigned int))[0x788e20]
sql/log_event.cc:6478(Gtid_log_event::do_apply_event(rpl_group_info*))[0x93a685]
sql/log_event.h:1341(Log_event::apply_event(rpl_group_info*))[0x5ca108]
sql/slave.cc:3191(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*))[0x5c0da8]
sql/slave.cc:3464(exec_relay_log_event)[0x5c1498]
sql/slave.cc:4516(handle_slave_sql)[0x5c44e9]


This means that the slave tries to execute a BEGIN event while
OPTION_BEGIN is already set, which should never happen.
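
The failure mode can be illustrated with a toy model of the event sequence from the binlog above. This is only a sketch: ToySlave, EventType, kOptionBegin and the ddl_leaks_begin_flag switch are hypothetical names invented for the illustration, not server code, and the "leak" is simply the behavior this report suspects, not a confirmed mechanism.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these types exist in the server.
enum class EventType { StandaloneDdl, GtidBegin, Query, Commit };

// Stand-in for the OPTION_BEGIN bit in thd->variables.option_bits.
constexpr std::uint64_t kOptionBegin = 1ULL << 0;

struct ToySlave {
  std::uint64_t option_bits = 0;
  // Hypothetical switch: when true, a standalone DDL event leaves the
  // begin bit set, which is the behavior suspected in this thread.
  bool ddl_leaks_begin_flag = false;
};

// Applies the events in order and returns the index of the first BEGIN
// event that arrives while the begin bit is already set, or -1 if the
// invariant holds for the whole sequence.
int first_violation(ToySlave& slave, const std::vector<EventType>& events) {
  for (std::size_t i = 0; i < events.size(); ++i) {
    switch (events[i]) {
      case EventType::GtidBegin:
        // Same check as the abort() added to trans_begin() in the patch.
        if (slave.option_bits & kOptionBegin) return static_cast<int>(i);
        slave.option_bits |= kOptionBegin;
        break;
      case EventType::StandaloneDdl:
        if (slave.ddl_leaks_begin_flag) slave.option_bits |= kOptionBegin;
        break;
      case EventType::Query:
        break;  // Statements inside a transaction do not touch the bit.
      case EventType::Commit:
        slave.option_bits &= ~kOptionBegin;
        break;
    }
  }
  return -1;
}
```

Feeding it the sequence { StandaloneDdl, GtidBegin, Query, Commit } with ddl_leaks_begin_flag set reports a violation at index 1, the BEGIN that follows the CREATE TABLE; with the switch off, the same sequence passes.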


To answer all of your other questions, our main concern is simple:
master and slave should always have absolutely the same database
contents, absolutely the same tables and absolutely the same data in
those tables. Any difference between them can be created only by humans
and must be resolved only by humans. Absolutely no magic, please; it's
unacceptable. Whenever an inconsistency is detected, replication must
stop and wait for human intervention. It's not enough to have the same
data eventually. And if a DBA requests different behavior, he doesn't
understand what kind of trouble awaits him in the future.

As a consequence, the slave shouldn't execute any implicit commits,
because it's impossible to generate binlogs on the master that
require implicit commits. Another consequence is that a CREATE TABLE
statement should never automatically delete the table if it already
exists. Who knows how the existing table was created and how important
the data stored in it is? Definitely not MariaDB. These questions
should be answered by a human, and a human should decide whether it's
OK to delete the existing table. For the same reason, DROP TABLE should
never be silently ignored if the table doesn't exist -- who knows what
happened and why it doesn't exist when it did exist on the master? That
should be investigated by a human.
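
The "no magic" rule for DDL can be sketched as follows. This is a toy illustration under stated assumptions: ToyCatalog, ApplyResult and the two apply functions are hypothetical names made up for the example and do not correspond to anything in the MariaDB source.

```cpp
#include <set>
#include <string>

// Toy model only; these names are hypothetical, not server code.
enum class ApplyResult { Ok, StopReplication };

struct ToyCatalog {
  std::set<std::string> tables;  // Names of tables present on the slave.
};

// Strict CREATE TABLE: an existing table is never replaced silently.
ApplyResult apply_create_table(ToyCatalog& catalog, const std::string& name) {
  if (catalog.tables.count(name) != 0) {
    return ApplyResult::StopReplication;  // Leave the decision to a human.
  }
  catalog.tables.insert(name);
  return ApplyResult::Ok;
}

// Strict DROP TABLE: a missing table is never ignored silently.
ApplyResult apply_drop_table(ToyCatalog& catalog, const std::string& name) {
  if (catalog.tables.erase(name) == 0) {
    return ApplyResult::StopReplication;  // Table should have existed.
  }
  return ApplyResult::Ok;
}
```

In this model a second CREATE TABLE for the same name, or a DROP TABLE for a name that is not present, leaves the catalog untouched and returns StopReplication instead of repairing the difference automatically.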

Of course, the world is not perfect. If the slave can crash in the
middle of CREATE TABLE and not roll back the table creation on restart,
that's a problem. But MariaDB should not assume that if a table exists
then it's there because of a crash; there could be other reasons. If
the slave can crash while executing DROP TABLE and not roll that back
on restart, that's a problem too, but again it must be resolved by a
human (or by code that does a proper rollback).

And as you rightly noted, temp tables can behave weirdly with
replication; that's why we have code to prohibit creation of temp
tables on masters. CREATE IF NOT EXISTS can result in different data
on master and slave; that's why we prohibit execution of such
statements (as well as DROP IF EXISTS). And for any other feature that
may misbehave in replication, we will put some blocks in place to avoid
any breakage.

So that's our main concern and our main expectation of how MariaDB
should behave. And we would really appreciate it if that behavior
didn't silently change and break the "no magic by default" expectation.


Pavel


On Tue, Feb 25, 2014 at 5:06 PM, Michael Widenius <monty@xxxxxxxxxxxx> wrote:
>
> Hi!
>
>>>>>> "Pavel" == Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
> Pavel> And now I found that this change is actually buggy. It turns out that
> Pavel> when slave executes a standalone CREATE TABLE event now it will set
> Pavel> OPTION_BEGIN flag in thd->variables.option_bits and won't reset it. I
> Pavel> don't know whether slave keeps transaction actually not committed
> Pavel> and/or whether it doesn't clean up some other transaction data, but
> Pavel> execution of the next event will always think there is a transaction
> Pavel> open and it needs to be auto-committed.
>
> I checked my patch, but I could not find any cases where I had added
> setting OPTION_BEGIN, except in connection with OPTION_GTID_BEGIN.
> OPTION_GTID_BEGIN is only set when we *know* that there will be a
> COMMIT event following in the log.
>
> I also tried to verify this by running a test that does this on the master:
>
> "create table t2 (a int) engine=myisam"
>
> I added a breakpoint for the slave in
> "mysql_create_table"
>
> Neither when the function was entered nor when it exited was the
> OPTION_BEGIN flag set.
>
> Can you give me an example of where things go wrong, preferably with
> an extract from the binary log that shows what is actually logged?
>
> For example, here is how a normal create table is logged.
> (From suite/rpl/r/create_or_replace_row.result)
>
> slave-bin.000001        #       Gtid    #       #       GTID #-#-#
> slave-bin.000001        #       Query   #       #       use `test`; create table t2 (a int) engine=myisam
> slave-bin.000001        #       Gtid    #       #       BEGIN GTID #-#-#
>
> The GTID above should not set OPTION_BEGIN or OPTION_GTID_BEGIN on the
> slave.
>
> However a CREATE ... SELECT will look like:
>
> master-bin.000001       #       Gtid    #       #       BEGIN GTID #-#-#
> master-bin.000001       #       Query   #       #       use `test`; CREATE TABLE `t1` (
>   `f1` int(1) NOT NULL DEFAULT '0'
> )
> master-bin.000001       #       Table_map       #       #       table_id: # (test.t1)
> master-bin.000001       #       Write_rows_v1   #       #       table_id: # flags: STMT_END_F
> master-bin.000001       #       Query   #       #       COMMIT
>
> The above will set the OPTION_BEGIN and OPTION_GTID_BEGIN for the
> CREATE STATEMENT and this will be reset by the COMMIT (that is
> guaranteed to follow).
>
> Pavel> But that also means that this
> Pavel> state cannot be distinguished from the case when slave received BEGIN
> Pavel> event, but didn't receive COMMIT event, i.e. either binlog on master
> Pavel> is corrupted or slave somehow skipped some events.
>
> - Corrupted binary logs should not be a concern.  In this case the
>   binary log can contain anything, including wrong DROP DATABASE
>   commands that could do anything.
> - If the master fails, the slave will notice this because it finds a
>   'binlog start event', which will reset the BEGIN bits.
> - In other words, there will always be a COMMIT event (either explicit
>   or implicit, like with a binlog start event)
> - The slave can only skip events with slave_skip_counter, but in this
>   case it will not be in BEGIN mode. During slave_skip_counter COMMIT
>   events will be noticed and the bit will be reset.
>
> How can the binlog be corrupted?
> How do you expect the master to handle corruption?
> Why is CREATE TABLE a special case you are concerned about, compared
> to other things like DELETE FROM TABLE in row based replication?
> (DELETE FROM expect a BEGIN, table_id, many delete-row-events, COMMIT).
>
> Pavel> Would MariaDB consider this as a serious problem?
>
> Please show me a test case first so that I can understand the problem.
>
> Regards,
> Monty