← Back to team overview

maria-developers team mailing list archive

Re: slave_ddl_exec_mode and incompatible change in MariaDB 10.0.8



>>>>> "Pavel" == Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

Pavel> Here is a reproduction test case. I took the vanilla tarball of
Pavel> 10.0.8, applied to it the following patch:

Pavel> @@ -131,6 +131,11 @@ bool trans_begin(THD *thd, uint flags)

Pavel>    DBUG_ASSERT(!thd->locked_tables_mode);

Pavel> +  if (thd->slave_thread && (thd->variables.option_bits & OPTION_BEGIN))
Pavel> +    abort();
Pavel> +#endif
Pavel> +
Pavel>    if (thd->in_multi_stmt_transaction_mode() ||
Pavel>        (thd->variables.option_bits & OPTION_TABLE_LOCK))
Pavel>    {

Pavel> Then I compiled and ran the following test:

Pavel> --source include/master-slave.inc
Pavel> connection master;
Pavel> create table t (n int);
Pavel> insert into t values (1);
Pavel> show binlog events;
Pavel> sync_slave_with_master;

Pavel> That test had this output:

Pavel> include/master-slave.inc
Pavel> [connection master]
Pavel> create table t (n int);
Pavel> insert into t values (1);
Pavel> show binlog events;
Pavel> Log_name Pos Event_type Server_id End_log_pos Info
Pavel> master-bin.000001 4 Format_desc 1 248 Server ver:
Pavel> 10.0.8-MariaDB-debug-log, Binlog ver: 4
Pavel> master-bin.000001 248 Gtid_list 1 273 []
Pavel> master-bin.000001 273 Binlog_checkpoint 1 313 master-bin.000001
Pavel> master-bin.000001 313 Gtid 1 351 GTID 0-1-1
Pavel> master-bin.000001 351 Query 1 436 use `test`; create table t (n int)
Pavel> master-bin.000001 436 Gtid 1 474 BEGIN GTID 0-1-2
Pavel> master-bin.000001 474 Query 1 561 use `test`; insert into t values (1)
Pavel> master-bin.000001 561 Query 1 630 COMMIT

Pavel> And then it said that slave died with the stack trace

Pavel> sql/transaction.cc:139(trans_begin(THD*, unsigned int))[0x788e20]
Pavel> sql/log_event.cc:6478(Gtid_log_event::do_apply_event(rpl_group_info*))[0x93a685]
Pavel> sql/log_event.h:1341(Log_event::apply_event(rpl_group_info*))[0x5ca108]
Pavel> sql/slave.cc:3191(apply_event_and_update_pos(Log_event*, THD*,
Pavel> rpl_group_info*, rpl_parallel_thread*))[0x5c0da8]
Pavel> sql/slave.cc:3464(exec_relay_log_event)[0x5c1498]
Pavel> sql/slave.cc:4516(handle_slave_sql)[0x5c44e9]

Pavel> Which means that slave tries to execute BEGIN event while OPTION_BEGIN
Pavel> is set which shouldn't ever happen.

The assert you put in the code doesn't show that anything is wrong.

The reason is the following code in log_event.cc:

  thd->variables.option_bits|= OPTION_BEGIN | OPTION_GTID_BEGIN;
  trans_begin(thd, 0);

In other words, we do set OPTION_BEGIN just before calling
trans_begin(), so the assert is wrong.

Pavel> And to answer all of your other questions, our main concern is simple:
Pavel> master and slave should always have absolutely the same database
Pavel> contents, absolutely the same tables and absolutely the same data in
Pavel> those tables.


Pavel> Any difference in those can be created only by humans
Pavel> and must be resolved only by humans. Absolutely no magic please, it's
Pavel> unacceptable, whenever inconsistency is detected replication must stop
Pavel> and wait for human intervention. It's not enough to have the same data
Pavel> eventually. And if any DBA requests a different behavior he doesn't
Pavel> understand what kind of troubles waits him in the future.

The problem you are not taking into account for is that while
executing CREATE TABLE or DROP TABLE, the slave is inconsistent until
we write things to the binary log and there was no before no automatic
way to resolve this without user intervention.

The new code I added will not make the slave less reliable in any
relevant scenario I can think off.

If there is an inconsistency between slave and master, it will be
detected when data is applied.

Pavel> As a consequence to that slave shouldn't execute any implicit commits,
Pavel> because it's impossible to generate binlogs on master that will
Pavel> require implicit commits.

The original code that only tested OPTION_BEGIN did generate implicit
commits in a lot of cases. That is why I introduced OPTION_GTID_BEGIN,
just to avoid implicit commits.

Pavel> Another consequence is CREATE TABLE
Pavel> statement should never automatically delete the table if it already
Pavel> exists. Who knows how the existing table was created and how important
Pavel> the data that is stored in it?

This scenario doesn't really matter for a slave.
One should never create a table on the slave that does not exist on
the master.  The scenarios when this makes sense is very very rare and
in this case on can set slave_ddl_exec_mode to IGNORE to handle this.

What we do know when we see a CREATE TABLE in the binary log is that
the table did not exist on the master and we should create such an
empty table slave.

The most likely scenario for this to happen is that the slave died
while a CREATE TABLE was executed and before the binary log was
written. The new code is done exactly to cope with this scenario.

Pavel> Definitely not MariaDB. These questions
Pavel> should be answered by human and human should decide whether it's ok to
Pavel> delete existing table. Again for the same reason DROP TABLE should
Pavel> never be silently ignored if the table doesn't exist -- who knows what
Pavel> happened and why it doesn't exist when it did exist on master? That
Pavel> should be investigated by human.

The most likely reason for that table not existing is again that the
slave died in the middle of a drop.

The problem with the old way of handling CREATE & DROP is that if you
do a lot of CREATE and DROP tables (which a lot of applications are
actually doing) there is a very big change that a slave will not be
able to recover from a crash.

The new code is safe for all practical purposes.  Any inconsistencies in
data, which is not likely to occur of the new behavior with CREATE or
DROP TABLE will be detected during execution of DML's.

Pavel> Of course world is not perfect. If slave can crash in the middle of
Pavel> CREATE TABLE and not rollback the table creation on restart, that's a
Pavel> problem. But MariaDB should not assume that if table is exists then
Pavel> it's there because of a crash, there could be other reasons. If slave
Pavel> can crash while executing DROP TABLE and not rollback that on restart,
Pavel> that's a problem too, but again it must be resolved by human (or by
Pavel> code that does a proper rollback).

There is no possiblity to rollback CREATE or DROP in the current code.
The only possiblity we have now is to allow these DDL's to succeed.

I personally think this is the safest way to go forwards, as we in
this case do know that for this table we will be consistent between
master and slave after the execution of the statement.

This is not true for the normal slave_exec_mode, which is why this is
not on by default.

Pavel> And as you rightfully noted temp tables can behave weirdly with
Pavel> replication, that's why we have code to prohibit creation of temp
Pavel> tables on masters.

This may be ok for Google, but it's not ok for most other MariaDB users.

Pavel> CREATE IF NOT EXISTS can result in different data
Pavel> on master and slave, that's why we prohibit execution of such
Pavel> statements (as well as DROP IF EXISTS). And for any other feature that
Pavel> may misbehave in replication we will put some blocks in place to avoid
Pavel> any breakage.

Agree that CREATE IF NOT EXIST can in theory give different data
between master and slave.  This is however not the case with DROP IF

Pavel> So that's our main concern and our main expectation of how MariaDB
Pavel> should behave. And we would really appreciate if that behavior didn't
Pavel> silently change to break the "no magic by default" expectation.

The change we did was not silently done. It was documented and blogged
in several places.

I added the slave_ddl_exec_mode variable to ensure that companies like
yours should be able to change the behavior to the old one if they
would have strange master/slave setups where it's ok that the data is
different between master and slave.

For normal people that needs something that will 'just work' and do
not know what to do if the slave suddenly would fail to restart, the
current defaults are better as behavior for CREATE or DROP will never
cause inconsistencies between master and slave and will make the slave
more reliable on restarts.


Follow ups