← Back to team overview

maria-developers team mailing list archive

Re: slave_ddl_exec_mode and incompatible change in MariaDB 10.0.8

 

Hi!

First, very sorry that I was not able to answer this question until
now.  I have been traveling in Asia for 2 weeks with very limited
time for internet + email.

Now I am however back and hope to answer all possible questions you
have on this topic.

>>>>> "Pavel" == Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

Pavel> Hi,
Pavel> I've discovered a change in MariaDB 10.0.8 that I don't quite
Pavel> understand motivation for and which looks really surprising to me. So
Pavel> I would appreciate if you could tell me what's the motivation and why
Pavel> you think it's appropriate to do that.

Pavel> The change in question is
Pavel> http://bazaar.launchpad.net/~maria-captains/maria/10.0/revision/3961.1.7.
Pavel> I see several issues with it:

Pavel> 1. It introduces a pretty significant variable slave_ddl_exec_mode
Pavel> which wasn't announced in the Release Notes.

It looks like Daniel accidently forget to mention this in the release notes.
I have now added thus to:
https://mariadb.com/kb/en/mariadb-1008-release-notes/

I did however originally document this change in:
https://mariadb.com/kb/en/create-table/
https://mariadb.com/kb/en/drop-table/
https://mariadb.com/kb/en/mysqld-options-full-list/
https://mariadb.com/kb/en/what-is-mariadb-100/
https://mariadb.com/kb/en/replication-and-binary-log-server-system-variables/

and I wrote a very detail blog post why this change was needed:
http://monty-says.blogspot.fi/2014/02/the-final-piece-of-puzzle.html

Pavel> 2. The default variable value was set to IDEMPOTENT which changes
Pavel> MariaDB slave behavior in an incompatible way. Not everyone will want
Pavel> to have this variable set to IDEMPOTENT (I'd actually argue that no
Pavel> one should set it to IDEMPOTENT), but without any mentioning of this
Pavel> variable in the Release Notes people basically have the only chance to
Pavel> get knowledge about this change (let me stress out: _incompatible_
Pavel> change) from production incidents or from accidental code inspection.

Incompatible in which way?
Can you please show me some common scenarions when the current default would
not be what you want or expect?

My argument is that most people would expect the following from the
MariaDB replication:
(Most of the following was not true before my patch and is not true
for MySQL)

- That row based, statement based and mixed mode replication would
  produce the same number of GTID.
- One should be able to have different replication mode on master and
  slave and still get same number of GTID's.
- Row based, statement based and mixed mode replication would
  produce the same data (was not true before my patch).
- If a slave would die, it should be able restart without user
  intervention as long as the data on the slave and master will be
  exactly the same.
- Statements should always be executed in the same order on the slave
  and master even if the slave and master used different storage
  engines (transactional or not).

Yes, things are not 100 % compatible with how MySQL does things, but
in almost all cases things are better than before.  The slave will be
able to handle restarts in a lot of common scenarios it could not
handle before.

Bcause of this I would argue that almost every single user would
prefer the new mode compared to how things where before.


Pavel> 3. When slave_ddl_exec_mode is set to IDEMPOTENT every "DROP TABLE"
Pavel> event in master's binlog is converted into "DROP TABLE IF EXISTS" in
Pavel> slave's binlog, which I believe is a major no-no for replication.

Why ?
In which scenario is this not desirable?

Pavel> Binlogs on master and slave should be identical, otherwise after
Pavel> failover even if one would set slave_ddl_exec_mode = STRICT it will
Pavel> still execute "DROP TABLE IF EXISTS" because new master has that in
Pavel> binlogs.

The binlogs on master and slave have never been identical, especially
if you are using different replication modes.
Here is some strange usage:

- For DROP of temporary tables IF EXISTS was added in some cases, in
  other cases the DROP was never written to the binary log.
- CREATE IF NOT EXISTS .... LIKE other_table could create different
  tables based on replication mode.
- CREATE ... SELECT was replicated with 1 GTID on statement based mode
  and 2 GTID's in row based mode.
  - MySQL 5.6 doesn't even allow one to use CREATE ... SELECT with GTID.

The CREATE ... REPLACE patch fixes many cases that before causes data
and tables to be different on master and slave.

Here is some scenarios of things can go wrong and how the new mode
will handle it better than the old one:

1) Repeatable DROP

master:
DROP TABLE t1,t2;

old slave:
DROP TABLE t1
crash; restart
DROP TABLE t1
- Slave will stop as t1 doesn't exist anymore.

New slave:
DROP TABLE IF EXISTS t1  ; Drops t1
crash; restart
DROP TABLE IF EXISTS t1
DROP TABLE IF EXISTS t2

End result, master and slave are consistent.

2) Repeatable CREATE

master:
CREATE TABLE t1 (a int);

old slave:
CREATE TABLE t1 (a int);
- Crash before binary log is written ; restart
CREATE TABLE t1 (a int);
- Fail

new slave:
CREATE TABLE t1 (a int);
- Crash before binary log is written ; restart
CREATE OR REPLACE TABLE t1 (a int);

This will work (or can easilbe be made to work) in most cases even if
the slave fails between the .frm file being rewritten and the table is
created in the storage engine.

3) Repatable CREATE ... SELECT

master:
CREATE TABLE t1 SELECT * from t2;

old slave:
CREATE TABLE t1 (a int);
INSERT INTO t1 SELECT * from t2;
- Crash during select ; restart
Roll back insert (as part of recovery)
CREATE TABLE t1 (a int);
- Fail

old slave:
CREATE TABLE t1 (a int);
INSERT INTO t1 SELECT * from t2;
- Crash during select ; restart
Roll back insert (as part of recovery)
DROP TABLE t1;
CREATE TABLE t1 (a int);
INSERT INTO t1 SELECT * from t2;

I don't know of any case where the new default mode would create and
inconsistency between master and slave.

Why do you think it's matter that DROP TABLE is replicated as DROP
TABLE IF EXISTS ?

When the slave sees the DROP TABLE, it knows that the table doesn't
exists anymore on the master (it doesn't really matter if the table
existed or not on the master, does it)?

There are two cases how this can be run on the slave:

- If the table existed, it will be deleted.
- If the table doesn't exist, nothing will happen.

In both scenarios the slave will be identical to the master (when it
comes to this table).

Pavel> 4. Why this change introduces OPTION_GTID_BEGIN in the code? Why
Pavel> OPTION_BEGIN wasn't enough?

Because the MySQL replication code doesn't honor OPTION_BEGIN.

- It may do an implicit commit in many different scenarios.
  (On example is CREATE TABLE, but there are others)
- It may reorder statements within a transaction (if some tables are
  transactional and others are not)
- It allows us to have DDL's in a GTID transaction.

When OPTION_GITD_BEGIN is set, all tables are treated as equal, in
spite of table type, and no implicit commits are allowed.

This help ensures that the slave and master's data will be equal after
the statements where executed. (The original code didn't allow that).

For example:

master (where all tables are InnoDB):
BEGIN
INSERT into t1 values (1);
INSERT into t1 select * from t2;
INSERT into t2 values (2);
COMMIT;

slave (where t2 is MyISAM). The binlog will look like this:

BEGIN
INSERT into t2 values (2);
INSERT into t1 select * from t2;
INSERT into t1 values (1,2,3);
COMMIT;

This is because, by default, MySQL puts all statements that changes
non transactional tables first in the binary log.


Pavel> From reading the code the only thing I
Pavel> understand is that with broken binlogs and sequence of events "GTID
Pavel> BEGIN; CREATE TABLE; CREATE TABLE" MariaDB will try to execute both
Pavel> CREATE TABLE statements in the same transaction without
Pavel> auto-committing at the end.

What do you mean with 'broken binlogs' ?
How would you be able to create a broken binlog?

If the master restarts, it will write a new 'log start' statement that
will reset all 'BEGIN' flags, so there is no possible scenario, I know
of, that would cause a binary log with two create statements after
each other.

If it's possible to get such a binary log, then you can also get
similar problems with DML's.

My patch had nothing to do with trying to work with broken binary
logs.

Pavel> This sounds pretty crazy to me.

Trying to execute broken binary log's sounds very crazy to me too.
By probably I don't understand what you really think is a problem.

Lets look at this from a different angel:

One part of the patch solves how CREATE ... SELECT is logged.

Before it was logged in statement based replication:
GTID BEGIN
CREATE ... SELECT
COMMIT

and in row based replication as:
GTID BEGIN
CREATE
COMMIT
INSERT INTO ... SELECT
COMMIT

After my patch the second version is logged as:
GTID BEGIN
CREATE
INSERT INTO ... SELECT
COMMIT

so we have the same number of GTID's in both replication modes.

Is this more clear?

Pavel> And given
Pavel> the fact that in normal situation such sequence of events on the
Pavel> master is not possible it seems that existence of both OPTION_BEGIN
Pavel> and OPTION_GTID_BEGIN only introduces confusion (MariaDB have GTID
Pavel> always on, right? So it should always execute GTID_BEGIN and never
Pavel> BEGIN without GTID, right?).

As I tried to explain above, OPTION_BEGIN doesn't make sense for a
slave, that is why we need OPTION_GTID_BEGIN.

And yes, a slave should never execute BEGIN without a GTID.

Pavel> So what is your word on this? Can you at least set default value for
Pavel> slave_ddl_exec_mode to STRICT so that slave behavior didn't change
Pavel> with the version upgrade?

I don't think that having a more stable slave is a behavior change.

Sorry, but I really don't understand what concerns you have with the
new behavior. Is it just that it was not clearly explained, or is
there some scenario you are working with where it would be better to
get the slave to fail than continue (even when we know that the data
on master and slave will be identical).

Please give me some examples of the scenarios that you are concerned
about!

Regards,
Monty


References