maria-developers team mailing list archive

Re: Interaction between rpl_slave_state and rpl_binlog_state

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: andrei.elkin@xxxxxxxxxx
Date: Tue, 28 Nov 2017 17:40:37 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
In-reply-to: <87k1yar4h3.fsf@urd.knielsen-hq.org> (Kristian Nielsen's message of "Tue, 28 Nov 2017 13:07:52 +0100")
Organization: Home sweet home
Reply-to: andrei.elkin@xxxxxxxxxxx
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

Kristian, howdy.

Thanks for reminding us about the simple CHANGE MASTER ... IGNORE_SERVER_IDS
option! (This time it slipped only my mind :-))
It perfectly covers the circular cluster case.

What motivated me to consider looking for duplicates in gtid_binlog_pos
as well was the following observation.

A duplicate gtid (transaction) can also arrive from a separate session
of the same server, but in this case the gtid_ignore_duplicates rules
do not apply. Such a gtid would silently override the existing one.

On the other hand, gtid_strict_mode applies to both the ordinary server
and the slave (according to the docs).

   MariaDB [test]> show global variables like 'gtid_binlog_pos';
   +-----------------+--------+
   | Variable_name   | Value  |
   +-----------------+--------+
   | gtid_binlog_pos | 0-1-12 |
   +-----------------+--------+
   1 row in set (0.00 sec)

   MariaDB [test]> set @@session.gtid_seq_no=11;
   ERROR 1950 (HY000): An attempt was made to binlog GTID 0-1-11 which
   would create an out-of-order sequence number with existing GTID 0-1-12,

Maybe it would not be a bad idea to generalize gtid_ignore_duplicates to
cover duplicates from any source, which would effectively become a "soft" mode
that silently rejects them.

In other words, how about extending the gtid (operational) mode, viewed as a set, to

"gtid_mode" \in {
                  on     (override by dups),
                  strict (error out dups)
+               , soft   (ignore dups)
                }
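
To make the three behaviours concrete, here is a minimal sketch as a toy
Python model, not server code. Everything in it is hypothetical: the names
GtidMode, BinlogState and log_gtid are mine, "gtid_mode" and its "soft" value
do not exist in MariaDB, and the state is simplified to one last GTID per
domain with "duplicate" taken to mean seq_no <= the last logged seq_no.

```python
from dataclasses import dataclass, field
from enum import Enum


class GtidMode(Enum):
    ON = "on"          # a duplicate overrides the existing entry
    STRICT = "strict"  # a duplicate is an error
    SOFT = "soft"      # proposed: a duplicate is silently ignored


class OutOfOrderGtid(Exception):
    """Stands in for the server's ERROR 1950 in this toy model."""


@dataclass
class BinlogState:
    """Hypothetical model: last (server_id, seq_no) logged per domain_id."""
    mode: GtidMode
    last: dict = field(default_factory=dict)

    def log_gtid(self, domain_id, server_id, seq_no):
        """Return True if the GTID is recorded, False if it is skipped."""
        prev = self.last.get(domain_id)
        # Assumption: a "duplicate" is any seq_no not above the last one.
        if prev is not None and seq_no <= prev[1]:
            if self.mode is GtidMode.STRICT:
                raise OutOfOrderGtid(
                    f"{domain_id}-{server_id}-{seq_no} is out of order with "
                    f"existing {domain_id}-{prev[0]}-{prev[1]}"
                )
            if self.mode is GtidMode.SOFT:
                return False  # ignore the duplicate, keep the state
        # ON mode (or a genuinely new GTID): the entry is overwritten.
        self.last[domain_id] = (server_id, seq_no)
        return True
```

With the state at 0-1-12 as in the session above, logging 0-1-11 would
replace the entry under "on", raise under "strict", and leave the state
untouched under "soft".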

On to other subjects:

> Sachin Setiya <sachin.setiya@xxxxxxxxxxx> writes:
>
>> I have some question related to rpl_slave_state. Suppose A circular
>> async replication between A < -- > B (gtid_ignore_duplicates on)
>
> Why do you set gtid_ignore_duplicates? This option is for multi-source
> replication:
>
>   https://mariadb.com/kb/en/library/gtid/#gtid_ignore_duplicates
>
>   "When set, different master connections in multi-source replication are
>   allowed to receive and process event groups with the same GTID"
>
> But you are not using multi-source connection here, there is only one master
> connection (eg. connection to B on slave A).
> Thus, the option will do nothing in this case.
>
>> Now, we set some temp server_id on server A , lets say `X`. Now the
>> problem is each event group which
>> originates from A is executed 2 times. For example we insert into
>> table t1 and gtid is 0-X-2. The event goes to slave B
>> B applies it, And send it back to A, Since its server_is different
>
> I think here you mean that A has server_id=1 (eg), B has server_id=2, but on
> A you do
>
>   SET server_id=3;
>   INSERT INTO t1 VALUES (1);
>
> But there is no server with server_id=3 anywhere. In this case, you need to
> break the circle yourself somewhere. For example by CHANGE MASTER ...
> IGNORE_SERVER_IDS=3 on A.
>
> To my knowledge, this has always been so for ring replication.
>
>> Andrei suggested a solution of checking rpl_binlog_state in
>> check_duplicate_gtid, This solution solves some problem but creates
>
> It seems you think that --gtid-ignore-duplicates should magically ignore any
> apply of duplicate GTID. But that is not the case, as the documentation
> states (though admittedly rather briefly). --gtid-ignore-duplicates is
> _only_ for multi-source replication (so perhaps unfortunately named).
>
> In this case, the conflict is not between GTIDs replicated from different
> master connections. It is a conflict between a transaction originated on a
> master with a transaction replicated from another master.
>
>> write gtid_event in log. But this does not make sense. rpl_slave_state
>> should be used for slave replication usage.
>
> Agree. rpl_binlog_state should not be involved in slave GTID processing.
> There should be a clear separation: rpl_slave_state is what a slave has
> applied from another master. rpl_binlog_state is what a master has
> originated.
>
> The gtid_ignore_duplicates option is already very difficult for users to
> understand and use correctly. It would be a mistake to make it even more
> complicated.
>
> Also, this seems to originate from some Galera issue. It is well known that
> Galera was merged prematurely into MariaDB with a broken design, and this
> was never fixed. Galera issues must never influence how non-galera
> replication (which at least attempts to have a proper design) works.

I would support this.

Cheers,

Andrei

Follow ups

Re: Interaction between rpl_slave_state and rpl_binlog_state
From: Kristian Nielsen, 2017-11-28

References

Interaction between rpl_slave_state and rpl_binlog_state
From: Sachin Setiya, 2017-11-28
Re: Interaction between rpl_slave_state and rpl_binlog_state
From: Kristian Nielsen, 2017-11-28