maria-developers team mailing list archive

Re: Interaction between rpl_slave_state and rpl_binlog_state

To: andrei.elkin@xxxxxxxxxx
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Tue, 28 Nov 2017 17:00:31 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, andrei.elkin@xxxxxxxxxxx
In-reply-to: <877euav2bu.fsf@quad> (andrei elkin's message of "Tue, 28 Nov 2017 17:40:37 +0200")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

I am sure you can find some who would want something that ignores replicated
GTIDs that duplicate GTIDs originating locally.

I can only say that, in my experience, this can cause unexpected
problems, and it requires a lot of thought to get well-defined semantics that
users can understand and that will not bring surprises.

A central design decision for MariaDB GTID is _not_ to try to remember the
whole history of GTIDs applied, unlike MySQL GTID. Because of this there are
limitations to what can be done in terms of avoiding duplicate GTIDs - the
server lacks the required information.

Another decision was to allow, and correctly handle, out-of-order sequence
numbers (e.g. gtid_strict_mode=0). This was necessary to be able to generate
GTIDs by default in 10.0. But it again means that detecting duplicates is
harder; in fact, in the general case only the master has the information
required to do this, and the slave does not.
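
As a rough illustration of the two points above, here is a minimal Python
sketch. It is a toy model, not the server's code: the names Gtid, SlaveState
and looks_like_duplicate are hypothetical, and the "duplicate" heuristic is
only an assumption made for the example. The slave keeps just the last
applied GTID per domain, and sequence numbers may arrive out of order.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


class SlaveState:
    """Toy model: remembers only the last applied GTID per domain."""

    def __init__(self) -> None:
        self.last: Dict[int, Gtid] = {}

    def apply(self, gtid: Gtid) -> None:
        # No history is kept; the previous position is simply overwritten.
        self.last[gtid.domain_id] = gtid

    def looks_like_duplicate(self, gtid: Gtid) -> bool:
        # Naive heuristic (an assumption for this sketch): anything that is
        # not ahead of the last known position is treated as a duplicate.
        prev = self.last.get(gtid.domain_id)
        return prev is not None and gtid.seq_no <= prev.seq_no


state = SlaveState()
# Non-strict mode: sequence numbers 1, 2, 5 are applied, 3 and 4 are not.
for seq in (1, 2, 5):
    state.apply(Gtid(0, 1, seq))

never_applied = Gtid(0, 1, 3)    # a new transaction with a lower seq_no
already_applied = Gtid(0, 1, 2)  # a genuine duplicate

# The heuristic answers "duplicate" for both, so it cannot tell them apart.
false_positive = state.looks_like_duplicate(never_applied)
true_positive = state.looks_like_duplicate(already_applied)

# Once 0-1-3 is applied, the position moves back, and a second copy of
# 0-1-5 no longer looks like a duplicate at all.
state.apply(never_applied)
missed_duplicate = not state.looks_like_duplicate(Gtid(0, 1, 5))
```

With only the last position available, the model both rejects a transaction
that was never applied and accepts one that was, which is the kind of
ambiguity described above.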

Finally, experience has shown that a _lot_ of users run into problems when
transactions done locally on a slave influence the slave's GTID position. In
retrospect, I have realised that CHANGE MASTER TO
master_use_gtid=current_pos was a mistake; only slave_pos should be used.
Similarly, if a local transaction in a slave's binlog can cause transactions
from the master to be silently ignored, it will cause a lot of grief for
users.
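
A small sketch of how a local transaction changes the position, assuming the
rule given in the MariaDB documentation for gtid_current_pos (per domain, the
GTID from gtid_binlog_pos is used when it carries the server's own server_id
and a higher sequence number than the one in gtid_slave_pos). The function
name, the tuple representation and the example values are hypothetical, and
the model is deliberately simplified.

```python
from typing import Dict, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def current_pos(slave_pos: Dict[int, Gtid],
                binlog_pos: Dict[int, Gtid],
                own_server_id: int) -> Dict[int, Gtid]:
    """Simplified model of how gtid_current_pos is derived per domain."""
    result = dict(slave_pos)
    for domain, gtid in binlog_pos.items():
        replicated = slave_pos.get(domain)
        is_local = gtid[1] == own_server_id
        if is_local and (replicated is None or gtid[2] > replicated[2]):
            result[domain] = gtid
    return result


# The slave (server_id 2) has replicated up to 0-1-10 from its master.
slave_pos = {0: (0, 1, 10)}
# A local write on the slave is then binlogged as 0-2-11.
binlog_pos = {0: (0, 2, 11)}

# Start position the slave would use with master_use_gtid=current_pos:
start_current = current_pos(slave_pos, binlog_pos, own_server_id=2)[0]
# Start position it would use with master_use_gtid=slave_pos:
start_slave = slave_pos[0]
```

In this model, current_pos yields 0-2-11, a GTID that originated on the slave
itself, while slave_pos still yields 0-1-10, the last GTID actually received
from the master.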

Hope this helps,

 - Kristian.

andrei.elkin@xxxxxxxxxx writes:

> Kristian, howdy.
>
> Thanks for reminding us about the simple CHANGE MASTER ... IGNORE_SERVER_IDS!
> (This time it evaded me alone :-))
> It perfectly covers the circular cluster case.
>
> What motivated me to consider this option for looking for duplicates
> also in gtid_binlog_pos was the following observation.
>
> A duplicate gtid (transaction) can also arrive from a separate session
> of the same server, but in this case the gtid_ignore_duplicates rules
> do not apply. Such a gtid would silently override an existing one.
>
> On the other hand, gtid_strict_mode applies to both the ordinary server
> and the slave (according to the docs).
>
>    MariaDB [test]> show global variables like 'gtid_binlog_pos';
>    +-----------------+--------+
>    | Variable_name   | Value  |
>    +-----------------+--------+
>    | gtid_binlog_pos | 0-1-12 |
>    +-----------------+--------+
>    1 row in set (0.00 sec)
>
>    MariaDB [test]> set @@session.gtid_seq_no=11;
>    ERROR 1950 (HY000): An attempt was made to binlog GTID 0-1-11 which
>    would create an out-of-order sequence number with existing GTID 0-1-12,
>
> Maybe it would not be a bad idea to generalize gtid_ignore_duplicates to
> cover a duplicate from any source, which would effectively become a "soft"
> mode to silently ... reject.
>
> In other words how about extending a gtid (operational) mode as a set to
>
> "gtid_mode" \in {
>                   on     (override by dups),
>                   strict (error out dups)
> +               , soft   (ignore dups)
>                 }
>
> To other subjects,
>
>> Sachin Setiya <sachin.setiya@xxxxxxxxxxx> writes:
>>
>>> I have some questions related to rpl_slave_state. Suppose a circular
>>> async replication between A <--> B (gtid_ignore_duplicates on)
>>
>> Why do you set gtid_ignore_duplicates? This option is for multi-source
>> replication:
>>
>>   https://mariadb.com/kb/en/library/gtid/#gtid_ignore_duplicates
>>
>>   "When set, different master connections in multi-source replication are
>>   allowed to receive and process event groups with the same GTID"
>>
>> But you are not using multi-source connection here, there is only one master
>> connection (eg. connection to B on slave A).
>> Thus, the option will do nothing in this case.
>>
>>> Now, we set some temp server_id on server A, let's say `X`. Now the
>>> problem is each event group which
>>> originates from A is executed 2 times. For example we insert into
>>> table t1 and gtid is 0-X-2. The event goes to slave B.
>>> B applies it and sends it back to A. Since its server_id is different
>>
>> I think here you mean that A has server_id=1 (eg), B has server_id=2, but on
>> A you do
>>
>>   SET server_id=3;
>>   INSERT INTO t1 VALUES (1);
>>
>> But there is no server with server_id=3 anywhere. In this case, you need to
>> break the circle yourself somewhere. For example by CHANGE MASTER ...
>> IGNORE_SERVER_IDS=3 on A.
>>
>> To my knowledge, this has always been so for ring replication.
>>
>>> Andrei suggested a solution of checking rpl_binlog_state in
>>> check_duplicate_gtid. This solution solves some problems but creates
>>
>> It seems you think that --gtid-ignore-duplicates should magically ignore any
>> application of a duplicate GTID. But that is not the case, as the
>> documentation states (though admittedly rather briefly).
>> --gtid-ignore-duplicates is _only_ for multi-source replication (so perhaps
>> unfortunately named).
>>
>> In this case, the conflict is not between GTIDs replicated from different
>> master connections. It is a conflict between a transaction originated on a
>> master and a transaction replicated from another master.
>>
>>> write gtid_event in log. But this does not make sense. rpl_slave_state
>>> should be used for slave replication usage.
>>
>> Agreed. rpl_binlog_state should not be involved in slave GTID processing.
>> There should be a clear separation: rpl_slave_state is what a slave has
>> applied from another master. rpl_binlog_state is what a master has
>> originated.
>>
>> The gtid_ignore_duplicates option is already very difficult for users to
>> understand and use correctly. It would be a mistake to make it even more
>> complicated.
>>
>> Also, this seems to originate from some Galera issue. It is well known that
>> Galera was merged prematurely into MariaDB with a broken design, and this
>> was never fixed. Galera issues must never influence how non-galera
>> replication (which at least attempts to have a proper design) works.
>
> I would support this.
>
> Cheers,
>
> Andrei

References

Interaction between rpl_slave_state and rpl_binlog_state
From: Sachin Setiya, 2017-11-28
Re: Interaction between rpl_slave_state and rpl_binlog_state
From: Kristian Nielsen, 2017-11-28
Re: Interaction between rpl_slave_state and rpl_binlog_state
From: andrei . elkin, 2017-11-28