
maria-discuss team mailing list archive

Fix different gtid-positions on domain 0 in multi-master

 

Hello,

After moving half of the databases on our primary master-master-cluster
to a different cluster, we have a problem on our backup-server which is
now a slave of both servers.

Topology before:
master1 <-> master2 -> backup-slave

Topology during migrations:
master1 <-> master2 -> backup-slave <-> master3 <-> master4

Final topology:
master1 <-> master2 -> backup-slave <- master3 <-> master4


The gtid_domain_ids before/during the migration were:
master1/master2 : 0
master3/master4 : 20

Replication-settings:
master2 -> backup: Replicate_Do_Domain_Ids: 0
master3 -> backup: Replicate_Do_Domain_Ids: 20
backup -> master3: Replicate_Do_DB: DbToMigrate1,DbToMigrate2,etc.

Server version: 10.1.9

So after the migration, typical gtid positions would be:
master1/2: 0-1-12345
master3/4: 0-1-12345,20-21-5678

As long as I keep the connection backup->master3 running (safe, because
no new transactions on the migrated databases occur on master1/2
anymore), the position on domain 0 gets recorded on master3.
The problem is that as soon as I stop that connection, master2 and
master3 end up with different gtid-positions for domain 0, and a
stop/start of the master3->backup replication results in the error:
"Got fatal error 1236 from master when reading data from binary log:
'Error: connecting slave requested to start from GTID 0-1-3898746614,
which is not in the master's binlog'"
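
To make the mismatch concrete, here is a minimal Python sketch that parses
two gtid positions and lists the domains on which they disagree. It is only
an illustration and not how MariaDB itself performs the check. The helper
names parse_gtid_pos and diverging_domains are hypothetical, and the master2
position is a made-up value showing domain 0 advancing after the
backup->master3 connection is stopped.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    state = {}
    for gtid in pos.split(","):
        gtid = gtid.strip()
        if not gtid:
            continue
        domain_id, server_id, seq_no = (int(x) for x in gtid.split("-"))
        state[domain_id] = (server_id, seq_no)
    return state


def diverging_domains(pos_a, pos_b):
    """Return the domains present in both positions whose GTIDs differ."""
    a, b = parse_gtid_pos(pos_a), parse_gtid_pos(pos_b)
    return {d: (a[d], b[d]) for d in sorted(a.keys() & b.keys()) if a[d] != b[d]}


# Illustrative values only: master2 has moved on in domain 0, while
# master3 still holds the last domain-0 GTID it received via backup.
master2_pos = "0-1-12400"
master3_pos = "0-1-12345,20-21-5678"
print(diverging_domains(master2_pos, master3_pos))
```

With these sample values only domain 0 is reported, which is the domain the
error message above complains about.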

I have tried moving master1/2 to domain_id:1, and removing the
domain_id:0 from the gtid_slave_pos on backup, but starting the
replication master2->backup results in the error:
Got fatal error 1236 from master when reading data from binary log:
'Could not find GTID state requested by slave in any binlog files.
Probably the slave state is too old and required binlog files have been
purged.'

I have tried to find a way to insert an empty transaction, with the last
gtid on domain_id:0 on the master3, to bring master2/master3 in sync
again on that domain, but I could not find a way to do that on MariaDB.

I tried finding a way to purge domain:0 from master3/master4, but the
only way I have found so far is doing a "RESET MASTER" on master3, which
would break replication between master3 and master4.

Are there other ways to fix this issue, so I can have reliable
replication master3->backup without having to keep the dummy replication
backup->master3 indefinitely?

Regards,
Reinder Cuperus

