maria-discuss team mailing list archive

Re: Fix different gtid-positions on domain 0 in multi-master

 

Reinder Cuperus <reinder@xxxxxxxxxxxxxx> writes:

> The problem is, as soon as I stop that connection, that master2 and
> master3 have different gtid-positions for domain0, and stop/start on
> replication master3->backup results in the error:
> "Got fatal error 1236 from master when reading data from binary log:
> 'Error: connecting slave requested to start from GTID 0-1-3898746614,
> which is not in the master's binlog'

Yes. backup sees that it is ahead of master3 in domain 0, so it aborts to
avoid risk of diverging replication.

> I have tried moving master1/2 to domain_id:1, and removing the
> domain_id:0 from the gtid_slave_pos on backup, but starting the
> replication master2->backup results in the error:
> Got fatal error 1236 from master when reading data from binary log:
> 'Could not find GTID state requested by slave in any binlog files.
> Probably the slave state is too old and required binlog files have been
> purged.'

Yes. Because backup now sees that it is far behind in domain 0 (as it sees
the world), and it aborts rather than silently lose transactions.

> I tried finding a way to purge domain:0 from master3/master4, but the
> only way sofar I have found is doing a "RESET MASTER" on master3, which
> would break replication between master3 and master4.

Yes, I guess this is what you need. You have made a copy and removed half of
the data, and now you need to similarly remove half of the binlog. Even if
there are no actual transactions left from a domain in non-purged binlogs,
the binlogs still remember the history of all domains, in order to not
silently lose transactions for a slave that gets far behind.

It would be useful in general to be able to purge a domain from a binlog.
But currently the only way I can think of is RESET MASTER.

You can see how this binlog history looks by checking @@gtid_binlog_state,
and in the GTID_LIST events at the head of each binlog file.
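
For instance, a minimal sketch of how that history could be inspected on
master3. The binlog file name is a placeholder, not taken from your setup;
pick a real one from the SHOW BINARY LOGS output.

```sql
-- Every domain the binlog remembers, including domains that have no
-- transactions left in the non-purged binlog files.
SELECT @@gtid_binlog_state;

-- List the binlog files that currently exist on this server.
SHOW BINARY LOGS;

-- The Gtid_list event is among the first events of each binlog file.
-- 'mysql-bin.000001' is a hypothetical file name.
SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 5;
```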

> I have tried to find a way to insert an empty transaction, with the last
> gtid on domain_id:0 on the master3, to bring master2/master3 in sync
> again on that domain, but I could not find a way to do that on MariaDB.

The server will not binlog an empty transaction, but a dummy transaction
should work, e.g. create and drop a dummy table:

  CREATE TABLE dummy_table (a INT PRIMARY KEY);
  -- Make the next binlogged statement use GTID 0-1-3898746614.
  SET gtid_domain_id= 0;
  SET server_id= 1;
  SET gtid_seq_no= 3898746614;
  DROP TABLE dummy_table;

Maybe this way you can make the binlogs look in sync as far as replication
is concerned; I am not sure. It might be tricky, but then you do seem to
have a good grasp of the various issues involved.
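
To check whether the dummy transaction had the intended effect, something
like the following could be run afterwards. This is only a sketch; which
server to run each statement on is my assumption about your topology.

```sql
-- On master3: last GTID binlogged per domain. Domain 0 should now show
-- the GTID that was set before the dummy statement.
SELECT @@gtid_binlog_pos;
SELECT @@gtid_binlog_state;

-- On backup: the position the slave will request when it connects,
-- for comparison with the values above.
SELECT @@gtid_slave_pos;
```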

> Are there other ways to fix this issue, so I can have reliable
> replication master3->backup without having to keep the dummy replication
> backup->master3 indefinitely?

I guess you would need to stop traffic to master3/master4 while getting them
in sync with one another, and then do RESET MASTER on both and SET GLOBAL
gtid_slave_pos="" to start replication from scratch. You would then also
need to have server 'backup' up-to-date with master3 before RESET MASTER,
and remove domain id 20 from the gtid_slave_pos on backup after the RESET
MASTER.
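
Roughly, and only as a sketch of the steps above: I assume the default
replication connection here (with named connections, use the forms that
take a connection name), and the gtid_slave_pos value set on backup is
purely illustrative.

```sql
-- On master3 and on master4, with traffic stopped and both in sync:
STOP ALL SLAVES;
RESET MASTER;                    -- discards all binlogs and GTID history
SET GLOBAL gtid_slave_pos = '';
START SLAVE;                     -- or START SLAVE 'name' for a named connection

-- On backup, up-to-date with master3 before the RESET MASTER above:
STOP ALL SLAVES;
SELECT @@gtid_slave_pos;         -- note the current value
-- Set it again with the entry for the master3/master4 domain left out.
-- '0-1-3898746614' is only an illustrative value built from the GTID in
-- the error message; use what the SELECT actually returned.
SET GLOBAL gtid_slave_pos = '0-1-3898746614';
START SLAVE;                     -- or START SLAVE 'name' for a named connection
```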

So that is quite intrusive.

 - Kristian.

