maria-discuss team mailing list archive

Re: Galera cluster with asynchronous slave

 


On 04. juli 2014 17:24, erkan yanar wrote:
On Fri, Jul 04, 2014 at 02:56:56PM +0200, Johnny Antonsen wrote:
On 04. juli 2014 10:44, erkan yanar wrote:
Ahoi Johnny,
Ahoi there :)
On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:
Got fatal error 1236 from master when reading data from binary log:
'Error: connecting slave requested to start from GTID 3-1-422, which
is not in the master's binlog'


And the Slave_IO_State shows that it's no longer in sync.

I have run SELECT @@GLOBAL.gtid_slave_pos; to check the current GTID
on each node, and they all return 1-1-2145. However, sometimes when I
add a lot of data, that value differs on some nodes, which is why I
think the slave gets confused.
With Galera, the data does not differ between the nodes.

On the slave, when activating using_gtid=slave_pos, the following
Gtid_IO_Pos appears: 1-1-2464,2-3-420,3-1-422
Why are you using different domain-ids?
 From what this documentation says, it is recommended to use
different domain-ids
https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-multi-master/replication/global-transaction-id/#use-with-multi-source-replication-and-other-multi-master-setups

Here it says "In such setups, each active master must be configured
with its own distinct replication domain ID, gtid_domain_id. The
binlog will then in effect consist of multiple independent streams,
one per active master. Within one replication domain, binlog order
is always the same on every server."
Galera orders your commits. You don't want to have your transactions ordered
per domain-id. You want them to be ordered on all nodes.
So just to be clear
server1 - server-id 1 and gtid_domain_id 1
server2 - server-id 2 and gtid_domain_id 1

Am I on the right track?

And as I'm trying to run a slave from multiple masters, this relates
to my current setup doesn't it?

 From what I have read, this should be somewhat correct, as the first
value is the server id. However, in the config I have specified that
node 1 has server id 1, node 2 has id 2 and so on, and that the same
goes for gtid_domain_id. Is this the correct setup or do the nodes
need to have the same server-id or gtid_domain_id?
The second value is the server-id.
Ok, so that means the GTID values on the various servers in a Galera
cluster will each be unique, so node 1 will have GTID 1-1-xxx, node
2 will have 1-2-xxx, and so on? That is going by what you mention
further up about domain IDs.
The important point is the third part.
The monotonically increasing sequence number.
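
To make the three parts concrete, here is a minimal sketch that splits a GTID position string such as the one quoted above into domain ID, server ID and sequence number. The names Gtid and parse_gtid_pos are made up for illustration and are not part of MariaDB or any client library.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    """One MariaDB GTID: domain ID, server ID, sequence number."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_pos(pos: str) -> Dict[int, Gtid]:
    """Parse a position like '1-1-2464,2-3-420,3-1-422'.

    Returns one Gtid per replication domain, keyed by domain ID.
    (Hypothetical helper, written for this example only.)
    """
    result: Dict[int, Gtid] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        # Each GTID is written as domain-server-sequence.
        domain, server, seq = (int(part) for part in item.split("-"))
        result[domain] = Gtid(domain, server, seq)
    return result


if __name__ == "__main__":
    for gtid in parse_gtid_pos("1-1-2464,2-3-420,3-1-422").values():
        print(gtid)
```

A position with three entries, like the one the slave reports, means the slave is tracking three separate replication domains, each with its own sequence.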

Surely there must be a good way to solve this? Is the system not
built to handle an asynchronous slave replicating from one random
node?

I don't know what you are doing.
All I can say is that I'm also running MariaDB GTID slaves and it works.
I'm not even sure if domain-id matters - I haven't set it at all - but be
sure to have log_slave_updates and log_bin enabled.
What I'm trying to do is actually pretty simple when you think about
it. I have three servers running MariaDB in a Galera cluster. Each
server runs keepalived, to move a virtual IP between the nodes, and
haproxy, to check whether the actual service is up and running. On
another site I have a MariaDB server running with its master set to
the virtual IP assigned by keepalived. All this server has to do is
replicate data from whichever MySQL server it reaches once it
connects.

This works fine when it reaches the first server, but once it jumps
to the next server I get a message saying that the GTID is not in
the current binlog. The using_gtid value is set to slave_pos.
So have you checked if the events are in the binlog?
Yes, I checked the binlog for further details on the events, and from what I can see the events show up on each Galera server. On the async slave, however, replication catches up and syncs with server1 once the slave has been stopped, reset and started, but when it jumps to Master_Server_Id: 2 it fails with the following message: Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

And then it stops running until I reset it. I have found some results online about the error, but they all refer to MySQL and none of them use GTID, which means they simply redefine the binlog file and position manually before starting the slave. From what I've understood, that defeats the purpose of using GTID.

log_slave_updates is enabled on all three servers running Galera,
and so is the binlog, using ROW format.


Hope this explains a little more on what I'm trying to achieve.
That's what I do myself. Right now without a VIP, just doing a CHANGE MASTER.
No problem at all.
So how do you automate the change master process? Replicating through the VIP doesn't seem to work for me, so a little hint on how to do this with change master would be a great help towards solving my setup.
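
As a rough sketch of what such automation could look like, the snippet below only builds the SQL statements for repointing a GTID slave at another node. It assumes MariaDB's CHANGE MASTER TO syntax with MASTER_USE_GTID=slave_pos; the function name change_master_statements and the host name galera-node2 are hypothetical, and how the statements get executed (and how the new node is chosen) is left open.

```python
import re
from typing import List

# Conservative host name / IP check so the value can be embedded in SQL.
_HOST_RE = re.compile(r"^[A-Za-z0-9.-]+$")


def change_master_statements(host: str, port: int = 3306) -> List[str]:
    """Build the statements to repoint a GTID slave at another master.

    Hypothetical helper for illustration: it returns SQL strings only
    and does not connect to any server.
    """
    if not _HOST_RE.match(host):
        raise ValueError("unexpected characters in host: %r" % host)
    return [
        "STOP SLAVE;",
        # With MASTER_USE_GTID=slave_pos no binlog file name or offset
        # is given; the slave resumes from its own gtid_slave_pos.
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_USE_GTID=slave_pos;" % (host, int(port)),
        "START SLAVE;",
    ]


if __name__ == "__main__":
    for statement in change_master_statements("galera-node2"):
        print(statement)
```

The point of the GTID form is that the same three statements work for any node, since no per-node binlog coordinates have to be looked up first.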

Regards
Erkan



