maria-discuss team mailing list archive


Re: Galera cluster with asynchronous slave

To: erkan yanar <erkan.yanar@xxxxxxxxxxxxx>
From: Johnny Antonsen <johnny@xxxxxxxxx>
Date: Mon, 21 Jul 2014 15:14:38 +0200
Cc: maria-discuss@xxxxxxxxxxxxxxxxxxx
In-reply-to: <1X35Lo-00024z-V5@linsenraum.de>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0


On 04. juli 2014 17:24, erkan yanar wrote:

On Fri, Jul 04, 2014 at 02:56:56PM +0200, Johnny Antonsen wrote:

On 04. juli 2014 10:44, erkan yanar wrote:

Ahoi Johnny,

Ahoi there :)

On Thu, Jul 03, 2014 at 02:16:26PM +0200, Johnny Antonsen wrote:

Got fatal error 1236 from master when reading data from binary log:
'Error: connecting slave requested to start from GTID 3-1-422, which
is not in the master's binlog'


And the Slave_IO_State shows that it's no longer in sync.

I have run SELECT @@GLOBAL.gtid_slave_pos; to check what the current
GTID for each node is, and they all return: 1-1-2145, however,
sometimes if I add a lot of data, that value is different on some
nodes, which is why I think the slave gets confused.

With Galera, the data does not differ between the nodes.

On the slave, when activating using_gtid=slave_pos, the following
gtid_IO_pos appears: 1-1-2464,2-3-420,3-1-422

Why are you using different domain-ids?

From what this documentation says, it is recommended to use
different domain-ids:
https://mariadb.com/kb/en/mariadb/mariadb-documentation/replication-cluster-multi-master/replication/global-transaction-id/#use-with-multi-source-replication-and-other-multi-master-setups

Here it says "In such setups, each active master must be configured
with its own distinct replication domain ID, gtid_domain_id. The
binlog will then in effect consist of multiple independent streams,
one per active master. Within one replication domain, binlog order
is always the same on every server."

Galera orders your commits. You don't want to have your transactions ordered
per domain-id. You want them to be ordered on all nodes.

So just to be clear:
server1 - server-id 1 and gtid_domain_id 1
server2 - server-id 2 and gtid_domain_id 1

Am I on the right track?

And as I'm trying to run a slave from multiple masters, this relates
to my current setup, doesn't it?

From what I have read, this should be somewhat correct, as the first
value is the server id. However, in the config I have specified that
node 1 has server id 1, node 2 has id 2 and so on, and that the same
goes for gtid_domain_id. Is this the correct setup or do the nodes
need to have the same server-id or gtid_domain_id?

The second value is the server-id.

Ok, so that means that each value on the various servers in a Galera
cluster will be unique, like node 1 will have gtid 1-1-xxx and node
2 will have 1-2-xxx and so on? According to what you mention further
up about domain IDs being unique.

The important point is the third part.
The monotonically increasing sequence number.
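
To make the three parts concrete, here is a minimal Python sketch that
splits a GTID position string, such as the gtid_IO_pos value quoted
earlier in this thread, into domain ID, server ID and sequence number.
The names Gtid and parse_gtid_pos are made up for illustration and are
not part of MariaDB.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    """One MariaDB GTID: domain ID, server ID, sequence number."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_pos(pos: str) -> List[Gtid]:
    """Split a comma-separated GTID position into its components."""
    gtids = []
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        domain, server, seq = item.split("-")
        gtids.append(Gtid(int(domain), int(server), int(seq)))
    return gtids


if __name__ == "__main__":
    # Value taken from the slave's gtid_IO_pos quoted in this thread.
    for gtid in parse_gtid_pos("1-1-2464,2-3-420,3-1-422"):
        print(gtid)
```

Applied to the value from the slave, this gives one entry per domain
(1, 2 and 3), which shows that the slave is tracking three separate
replication streams rather than one.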

Surely there must be a good way to solve this? Is the system not
built to handle an asynchronous slave replicating from one random
node?

I don't know what you are doing.
All I can say is that I am also running MariaDB GTID slaves and it works.
I'm not even sure the domain-id matters; I haven't set it at all. Be sure
to have log_slave_updates and the binary log (log_bin) enabled.

What I'm trying to do is actually pretty simple when you think about
it. I have three servers running mariadb and being in a galera
cluster. Each server has haproxy and keepalived running to move a
virtual ip over and haproxy for checking if the actual service is up
and running. On another site I have a mariadb server running with
master set to the virtual ip assigned by keepalived. All this server
has to do is replicate data from the mysql server it reaches once it
connects.

This works fine when it reaches the first server, but once it jumps
to the next server I get a message saying that the GTID is not in
the current binlog. The using_gtid value is set to slave_pos.

So have you checked if the events are in the binlog?

Yes, I did check the binlog for further details on the events, and from
what I can see the events show up on each Galera server. On the async
slave, however, the replication seems to catch up and sync with server1
once the slave has been stopped, reset and started, but when it jumps to
Master_Server_Id: 2 it fails with the following message:

Last_IO_Error: Got fatal error 1236 from master when reading data from
binary log: 'Could not find first log file name in binary log index file'

And then it stops running until I reset it. I have found some results
online on the error, but they all refer to MySQL and do not use GTID,
which means they simply redefine the binlog file and position manually
before starting the slave. This, however, defeats the purpose of using
GTID, from what I've understood.

log_slave_updates is enabled on all three servers running Galera,
and so is the binlog, using ROW format.


Hope this explains a little more on what I'm trying to achieve.

That's what I do myself. Right now without a VIP, just doing a CHANGE MASTER.
No problem at all.

So how do you automate the change master process? Going through the VIP
for replicating doesn't seem to work for me, so a little hint on how to
do this process with change master would be a great help towards solving
my setup.
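
The thread does not show how the change master step is automated on
Erkan's side. As one possible sketch only, the Python helper below
builds the SQL statements that would repoint a GTID slave at another
Galera node. The function name build_repoint_statements and the host
name are hypothetical, and the code deliberately stops short of
connecting to a server, so how the statements get executed is left
open.

```python
from typing import List


def build_repoint_statements(host: str, port: int = 3306) -> List[str]:
    """Return the SQL statements to point a MariaDB GTID slave at a new master.

    Only builds strings; running them against the slave is up to the caller.
    """
    if not host or "'" in host:
        # Refuse values that would break out of the quoted host string.
        raise ValueError("invalid master host: %r" % (host,))
    return [
        "STOP SLAVE",
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_USE_GTID=slave_pos" % (host, port),
        "START SLAVE",
    ]


if __name__ == "__main__":
    # Hypothetical node name, used purely for illustration.
    for stmt in build_repoint_statements("galera-node2.example"):
        print(stmt + ";")
```

With MASTER_USE_GTID=slave_pos the slave asks the new master to resume
from its own recorded GTID position, so no binlog file name or offset
has to be passed in.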


Regards
Erkan

References

Galera cluster with asynchronous slave
From: Johnny Antonsen, 2014-07-03
Re: Galera cluster with asynchronous slave
From: erkan yanar, 2014-07-04
Re: Galera cluster with asynchronous slave
From: Johnny Antonsen, 2014-07-04
Re: Galera cluster with asynchronous slave
From: erkan yanar, 2014-07-04