maria-developers team mailing list archive

Re: MariaDB Galera replication

To: Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx>
From: Pavel Ivanov <pivanof@xxxxxxxxxx>
Date: Fri, 15 Nov 2013 13:59:49 -0800
Cc: maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
I'm starting a new thread as this no longer has anything to do with
the original topic.

On Fri, Nov 15, 2013 at 10:46 AM, Alex Yurchenko
<alexey.yurchenko@xxxxxxxxxxxxx> wrote:
>>> Please pardon this arrogant interruption of your discussion and shameless
>>> self-promotion, but I just could not help noticing that Galera replication
>>> was designed specifically with these goals in mind. And it does seem to
>>> achieve them better than semi-sync plugin. Have you considered Galera? What
>>> makes you prefer semi-sync over Galera, if I may ask?
>>
>>
>> To be honest I never looked at how Galera works before. I've looked at
>> it now and I don't see how it can fit with us. The major disadvantages
>> I immediately see:
>> 1. Synchronous replication. That means client must wait while
>> transaction is applied on all nodes which is unacceptably big latency
>> of each transaction. And what if there's a network blip and some node
>> becomes inaccessible? All writes will just freeze? I see the statement
>> that "failed nodes automatically excluded from the cluster", but to do
>> that cluster must wait for some timeout in case it's indeed a network
>> blip and node will "quickly" reconnect. And every client must wait for
>> cluster to decide what happened with that one node.
>> 2. Let's say node fell out of the cluster for 5 minutes and then
>> reconnected. I guess it will be treated as "new node", it will
>> generate state transfer and the node will start downloading the whole
>> database? And while it's trying to download say 500GB of data files
>> all other nodes (or maybe just donor?) won't be able to change those
>> files locally and thus will blow up its memory consumption. That means
>> they could quickly run out of memory and "new node" won't be able to
>> finish its "initialization"...
>> 3. It looks like there's strong asymmetry in starting cluster nodes --
>> the first one should be started with empty wsrep_cluster_address and
>> all others should be started with the address of the first node. So I
>> can't start all nodes uniformly and then issue some commands to
>> connect them to each other. That's bad.
>> 4. What's the transition path? How do I upgrade MySQL/MariaDB
>> replicating using usual replication to Galera? It looks like there's
>> no such path and the solution is stop the world using regular
>> replication and restart it using Galera. Sorry I can't do that with
>> our production systems.
>>
>> I believe these problems are severe enough for us, so that we can't
>> work with Galera.
>
>
> Pavel, you seem to be terribly mistaken on almost all accounts:
>
> 1. *Replication* (i.e. data buffer copying) is indeed synchronous. But
> nobody said that commit is. What Galera does is very similar to semi-sync,
> except that it does it technically better. I would not dare to suggest
> Galera replication if I didn't believe it to be superior to semi-sync in
> every respect.

Well, apparently we have a different understanding of what the term
"synchronous replication" means. This term is all over the Galera docs,
but I didn't find a detailed description of how Galera replication
actually works. So I assumed that my understanding of the term
(which actually seems to be in line with Wikipedia's definitions,
http://en.wikipedia.org/wiki/Replication_(computing) ) is what was
implied there. So I hope you'll be able to describe in detail how
Galera replication works.
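
To make my reading of the term concrete, here is a minimal sketch in
Python. The function names and the latency figures are made up for
illustration; it models only the textbook definitions I have in mind
(fully synchronous waits for every node, semi-sync waits for the first
acknowledgement), not how Galera actually behaves.

```python
def fully_sync_commit_latency(local_ms, node_ms):
    """Client waits until every remote node has applied the transaction."""
    return local_ms + max(node_ms)


def semi_sync_commit_latency(local_ms, node_ms):
    """Client waits until at least one replica acknowledges receipt."""
    return local_ms + min(node_ms)


def async_commit_latency(local_ms):
    """Client waits only for the local commit."""
    return local_ms


# Hypothetical per-node response times in milliseconds; the third node
# is behind a slow or flaky link.
node_ms = [2.0, 3.0, 250.0]

print(fully_sync_commit_latency(1.0, node_ms))
print(semi_sync_commit_latency(1.0, node_ms))
print(async_commit_latency(1.0))
```

Under these assumptions one slow node dominates every commit in the
fully synchronous case, which is the latency concern from my point 1.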

> As an example here's an independent comparison of Galera vs.
> semi-sync performance:
> http://linsenraum.de/erkules/2011/06/momentum-galera.html.

This is a nice blog post written in German and posted in 2011. And
while Google Translate gave me an idea of what the post was about, it
would be nice to see something more recent and with a better
description of the actual test setup.

> In fact, majority
> of Galera users migrated from the regular *asynchronous* MySQL replication,
> which I think is a testimony to Galera performance.

I don't mean to troll, but this can also mean that everyone who
migrated didn't care much about performance and Galera's performance
was within sane boundaries...

BTW, just found here
https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/ :
"by design performance of the cluster cannot be higher than
performance of the slowest node; however, even if you have only one
node, its performance can be considerably lower comparing to running
the same server in a standalone mode". That contradicts your words.

> 2. Node reconnecting to cluster will normally receive only events that it
> missed while being disconnected.

This seems to contradict the docs. Again from
https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/ :
"After a temporary split, if the 'good' part of the cluster was still
reachable and its state was modified, resynchronization occurs".

> 3. You are partially right about it, but isn't it much different from
> regular MySQL replication where you first need to set up master and then
> connect slaves (even if you have physically launched the servers at the same
> time).

Setting up a master and then connecting slaves mostly consists of
executing CHANGE MASTER TO and then START SLAVE on all slaves after
all MySQL instances (including the master) were started with the same
set of command line flags. This is fundamentally different from
starting instances with different arguments, especially when these
arguments have to differ depending on whether the replica is starting
first or there's already some other replica running.
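
As an illustration of what I mean by "uniform", here is a minimal
sketch in Python that builds the same two statements for every
replica. The helper name, host names and credentials are hypothetical,
and the values are not escaped, so treat it as a sketch rather than
something to run against production.

```python
def replica_setup_statements(master_host, master_port, repl_user, repl_password):
    """Return the SQL statements to run on a slave to attach it to the master."""
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{master_host}', "
        f"MASTER_PORT={master_port}, "
        f"MASTER_USER='{repl_user}', "
        f"MASTER_PASSWORD='{repl_password}'"
    )
    return [change_master, "START SLAVE"]


# Hypothetical replica names; every one of them gets the same statements,
# regardless of the order in which the servers were started.
replicas = ["db-replica-1.example", "db-replica-2.example"]
statements = replica_setup_statements("db-master.example", 3306, "repl", "secret")

for replica in replicas:
    for sql in statements:
        print(f"{replica}: {sql};")
```

The point is that nothing here depends on which instance came up
first; the difference between master and slave is expressed only in
statements issued after startup.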

> Yet, Galera nodes can be started simultaneously and then joined
> together by setting wsrep_cluster_address from mysql client connection. This
> is not advertised method, because in that case state snapshot transfer can
> be done only by mysqldump. If you set the address in advance, rsync or
> xtrabackup can be used to provision the fresh node.

This is of course better because I can start all instances with the
same command line arguments. But transferring a snapshot of a very big
database using mysqldump, and causing the node that creates the dump
to blow up its memory consumption during the process, is still a big
problem.

> 4. Every Galera node can perfectly work as either master or slave to native
> MySQL replication. So migration path is quite clear.

Nope, not clear yet. So I'll be able to upgrade all my MySQL instances
to a Galera-supporting binary while they are replicating using
standard MySQL replication. That's good. Now, how is Galera
replication turned on after that? What will happen if I just set
wsrep_cluster_address on all replicas? What will the replicas do,
and what will happen with the standard MySQL replication?

> It is very sad that you happen to have such gross misconceptions about
> Galera. If those were true, how would MariaDB Galera Cluster get paying
> customers?

Care to share some numbers? Like what's the rough number of those
paying customers? What size is the biggest installation -- number of
clusters, replicas, highest QPS load?
I'm not asking you to share any confidential information, but a rough
ballpark of the numbers would be helpful.

> May be my reply will convince you to have a second look at it.
> (In addition to the above Galera is fully multi-master, does parallel
> applying and works great in WAN)

I hope your explanation of how Galera replication works will help me
understand how great it works over WAN and how you could make full
multi-master work without fully synchronous replication in my
understanding of that term.


Pavel