maria-developers team mailing list archive
Re: Semisync plugin incompatibility
Let me try to explain and maybe answer most of your questions.
Semi-sync replication for us is a DBA tool that helps to achieve
durability of transactions in the world where MySQL doesn't do any
flushes to disk. As you may guess, by removing disk flushes we can
achieve a very high transaction throughput. Plus, once we accept that
disks can fail and that recovering data from a failed disk is
time-consuming and expensive (if possible at all), it becomes clear
that, flush or no flush, there is no durability if the disk fails, and
thus disk flushes don't make much sense. So to get durability we use
semi-sync. And the definition of "durability" in this case is "if a
client gets ok on a transaction, it will find that data afterwards".
And that should hold across any master failures and failovers. If we
set semi_sync_master_timeout = infinity we get
something that is very close to that kind of durability. Yes, there is
a problem: while one connection is waiting for the semi-sync ack,
another one can already see the committed data. And if the first
client doesn't ever receive "ok" from the transaction then we can
consider it non-existent and we can safely "lose" it during failover.
And that will confuse the second client a lot (the data he was seeing
suddenly disappears). That's a trade-off we are ready to accept.
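To make the trade-off above concrete, here is a minimal toy model in
Python. It is only a sketch, not how the server is implemented:
ToyMaster, commit(), failover() and the slave_acks flag are made-up
names for illustration, and failover() assumes the worst case, where
the promoted slave has only the ack'ed transactions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy model: commit locally first, then wait for the semi-sync ack."""
    committed: list = field(default_factory=list)  # visible to every connection
    acked: set = field(default_factory=set)        # confirmed by a slave

    def commit(self, txn, slave_acks):
        # Step 1: the transaction is committed locally and other
        # connections can see it right away.
        self.committed.append(txn)
        # Step 2: the committing client gets "ok" only after the ack.
        if slave_acks:
            self.acked.add(txn)
            return "ok"
        # With an infinite timeout the client never gets an answer;
        # None stands for "still waiting / connection killed".
        return None

    def failover(self):
        # Assumption (worst case): the new master has only ack'ed transactions.
        survivors = [t for t in self.committed if t in self.acked]
        return ToyMaster(committed=survivors, acked=set(survivors))


master = ToyMaster()
first = master.commit("t1", slave_acks=True)    # client 1 gets "ok"
second = master.commit("t2", slave_acks=False)  # client 1 never gets "ok"
seen_by_other_client = list(master.committed)   # client 2 already sees t2
new_master = master.failover()
print(first, second, seen_by_other_client, new_master.committed)
```

In this model t2 is visible to the second client before failover and
gone afterwards, while every transaction that returned "ok" survives.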
It looks like MySQL 5.7.2 already implements a different mode of
semi-sync replication, in which a transaction is not visible to other
connections until it is semi-sync ack'ed.
We will be happy to try that. But it has another trade-off that could
be hard to accept sometimes -- InnoDB releases all row locks only when
semi-sync ack is received. And that could slow down inter-dependent
transactions.
So that's how we look at semi-sync replication. BTW, digging
through some history I've realized that the semi-sync plugins in MariaDB
look very close to how the semi-sync patch looked at Google in 2008.
Apparently back then it was included into MySQL, but then it evolved
here and the later changes never made it upstream.
Now to your questions.
> The problem here is that the transaction _is_ committed locally. If we return
> an error, we are confusing all existing applications that expect an error
> return from commit to mean that the transaction is guaranteed _not_ to be
> committed. Did you consider this issue, and possible different ways to solve
> your problem that would not have this issue?
> For example:
> - The client could receive a warning, rather than an error. The warning could
> be handled by those applications that are interested.
As I said above, semi-sync replication is a DBA tool, so it's not up to
the application to be interested in it or not. It's up to DBAs to make
sure that application developers don't get the feeling that they have
lost some data. DBAs should be able to guarantee durability, even if it
comes with some constraints on usage.
> - The master could kill the client connection rather than return the
> error. This matches the normal ACID expectations: If commit returns ok then
> transaction is durable. If it returns error then transaction is not
> committed. If it does not return (connection lost), then it is unknown if
> the transaction is committed or not.
I think this makes sense. And this is actually how we use semi-sync
now -- we use it only with semi_sync_master_timeout = infinity, i.e.
connection either gets semi-sync ack or gets killed (or gets
> - The master could check during the prepare phase if any slaves are
> connected. If not, the transaction could be rolled back and a normal error
> returned to the client.
This is racy and basically adds complexity to the code without
eliminating the situation where the transaction is committed but the
client gets an error. So overall I'm not sure this is worth it.
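The race can be sketched like this. It is a toy Python model with
hypothetical names (PrepareCheckMaster, run_transaction), not actual
server code, and it assumes the behaviour discussed in this thread,
where a missing semi-sync ack makes commit return an error.

```python
class PrepareCheckMaster:
    """Toy master that checks for connected slaves in the prepare phase."""

    def __init__(self):
        self.slave_connected = True
        self.committed = []

    def prepare(self, txn):
        # Proposed check: roll back early if no slave is connected.
        if not self.slave_connected:
            raise RuntimeError("no slave connected, transaction rolled back")

    def commit(self, txn):
        # The local commit happens regardless of what the slaves do.
        self.committed.append(txn)
        if self.slave_connected:
            return "ok"
        return "error"  # committed locally, but no semi-sync ack


def run_transaction(master, txn, slave_drops_after_prepare=False):
    master.prepare(txn)
    if slave_drops_after_prepare:
        # The slave disconnects in the window between prepare and commit.
        master.slave_connected = False
    return master.commit(txn)


master = PrepareCheckMaster()
result = run_transaction(master, "t1", slave_drops_after_prepare=True)
print(result, master.committed)
```

The prepare check passes, the slave goes away, and the client still
ends up with an error for a transaction that is committed locally.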
> - The master could crash itself, causing promotion of a new master, which
> then could involve checking all replication servers to find the one that is
> most advanced.
This is the scariest proposition of all. A deliberate crash in
production can lead to longer than necessary periods of service
unavailability.
> - The master could truncate the current binlog file to before the offending
> transaction and roll back the InnoDB changes. Of course, since this is not
> true synchronous replication, this leaves the possibility that the
> transaction exists on a slave but not on the master.
This is actually what https://mariadb.atlassian.net/browse/MDEV-162
(and probably the MySQL 5.7.2 implementation) is about, right?
I hope our view of how semi-sync replication should work is clear to
you now.