maria-developers team mailing list archive

Re: Semisync plugin incompatibility

To: Pavel Ivanov <pivanof@xxxxxxxxxx>, Sergei Golubchik <serg@xxxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Thu, 14 Nov 2013 11:08:41 +0100
Cc: maria-developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAAG=WUu0Yj57VhH221tTnncZivoeAUwQ48aE=V7WcvtO5CK+XQ@mail.gmail.com> (Pavel Ivanov's message of "Mon, 11 Nov 2013 10:01:49 -0800")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> So basically my question is: if I prepare a patch that will restore
> the original behavior of semi-sync replication (and remove the tests
> added for Bug#45672) will that be acceptable for MariaDB?

I don't have anything against it. As I said, I do not have much of an opinion
on semi-sync one way or the other.

But I would like to hear at least one other opinion (Serg maybe?)

And I think you should write up a full description of how semi-sync should
work with respect to error handling and disconnecting slaves, so that we have
a complete, logical picture into which your patch fits.

> For our use case we want clients to always see error when slaves
> didn't ack the transaction. This basically allows us to have a general
> rule: "Clients can rely on durability of only those transactions which
> they received the "success" result on". I.e. all transactions that
> were committed locally but didn't receive semi-sync ack are ok to lose
> later, and that won't be a serious offense on MySQL side. Of course
> "enhanced semi-sync replication" will help with this a lot and we'll
> be really happy to have it. But without it we at least don't want
> semi_sync_master to turn itself off ever.

I agree that the fact that semi-sync turns itself off seems stupid.

And it clearly would be highly desirable for the client to be able to learn
of a semi-sync failure.

The problem here is that the transaction _is_ committed locally. If we return
an error, we are confusing all existing applications that expect an error
return from commit to mean that the transaction is guaranteed _not_ to be
committed. Did you consider this issue, and possible different ways to solve
your problem that would not have this issue?

For example:

 - The client could receive a warning, rather than an error. The warning could
   be handled by those applications that are interested.

 - The master could kill the client connection rather than return the
   error. This matches the normal ACID expectations: if commit returns ok, then
   the transaction is durable. If it returns an error, then the transaction is
   not committed. If it does not return (connection lost), then it is unknown
   whether the transaction is committed or not.

 - The master could check during the prepare phase if any slaves are
   connected. If not, the transaction could be rolled back and a normal error
   returned to the client.

 - The master could crash itself, causing promotion of a new master, which
   then could involve checking all replication servers to find the one that is
   most advanced.

 - The master could truncate the current binlog file to before the offending
   transaction and roll back the InnoDB changes. Of course, since this is not
   true synchronous replication, this leaves the possibility that the
   transaction exists on a slave but not on the master.
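
To make the ACID expectation behind the second option concrete, here is a
minimal sketch of how an existing application typically interprets the three
possible outcomes of a commit. The names (CommitResult, TxnState,
expected_state) are made up for illustration and are not part of any server
or connector API.

```python
from enum import Enum


class CommitResult(Enum):
    """What the client observes after sending COMMIT (illustrative names)."""
    OK = "ok"
    ERROR = "error"
    CONNECTION_LOST = "connection lost"


class TxnState(Enum):
    """What the client is entitled to assume about the transaction."""
    DURABLE = "durable"
    NOT_COMMITTED = "not committed"
    UNKNOWN = "unknown"


def expected_state(result: CommitResult) -> TxnState:
    """Map a commit outcome to the state a typical application assumes."""
    if result is CommitResult.OK:
        return TxnState.DURABLE
    if result is CommitResult.ERROR:
        # This is the assumption that breaks if an error is returned for a
        # transaction that was in fact committed locally.
        return TxnState.NOT_COMMITTED
    # No reply at all: the application has to find out for itself.
    return TxnState.UNKNOWN
```

Under this model, killing the connection lands the client in the UNKNOWN
case, which applications already have to handle, while returning an error
lands it in NOT_COMMITTED, which would be wrong for a locally committed
transaction.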

> This not only leaves the client unaware of the problem, but also
> allows the server to accept transactions from clients at a very high
> rate when no slaves are present. And if then machine with master fails
> all those accepted transactions will be permanently lost. So in the
> situation when master doesn't have slaves we want to slow down clients
> as much as possible even though their transactions will be committed
> locally and they will be able to check with SELECTs that transactions
> are actually committed.

So you expect every application to implement error handling for every update
that does some SELECTs to check whether its transaction was committed or not?
That sounds very specialised, surely not something to be expected in general.
(But why even do such SELECTs? The client could just check the error code: if
it is "semisync error", then the transaction is committed locally, otherwise
it is not.)
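
As a rough sketch of that error-code check, assuming the server reported a
dedicated semi-sync error: the constant below is a placeholder invented for
illustration, not an error number defined by the server or proposed in this
thread, and committed_locally is a hypothetical helper.

```python
from typing import Optional

# Placeholder value, assumed for illustration only; no such error number
# is defined by the server or in this thread.
HYPOTHETICAL_SEMISYNC_ERRNO = -1


def committed_locally(errno: Optional[int]) -> bool:
    """Decide from the commit's error code whether the master committed.

    errno is None when the commit returned success.
    """
    if errno is None:
        return True
    # A semi-sync error would mean: committed on the master, but no slave
    # acknowledged it. Any other error means the commit did not happen.
    return errno == HYPOTHETICAL_SEMISYNC_ERRNO
```

This only works for applications that are written to know about the special
error code; anything else would still treat the error as "not committed".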

I still do not understand how the client will handle the error in your
scenario. I think it would clarify things if you could explain this in
detail, e.g. explain the original problem you are trying to solve, rather
than your proposed solution.

 - Kristian.

Follow ups

Re: Semisync plugin incompatibility
From: Pavel Ivanov, 2013-11-15
Re: Semisync plugin incompatibility
From: Sergei Golubchik, 2013-11-14

References

Re: Semisync plugin incompatibility
From: Kristian Nielsen, 2013-11-11
Re: Semisync plugin incompatibility
From: Pavel Ivanov, 2013-11-11