maria-developers team mailing list archive

Re: understanding commit and prepare APIs in MariaDB 5.5

To: Zardosht Kasheff <zardosht@xxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Thu, 21 Feb 2013 15:11:38 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CABFd+SFNJRmPSBciCtX2UqWKm50tBqqXy3XnFScQyd_9q67Obw@mail.gmail.com> (Zardosht Kasheff's message of "Thu, 21 Feb 2013 07:37:28 -0500")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Zardosht Kasheff <zardosht@xxxxxxxxx> writes:

> Here is a very high level overview.

Thanks for the detailed explanation!

I think an easy way to implement commit_ordered() is to do nothing more than
increment a counter and assign its value to the transaction. Then in commit(),
each transaction waits for the previous commit to write to the recovery log
before writing itself (so the order becomes correct). That should be a very
small modification to your code. But there may be extra context switching,
unless you are clever with your write lock access to the recovery log.
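
A minimal sketch of this first idea, assuming a hypothetical per-transaction
struct (Txn) and a hypothetical OrderedLogWriter class; neither name comes
from MariaDB or from the engine under discussion, and the "log" is just a
vector standing in for the recovery log:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the engine's per-transaction state.
struct Txn {
    std::uint64_t commit_seq = 0;  // ticket assigned in commit_ordered()
};

// Hypothetical helper: hands out tickets in commit_ordered() and makes
// commit() write to the log strictly in ticket order.
class OrderedLogWriter {
public:
    // Serial phase: only take a ticket, no real work.
    void commit_ordered(Txn &txn) {
        std::lock_guard<std::mutex> lk(mu_);
        txn.commit_seq = next_ticket_++;
    }

    // Parallel phase: wait until every earlier ticket has written,
    // write our own entry, then hand the turn to the next ticket.
    void commit(const Txn &txn) {
        std::unique_lock<std::mutex> lk(mu_);
        assert(txn.commit_seq >= now_serving_);  // each ticket commits once
        cv_.wait(lk, [&] { return now_serving_ == txn.commit_seq; });
        log_.push_back(txn.commit_seq);  // stand-in for the recovery-log write
        ++now_serving_;
        cv_.notify_all();  // wakes all waiters; only the next ticket proceeds
    }

    // Snapshot of the order in which entries reached the log.
    std::vector<std::uint64_t> log() {
        std::lock_guard<std::mutex> lk(mu_);
        return log_;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::uint64_t next_ticket_ = 0;
    std::uint64_t now_serving_ = 0;
    std::vector<std::uint64_t> log_;
};
```

Threads may reach commit() in any order; entries still land in the log in
ticket order. The notify_all() here wakes every waiter on each write, which is
one place the extra context switching mentioned above would show up.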

Maybe there is a different possibility. If I understand correctly, this is the
situation:

 - You need part of commit to run in parallel in multiple threads for good
   performance ("send a message into the dictionary for every ...").

 - The first phase of checkpointing needs to stall all commits while it runs
   (but it is short).

 - When a checkpoint is waiting to start, you also need to stall new commits,
   to prevent starvation of checkpointing.

Is this accurate?

In fact, such a situation is exactly why I did the split in commit_ordered()
and commit(). So that a storage engine can have freedom to choose which part
of commit should run serially, and which in parallel.

It seems to me that the problem here is that you are using a simple read lock
to handle the stalling and avoid starvation. And your read lock implementation
does not allow taking the read lock in one thread and releasing it in another
(which is reasonable). Maybe it can be solved simply by using a different
mechanism?

Like, keep a counter of threads running inside commit. When a checkpoint is
about to start, set a flag, "checkpoint pending", then wait for the counter to
drop to zero. When a new thread wants to commit, it waits for the "checkpoint
pending" flag to clear, stalling the commit until the checkpoint has completed.
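
A minimal sketch of this counter-plus-flag mechanism, assuming a hypothetical
CheckpointGate class (the name and all method names are made up for
illustration, not taken from any engine). Since it is only a counter and a
flag, nothing is owned by a particular thread, so commit_begin() and
commit_end() need not run in the same thread:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate: commits count themselves in and out; a checkpoint
// raises a "pending" flag, waits for the count to reach zero, and new
// commits stall while the flag is set.
class CheckpointGate {
public:
    // Committing side: stall while a checkpoint is pending, then enter.
    void commit_begin() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return !checkpoint_pending_; });
        ++active_commits_;
    }

    // May be called from a different thread than commit_begin().
    void commit_end() {
        std::lock_guard<std::mutex> lk(mu_);
        assert(active_commits_ > 0);
        if (--active_commits_ == 0) cv_.notify_all();  // wake the checkpointer
    }

    // Checkpoint side: block new commits, then drain the running ones.
    void checkpoint_begin() {
        std::unique_lock<std::mutex> lk(mu_);
        // Assumption: at most one checkpoint runs at a time.
        cv_.wait(lk, [&] { return !checkpoint_pending_; });
        checkpoint_pending_ = true;
        cv_.wait(lk, [&] { return active_commits_ == 0; });
    }

    // Clear the flag and release any commits that were stalled.
    void checkpoint_end() {
        std::lock_guard<std::mutex> lk(mu_);
        checkpoint_pending_ = false;
        cv_.notify_all();
    }

    int active_commits() {
        std::lock_guard<std::mutex> lk(mu_);
        return active_commits_;
    }

    bool checkpoint_pending() {
        std::lock_guard<std::mutex> lk(mu_);
        return checkpoint_pending_;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int active_commits_ = 0;
    bool checkpoint_pending_ = false;
};
```

Because new commits stall as soon as the flag is set, a steady stream of
commits cannot starve the checkpoint. Only the short first phase of the
checkpoint would sit between checkpoint_begin() and checkpoint_end().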

Note that it is not a problem to wait for the checkpoint to complete inside
commit_ordered(). Yes, this is single-threaded, but all other committing
threads will have to wait anyway, so in fact doing the wait in just the one
thread will reduce context switches and speed things up. But you could wait
for the checkpoint to complete in, e.g., prepare() instead if you want.

But maybe I am missing something? Not knowing your implementation, I cannot
know of course if this naive second idea is infeasible for some reason...

> may be expensive, thereby hurting concurrency. So, for such a thing to
> work, we would have to find a way to grab the read lock in
> commit_ordered once for each transaction (and because the lock is
> fair, we can't just regrab the lock on the same thread), write to the
> recovery log, then perform everything else under commit, then release
> the read lock. It can probably be done, but it is messy. If
> unnecessary, I prefer to not do it.

Yes, I understand, deep surgery on synchronisation primitives in the core of
an engine is not trivial stuff...

In MariaDB, it is not necessary. The commit_ordered() is optional, though it
gives you some benefits (and likely more benefits in future versions). If
necessary, we can try to make, e.g., the removal of the commit fsync() work
without commit_ordered(), so you get some of the benefits regardless.

And it sounds like my first suggestion should be an easy way to implement
commit_ordered(). Though it might require benchmarking to check that it does
not hurt performance. If you try any solution, feel free to send me the patch
for review and suggestions.

But what can you do in MySQL 5.6? In 5.6, effectively what you get is
commit_ordered() only, no commit() (their call of commit() with
HA_IGNORE_DURABILITY set is essentially the same as commit_ordered()).

So you do not get to decide which code to run serially, and which in parallel.
Everything in commit() runs serially. Total breakage of the storage
engine API, and their developers do not even understand this when pointed out
to them :-(

I vaguely remember some option in 5.6 that would disable the serialisation of
commit(), maybe you can recommend that your users enable that ...

 - Kristian (painfully aware of writing too long emails).

References

understanding commit and prepare APIs in MariaDB 5.5
From: Zardosht Kasheff, 2013-02-21
Re: understanding commit and prepare APIs in MariaDB 5.5
From: Kristian Nielsen, 2013-02-21
Re: understanding commit and prepare APIs in MariaDB 5.5
From: Zardosht Kasheff, 2013-02-21