maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #03572
Re: Architecture review of MWL#116 "Efficient group commit for binary log"
Hi, Kristian!
On Sep 06, Kristian Nielsen wrote:
> However, as I revisited the algorithm, it occured to me that it is in
> any case better to wake up threads individually as soon as
> commit_ordered() has finished. This way, the first threads in the
> queue are free to continue doing useful work while we are still
> running commit_ordered() for the last threads.
>
> So now the algorithm is something like this:
>
> thd->ready= false
> lock(LOCK_prepare_ordered)
> old_queue= group_commit_queue
> thd->next= old_queue
> group_commit_queue= thd
> ht->prepare_ordered()
> unlock(LOCK_prepare_ordered)
>
> if (old_queue == NULL) // leader?
> lock(LOCK_group_commit)
>
> lock(LOCK_prepare_ordered)
> queue= reverse(group_commit_queue)
> group_commit_queue= NULL
> unlock(LOCK_prepare_ordered)
>
> group_log_xid(queue)
>
> lock(LOCK_commit_ordered) // but see below
> unlock(LOCK_group_commit)
> for thd2 in <queue>
> lock(thd2->LOCK_wakeup)
> thd2->ready= true
> signal(thd2->COND_wakeup)
> unlock(thd2->LOCK_wakeup)
> unlock(LOCK_commit_ordered)
> else
> lock (thd->LOCK_wakeup)
> while (!thd->ready)
> wait(COND_wakeup, LOCK_wakeup)
> unlock (thd->LOCK_wakeup)
>
> cookie= xid_log_after()
Where in this algorithm you call ht->commit_ordered() ?
> On the other hand, the algorithm I suggested earlier for START
> TRANSACTION WITH CONSISTENT SNAPSHOT used the LOCK_commit_ordered, and
> there might be other uses...
>
> So I am not sure. I'd like to think more about it, or what do you
> think?
START TRANSACTION WITH CONSISTENT SNAPSHOT is a good reason to keep the
mutex.
> >> It would be possible to iterate over the queue to invoke
> >> prepare_ordered() in sequence from a single thread, just like
> >> group_log_xid() and commit_ordered(). But this would delay the
> >> calls until the previous group commit is done and the next one
> >> starts
> >
> > No, why ? You only need one atomic fetch_and_store to copy the queue
> > head to a local variable and reset the queue. Then you can call
> > prepare_ordered or commit_ordered in the queue order without any
> > mutex.
>
> I am not sure if I understood your suggestion correctly. But what I
> considered with the above statement about "delaying calls to
> prepare_ordered()" is this:
>
> Just like the group leader thread runs commit_ordered() in a loop over
> the queue just after group_log_xid(), we could have it do a similar
> loop for prepare_ordered() just before group_log_xid().
Yes.
> But I choose to do it earlier, as soon as the transaction is put in
> the queue and commit order thereby defined.
>
> There can be quite a "long" time interval between these two events:
> the time it takes for the previous group_log_xid() (eg. an fsync()),
> plus sometimes one wants to add extra sleeps in group commit to group
> more transactions together.
No.
The long interval is *inside* the group_log_xid(), while you call
prepare_ordered() *before* it.
But anyway, the LOCK_prepare_ordered mutex is not going to be contented,
so removing it by using a lock-free queue (that's what this second
approach is about) will not bring any noticeable benefits.
> The main performance bottleneck I am introducing is, I think, the
> serialisation of the commit_ordered() part of commit. Not just for
> some particular engine implementation, but for the interface. That is
> not a decision to be taken lightly.
>
> Of course, compared to InnoDB today, it's much better, as it gets rid
> of the InnoDB prepare_commit_mutex (which spans all the way from end
> of prepare() to end of what is effectively commit_ordered()), and also
> avoids taking LOCK_log over all of log_xid() in the binlog.
>
> But for something like NDB, I think serialised commit order would
> really hurt (if it even makes sense ...)
>
> Maybe the answer here is that engines can choose to support
> commit_ordered() or not (and NDB-like engines probably will not). And
> if not, there is just no support for consistent commit order.
>
> And if we implement the simple way to recover engines from binlog
> without fsync() in prepare() and commit(), then it will only work for
> engines supporting commit_ordered(). Later we could implement the more
> complex recovery without need for commit_ordered() support.
It's reasonable to say that if an engine does not implement
commit_ordered() then it needs to take care of its own recovery and
fsync both in prepare and commit.
Regards,
Sergei
Follow ups
References