maria-developers team mailing list archive

Re: Architecture review of MWL#116 "Efficient group commit for binary log"

Sergei Golubchik <serg@xxxxxxxxxxxx> writes:

>> So now the algorithm is something like this:

> Where in this algorithm you call ht->commit_ordered() ?

Oops, I forgot, sorry!

It should be at the start of "for thd2 in <queue>" (before the wakeup).

>>     thd->ready= false
>>     lock(LOCK_prepare_ordered)
>>     old_queue= group_commit_queue
>>     thd->next= old_queue
>>     group_commit_queue= thd
>>     ht->prepare_ordered()
>>     unlock(LOCK_prepare_ordered)
>> 
>>     if (old_queue == NULL) // leader?
>>         lock(LOCK_group_commit)
>> 
>>         lock(LOCK_prepare_ordered)
>>         queue= reverse(group_commit_queue)
>>         group_commit_queue= NULL
>>         unlock(LOCK_prepare_ordered)
>> 
>>         group_log_xid(queue)
>> 
>>         lock(LOCK_commit_ordered)  // but see below
>>         unlock(LOCK_group_commit)
>>         for thd2 in <queue>

Here:          ht->commit_ordered(thd2)

>>             lock(thd2->LOCK_wakeup)
>>             thd2->ready= true
>>             signal(thd2->COND_wakeup)
>>             unlock(thd2->LOCK_wakeup)
>>         unlock(LOCK_commit_ordered)
>>     else
>>         lock (thd->LOCK_wakeup)
>>         while (!thd->ready)
>>             wait(thd->COND_wakeup, thd->LOCK_wakeup)
>>         unlock (thd->LOCK_wakeup)
>> 
>>     cookie= xid_log_after()
>
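
For reference, here is a minimal, runnable Python sketch of the pseudocode above, with commit_ordered() placed at the start of the per-transaction loop, before the wakeup. It is only an illustration, not the server code: the Thd and GroupCommit classes, the run_demo() helper, and the three recording lists are hypothetical names introduced here, and prepare_ordered(), commit_ordered() and group_log_xid() are stubs that just record the order in which they were called.

```python
import threading

class Thd:
    """Hypothetical stand-in for a connection (THD)."""
    def __init__(self, name):
        self.name = name
        self.next = None
        self.ready = False
        # One Condition models the LOCK_wakeup / COND_wakeup pair.
        self.cond_wakeup = threading.Condition()

class GroupCommit:
    def __init__(self):
        self.lock_prepare_ordered = threading.Lock()
        self.lock_group_commit = threading.Lock()
        self.lock_commit_ordered = threading.Lock()
        self.group_commit_queue = None   # linked list, newest entry first
        self.prepare_order = []          # recorded by the stubs below
        self.commit_order = []
        self.groups = []

    # Stubs: a real engine/binlog would do actual work here.
    def prepare_ordered(self, thd):
        self.prepare_order.append(thd.name)

    def commit_ordered(self, thd):
        self.commit_order.append(thd.name)

    def group_log_xid(self, queue):
        self.groups.append([t.name for t in queue])

    def commit(self, thd):
        thd.ready = False
        with self.lock_prepare_ordered:
            old_queue = self.group_commit_queue
            thd.next = old_queue
            self.group_commit_queue = thd
            self.prepare_ordered(thd)    # commit order is now defined

        if old_queue is None:            # leader
            self.lock_group_commit.acquire()
            with self.lock_prepare_ordered:
                node = self.group_commit_queue
                self.group_commit_queue = None
            queue = []
            while node is not None:      # walk newest -> oldest
                queue.append(node)
                node = node.next
            queue.reverse()              # oldest first, i.e. queue order
            self.group_log_xid(queue)
            # Take the next lock before dropping the previous one, so a
            # later group cannot overtake this one.
            self.lock_commit_ordered.acquire()
            self.lock_group_commit.release()
            for thd2 in queue:
                self.commit_ordered(thd2)   # before the wakeup
                with thd2.cond_wakeup:
                    thd2.ready = True
                    thd2.cond_wakeup.notify()
            self.lock_commit_ordered.release()
        else:                            # follower: wait for the leader
            with thd.cond_wakeup:
                while not thd.ready:
                    thd.cond_wakeup.wait()
        # xid_log_after() would be called here, outside all global locks.

def run_demo(n_threads):
    """Commit n_threads transactions concurrently and return the recorder."""
    gc = GroupCommit()
    workers = [threading.Thread(target=gc.commit, args=(Thd(f"T{i}"),))
               for i in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return gc

if __name__ == "__main__":
    gc = run_demo(8)
    print(gc.groups)
    print(gc.commit_order == gc.prepare_order)
```

How the transactions fall into groups depends on thread scheduling, but the order of the commit_ordered() calls should always match the order of the prepare_ordered() calls, since both follow the order in which transactions entered the queue.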

>> On the other hand, the algorithm I suggested earlier for START
>> TRANSACTION WITH CONSISTENT SNAPSHOT used the LOCK_commit_ordered, and
>> there might be other uses...

> START TRANSACTION WITH CONSISTENT SNAPSHOT is a good reason to keep the
> mutex.

Yes, probably.

>> But I choose to do it earlier, as soon as the transaction is put in
>> the queue and commit order thereby defined.
>> 
>> There can be quite a "long" time interval between these two events:
>> the time it takes for the previous group_log_xid() (e.g. an fsync()),
>> plus sometimes one wants to add extra sleeps in group commit to group
>> more transactions together.
>
> No.
> The long interval is *inside* the group_log_xid(), while you call
> prepare_ordered() *before* it.

Right, that is what I meant. One group of transactions executes the long
interval inside group_log_xid(). While this happens, new transactions that
want to commit queue up, waiting for the first group to finish. The first
waiting transaction (the new leader) blocks on LOCK_group_commit; all the
others wait for the new leader to wake them up. So I want to call
prepare_ordered() before blocking on LOCK_group_commit, as that mutex is held
for the duration of group_log_xid().

> But anyway, the LOCK_prepare_ordered mutex is not going to be contended,
> so removing it by using a lock-free queue (that's what this second
> approach is about) will not bring any noticeable benefits.

Very true.

> It's reasonable to say that if an engine does not implement
> commit_ordered() then it needs to take care of its own recovery and
> fsync both in prepare and commit.

Yes, sounds reasonable.

Thanks!

 - Kristian.


