
maria-developers team mailing list archive

Re: Implementing new "group commit" API in PBXT?

 

Paul McCullagh <paul.mccullagh@xxxxxxxxxxxxx> writes:

>> In particular this, flushing the data log (is this flush to disk?):

>>    if (!thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) {
>>            ok = FALSE;
>>            status = XT_LOG_ENT_ABORT;
>>    }

>
> Yes, this is a flush to disk.
>
> This could be done in the slow part (obviously this would be ideal).

It occurred to me that since we only do this (the new commit_ordered() API
call) after having successfully run prepare(), the data log will already
have been flushed to disk, right?

So I suppose in this case the data log flush will be a no-op, in which
case it is no problem to leave it in the "fast" part, or we could skip
calling it altogether.

> But there is the following problem that should then be fixed.
>
> If we write the transaction log (i.e. commit the transaction), even if
> we do not flush it, it may be flushed by some other thread later. This
> will make the commit durable (in other words, on recovery, this
> transaction will be rolled forward).
>
> If we do not flush the data log, then there is a chance that such a
> commit transaction is incomplete, because the associated data log data
> has not been committed.
>
> The way to fix this problem is to check the extent of flushing of both
> the data log and the transaction log on recovery. Simply put, on
> recovery we check if the data log part of each record is completely
> flushed (is within the flush zone of the data log).
>
> If a data log record is missing, then recovery stops at that point in
> the transaction log.

Yes, I see, thanks for the explanation.

> This will have to be built into the engine. And, it is easiest to do
> this in PBXT 1.5, which handles transaction logs and data logs
> identically.

Ok. Well, maybe it's not necessary, as per the observation above.

>> and this, at the end concerning the "sweeper":
>>
>>    if (db->db_sw_faster)
>>            xt_wakeup_sweeper(db);
>
> Yes, this could be taken out of the fast part, although it is not
> called all that often.

Ok, I will omit it.

>> Also, this statement definitely needs to be postponed to the "slow"
>> part I
>> guess:
>>
>>    thread->st_xact_data = NULL;
>
> Actually, I don't think so. As far as PBXT is concerned, after the
> fast part has run, the transaction is committed. It is just not
> durable.
>
> This means that anything we do in the slow part should not need an
> explicit reference to the transaction.

Right, I see what you mean, I will keep it.

> The flush log position is always increasing. Critical is when we
> switch logs, e.g. from log_id=100, log_offset=80000, to log_id=101,
> log_offset=0.
>
> I believe when this is done, the log_offset is first set to zero, then
> the log_id is incremented (should check this).
>
> This would mean that the comparison function would err on the side of
> flushing unnecessarily if the check comes between the two operations.

Yes, that should work.

However, you need a write memory barrier when you update the position, and a
read memory barrier when you read it:

   xl_flush_log_offset = 0;
   wmb();
   xl_flush_log_id = old_id + 1;

...

   local_id = xl_flush_log_id;
   rmb();
   local_offset = xl_flush_log_offset;

Without this, the CPU may do the reads or writes in the opposite order
(or, more likely, the newer optimisations in GCC will reorder them).

> I would actually recommend a "lazy" approach to the implementation.
>
> Simply add a boolean to the current commit, which indicates a fast
> commit should be done.
>
> Then we add a new "slow commit" function which does the parts not done
> by the fast commit.

Ok, thanks a lot for the advice, I will give it another shot.

 - Kristian.


