maria-developers team mailing list archive
Message #03651
Re: Implementing new "group commit" API in PBXT?
Paul McCullagh <paul.mccullagh@xxxxxxxxxxxxx> writes:
>> In particular this, flushing the data log (is this flush to disk?):
>>     if (!thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) {
>>         ok = FALSE;
>>         status = XT_LOG_ENT_ABORT;
>>     }
>
> Yes, this is a flush to disk.
>
> This could be done in the slow part (obviously this would be ideal).
It occurred to me that since we only do this (the new commit_ordered() API
call) after having successfully run prepare(), the data log will already have
been flushed to disk, right?
So I suppose in this case the data log flush will be a no-op, in which case
it is no problem to leave it in the "fast" part; alternatively we could skip
calling it.
> But there is the following problem that should then be fixed.
>
> If we write the transaction log (i.e. commit the transaction) without
> flushing it, it may still be flushed by some other thread later. This
> will make the commit durable (in other words, on recovery, this
> transaction will be rolled forward).
>
> If we do not flush the data log, then there is a chance that such a
> committed transaction is incomplete, because the associated data log
> data has not been committed.
>
> The way to fix this problem is to check the extent of flushing of both
> the data and the transaction log on recovery. Simply put, on recovery
> we check if the data log part of each record is completely flushed (is
> within the flush zone of the data log).
>
> If a data log record is missing, then recovery stops at that point in
> the transaction log.
Yes, I see, thanks for the explanation.
> This will have to be built into the engine. And, it is easiest to do
> this in PBXT 1.5 which handles transaction logs and data logs
> identically.
Ok. Well, maybe it's not necessary as per above observation.
>> and this, at the end concerning the "sweeper":
>>
>> if (db->db_sw_faster)
>> xt_wakeup_sweeper(db);
>
> Yes, this could be taken out of the fast part, although it is not
> called all that often.
Ok, I will omit it.
>> Also, this statement definitely needs to be postponed to the "slow"
>> part I
>> guess:
>>
>> thread->st_xact_data = NULL;
>
> Actually, I don't think so. As far as PBXT is concerned, after the
> fast part has run, the transaction is committed. It is just not
> durable.
>
> This means that anything we do in the slow part should not need an
> explicit reference to the transaction.
Right, I see what you mean, I will keep it.
> The flush log position is always increasing. Critical is when we
> switch logs, e.g. from log_id=100, log_offset=80000, to log_id=101,
> log_offset=0.
>
> I believe when this is done, the log_offset is first set to zero, then
> the log_id is incremented (should check this).
>
> This would mean that the comparing function would err on the side of
> flushing unnecessarily if the check comes between the two operations.
Yes, that should work.
However, you need a write memory barrier when you update the position, and a
read memory barrier when you read it:
    xl_flush_log_offset = 0;
    wmb();
    xl_flush_log_id = old_id + 1;
    ...
    local_id = xl_flush_log_id;
    rmb();
    local_offset = xl_flush_log_offset;
Without this, the CPU may do the reads or writes in the opposite order (or
more likely, the newest optimisations in GCC will do it).
> I would actually recommend a "lazy" approach to the implementation.
>
> Simply add a boolean to the current commit, which indicates a fast
> commit should be done.
>
> Then we add a new "slow commit" function which does the parts not done
> by the fast commit.
Ok, thanks a lot for the advice, I will give it another shot.
- Kristian.