maria-developers team mailing list archive

Re: On the issue of Seconds_behind_master and Parallel Replication

 

I've pushed to 10.0 and 10.1 the change I described: With parallel
replication, Seconds_Behind_Master is updated only after transactions
commit. (I did not change the behaviour of the non-parallel replication
case yet.)

Ian, do you have enough information from this thread that you could update
the docs in the knowledgebase accordingly?

 - Kristian.

Ian Gilfillan <ian@xxxxxxxxxxx> writes:

> From a user's perspective, I like the idea of introducing the change
> for both parallel and non-parallel in 10.1.
>
> On 15/10/2015 08:16, Kristian Nielsen wrote:
>> An issue was brought to my attention concerning parallel replication and the
>> Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch
>> for this, but I wanted to discuss it on the list, as it changes semantics
>> compared to the non-parallel case.
>>
>> Each binlog event contains a timestamp (**) of when the event was created on
>> the master. Whenever the slave SQL thread reads an event from the relay log,
>> it updates the value of Seconds_Behind_Master to the difference between the
>> slave's current time and the event's timestamp.
>>
>> Now in parallel replication, the SQL thread can read a large number of
>> events from the relay log and queue them in-memory for the worker threads.
>> So a small value of Seconds_Behind_Master means only that recent events have
>> been queued - it might still be a long time before the worker threads have
>> had time to actually execute all the queued events. Apparently the problem
>> is (justified) user confusion about this queuing delay not being reflected
>> in Seconds_Behind_Master.
>>
>> The same problem actually exists in the non-parallel case. In case of a
>> large transaction, the Seconds_Behind_Master can be small even though there
>> is still a large amount of execution time remaining for the transaction to
>> complete on the slave. However, in the non-parallel case, at most one
>> transaction can be involved. In the parallel case, the problem is amplified
>> by the potential of thousands of queued transactions awaiting execution.
>>
>> So how to solve it? Attached is a patch that implements one possible
>> solution: Seconds_Behind_Master is only updated after a transaction
>> commits, using the timestamp of the commit event. This seems more intuitive
>> anyway. But it does introduce a semantic difference between the non-parallel
>> and parallel behaviour for Seconds_Behind_Master. The value will in general
>> be larger on a parallel slave than on a non-parallel slave, for the same
>> actual slave lag.
>>
>> Monty suggested changing the behaviour also for non-parallel mode - letting
>> Seconds_Behind_Master reflect only events actually committed, not just read
>> from the relay log. This would introduce an incompatible behaviour for
>> Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing
>> it in stable 10.0 would be more drastic.
>>
>> So any opinions on this?
>>
>>   - Should Seconds_Behind_Master be changed as per above in parallel
>>     replication (from 10.0 on)?
>>
>>   - If not, any suggestion for other semantics for Seconds_Behind_Master in
>>     parallel replication?
>>
>>   - If so, should the change to Seconds_Behind_Master also be done in the
>>     non-parallel case in 10.1? What about 10.0?
>>
>>   - Any comments on the patch?
>>
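The difference between the two update rules discussed in this thread can be
illustrated with a small simulation. This is only a sketch under stated
assumptions, not the server code or the attached patch: the names `Event`,
`lag_on_read` and `lag_on_commit` are hypothetical, timestamps are plain
integers in seconds, and any clock difference between master and slave is
ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for a binlog event."""
    master_ts: int   # when the event was created on the master (seconds)
    is_commit: bool  # True for the event that ends a transaction


def lag_on_read(now: int, events_read: List[Event]) -> Optional[int]:
    """Read-based rule: lag comes from the last event read from the relay log."""
    if not events_read:
        return None
    return now - events_read[-1].master_ts


def lag_on_commit(now: int, events_applied: List[Event]) -> Optional[int]:
    """Commit-based rule: lag comes from the last committed transaction."""
    commits = [e for e in events_applied if e.is_commit]
    if not commits:
        return None
    return now - commits[-1].master_ts


# Three small transactions created on the master at t=100, 101 and 102.
relay_log = [
    Event(100, False), Event(100, True),
    Event(101, False), Event(101, True),
    Event(102, False), Event(102, True),
]

now = 103
queued = relay_log        # the SQL thread has read and queued everything
applied = relay_log[:2]   # the workers have committed only the first transaction

print(lag_on_read(now, queued))     # 1: only reflects what has been queued
print(lag_on_commit(now, applied))  # 3: reflects what has actually committed
```

With the same queue and the same amount of work done, the commit-based value
is the larger of the two, which matches the expectation above that the
reported lag will in general be larger under the new rule.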

