It was brought to my attention an issue with parallel replication and the
Seconds_Behind_Master field of SHOW SLAVE STATUS. I have a possible patch
for this, but I wanted to discuss it on the list, as it changes semantics
compared to the non-parallel case.
Each binlog event contains a timestamp (**) of when the event was created on
the master. Whenever the slave SQL thread reads an event from the relay log,
it updates the value of Seconds_Behind_Master to the difference between the
slave's current time and the event's timestamp.
Now in parallel replication, the SQL thread can read a large number of
events from the relay log and queue them in-memory for the worker threads.
So a small value of Seconds_Behind_Master means only that recent events have
been queued - it might still be a long time before the worker threads have
had time to actually execute all the queued events. Apparently the problem
is (justified) user confusion about this queuing delay not being reflected
in Seconds_Behind_Master.
The same problem actually exists in the non-parallel case. In case of a
large transaction, the Seconds_Behind_Master can be small even though there
is still a large amount of execution time remaining for the transaction to
complete on the slave. However, in the non-parallel case, at most one
transaction can be involved. In the parallel case, the problem is amplified
by the potential of thousands of queued transactions awaiting execution.
So how to solve it? Attached is a patch that implements one possible
solution: the Seconds_Behind_Master is only updated after a transaction
commits, with the timestamp of the commit events. This seems more intuitive
anyway. But it does introduce a semantic difference between the non-parallel
and parallel behaviour for Seconds_Behind_Master. The value will in general
be larger on a parallel slave than on a non-parallel slave, for the same
actual slave lag.
Monty suggested changing the behaviour also for non-parallel mode - letting
Seconds_Behind_Master reflect only events actually committed, not just read
from the relay log. This would introduce an incompatible behaviour for
Seconds_Behind_Master, but could perhaps be done for 10.1, if desired. Doing
it in stable 10.0 would be more drastic.
So any opinions on this?
- Should Seconds_Behind_Master be changed as per above in parallel
replication (from 10.0 on)?
- If not, any suggestion for another semantics for Seconds_Behind_Master in
parallel replication?
- If so, should the change to Seconds_Behind_Master also be done in the
non-parallel case in 10.1? What about 10.0?
- Any comments on the patch?