maria-developers team mailing list archive

Re: Slave can take a very long time to start replication

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> start replicating it passes GTID to start from, master finds binlog
> file where the earliest GTID is located and then scans through that
> file to find the exact binlog position to start sending binlog events
> from. If this binlog file is pretty big then scanning can take a very
> long time. I guess especially long when several slaves try to start
> replicating roughly at the same time. We observed 60-90 seconds

Ouch, that's a big delay :-(

> Did you think about this problem before? Maybe you've even planned
> already to implement some solution for this?

Yes, two possible solutions.

My preferred solution is to change the binlog to be page-based, just like other
database transaction logs. This has several benefits - for example easy
pre-allocation which reduces the fsync() penalty by half or more, and
protection from partial disk writes corrupting the end of the binlog. And it
would allow binary search in the log to find the starting GTID, which should
greatly improve slave connect time.

But re-implementing the binlog format is probably too big a task to do anytime
soon, unfortunately. So the easier plan is to implement a binlog index, a
separate file master-idx.XXXXXX alongside each master-bin.XXXXXX. Periodically
(like every 100 events or whatever), the current binlog GTID state would be
written out to this file along with the corresponding binlog offset, in some
page-based format. When a slave connects, binary search is done on the index
file to quickly find where to start in the binlog file. Writing the binlog
index should have low overhead, as there is no need to fsync() or even flush
it regularly. If we crash, we can just re-build the index file as part of the
binlog scan that takes place anyway during crash recovery (or just fall back
to binlog scan if no index file is found).
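
As a rough sketch of how the index lookup could work (Python, purely
illustrative, not actual server code): the binlog is modelled as a list of
sequence numbers and an "offset" is just a position in that list. Real GTIDs
are domain-server-seqno triples and the GTID state covers several domains, so
the single increasing sequence number is a simplifying assumption, as are the
names build_index and find_start and the interval of 100.

```python
import bisect


def build_index(binlog, interval=100):
    """Return sparse (seq_no, offset) entries, one per `interval` events.

    `binlog` is a list of increasing sequence numbers (a simplified stand-in
    for GTIDs); `offset` is the position of the event in that list.
    """
    return [(seq_no, offset)
            for offset, seq_no in enumerate(binlog)
            if offset % interval == 0]


def find_start(binlog, index, slave_seq_no):
    """Find the first event after `slave_seq_no`.

    Returns (offset, events_scanned); offset is None if the slave is already
    past the end of this binlog.
    """
    # Binary search the index for the last entry at or before the slave's
    # position, so the linear scan covers at most one index interval.
    keys = [seq_no for seq_no, _ in index]
    i = bisect.bisect_right(keys, slave_seq_no) - 1
    offset = index[i][1] if i >= 0 else 0

    scanned = 0
    while offset < len(binlog):
        scanned += 1
        if binlog[offset] > slave_seq_no:
            return offset, scanned
        offset += 1
    return None, scanned
```

With an entry every 100 events, the scan after the binary search touches at
most about 100 events, however large the binlog file is. If the index is
missing, passing an empty index gives the plain scan from the start of the
file.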

There has not been time to get either of these solutions implemented at this
point, so for now the workaround is to use smaller binlog files, I
suppose...

 - Kristian.

