maria-developers team mailing list archive

Re: MDEV-9423: FTWRL and Binlog checkpoint


Hi Kristian!

On Mon, May 2, 2016 at 2:10 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:

> Nirbhay Choubey <nirbhay@xxxxxxxxxxx> writes:
> [Cc: maria-developers@, please always keep these discussions on the
> mailing list]
> > In Galera cluster, the state transfer scripts perform FTWRL and
> > copy data along with the last of all available binlog files to the
> > joiner node.
> >
> > After MDEV-181, I understand that the binlog checkpoint can be
> > in any of the binary log files (and not necessarily the last one).
> >
> > This seemingly has caused MDEV-9423, in which the joiner node
> > complains of the missing binlog file.
> >
> > Now the question is : Is FTWRL not sufficient to ensure that the
> > checkpoint is always the last binlog file?
> So if I understand correctly, the issue is related to having binlog files
> available during XA crash recovery. When the binlog file is rotated, there
> is a small window where both the latest and the previous binlog files are
> needed for crash recovery. The binlog checkpoint is the earliest binlog file
> that is needed for crash recovery, and it can be seen from the binlog
> checkpoint event.
> So the problem here is that a copy is made just after binlog rotation, and
> Galera only copies the most recent, mostly-empty binlog file, leaving
> insufficient information for XA recovery, right?


> One option to solve this is to always copy the last two binlog files. While
> it is theoretically possible to have the binlog checkpoint more than two
> files back, I think it will not occur in practice.
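
As a rough illustration of the "copy from the checkpoint onwards" idea, here is a
minimal Python sketch. It assumes the ordered list of binlog file names and the
file named by the latest binlog checkpoint event are already known; the function
name and the example file names are hypothetical, and this is not what the Galera
state transfer scripts actually do.

```python
def binlogs_needed_for_recovery(binlog_index, checkpoint_file):
    """Return the binlog files from the checkpoint file through the newest one.

    binlog_index    -- binlog file names, oldest first (assumed input)
    checkpoint_file -- file named by the latest binlog checkpoint (assumed input)
    """
    if not binlog_index:
        return []
    if checkpoint_file not in binlog_index:
        raise ValueError("checkpoint file %r is not in the binlog index"
                         % checkpoint_file)
    # Everything from the checkpoint file onwards may be needed for
    # XA crash recovery; older files are not.
    start = binlog_index.index(checkpoint_file)
    return binlog_index[start:]


if __name__ == "__main__":
    files = ["mysql-bin.000001", "mysql-bin.000002", "mysql-bin.000003"]
    # Just after a rotation the checkpoint may still name the previous file.
    print(binlogs_needed_for_recovery(files, "mysql-bin.000002"))
```

Copying only the last file corresponds to the case where the checkpoint has
already reached the newest binlog; copying the last two files covers the case
where it is one file behind.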

> Another option is to wait for the binlog checkpoint to reach the current
> binlog file. You can see this done in the test suite:
>   mysql-test/include/wait_for_binlog_checkpoint.inc
> The binlog checkpointing happens asynchronously. I *think* it can complete
> even while FTWRL is active, but I am not 100% sure.
> The checkpoint happens after InnoDB has made its commits durable with
> fsync() or similar - only after that is it safe to discard the old binlog
> data and still have correct crash recovery.

While copying the last 2 binlog files would have solved this, I have worked
out a solution where the donor node waits for the binlog checkpoint event for
the last binlog file to be logged before proceeding with the file transfer.
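
The waiting logic can be pictured with a small polling sketch in Python. This is
only an illustration under stated assumptions, not the server-side change
itself: get_current_binlog and get_checkpoint_binlog are hypothetical callables
that return the name of the newest binlog file and the file named by the latest
binlog checkpoint event.

```python
import time


def wait_for_binlog_checkpoint(get_current_binlog, get_checkpoint_binlog,
                               timeout=30.0, interval=0.1,
                               sleep=time.sleep, clock=time.monotonic):
    """Block until the binlog checkpoint names the current binlog file.

    Both getters are hypothetical callables supplied by the caller.
    Returns the current binlog file name, or raises TimeoutError.
    """
    deadline = clock() + timeout
    while True:
        current = get_current_binlog()
        if get_checkpoint_binlog() == current:
            # Only the newest binlog file is needed for crash recovery now.
            return current
        if clock() >= deadline:
            raise TimeoutError("binlog checkpoint did not reach %r" % current)
        sleep(interval)


if __name__ == "__main__":
    # Simulated checkpoint that lags one file behind for two polls.
    checkpoints = iter(["bin.000001", "bin.000001", "bin.000002"])
    print(wait_for_binlog_checkpoint(lambda: "bin.000002",
                                     lambda: next(checkpoints),
                                     interval=0.0))
```

A timeout is included so that a checkpoint which never arrives surfaces as an
error instead of blocking the transfer indefinitely.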


By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list
to implement the waiting mechanism. But since binlog checkpoint events are
written asynchronously after xid_count falls to 0, that did not work. So I
later came up with the above.


>  - Kristian.
