maria-developers team mailing list archive
Message #09752
Re: MDEV-9423: FTWRL and Binlog checkpoint
Hi Kristian!
On Mon, May 2, 2016 at 2:10 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:
> Nirbhay Choubey <nirbhay@xxxxxxxxxxx> writes:
>
> [Cc: maria-developers@, please always keep these discussions on the
> mailing list]
>
> > In Galera cluster, the state transfer scripts perform FTWRL and
> > copy data along with the last of all available binlog files to the
> > joiner node.
> >
> > After MDEV-181, I understand that the binlog checkpoint can be
> > in any of the binary log files (and not necessarily the last one).
> >
> > This seemingly has caused MDEV-9423, in which the joiner node
> > complains of the missing binlog file.
> >
> > Now the question is: is FTWRL not sufficient to ensure that the
> > checkpoint is always the last binlog file?
>
> So if I understand correctly, the issue is related to having binlog files
> available during XA crash recovery. When the binlog file is rotated, there
> is a small window where both the latest and the previous binlog files are
> needed for crash recovery. The binlog checkpoint is the earliest binlog
> file that is needed for crash recovery, and it can be seen from the
> binlog checkpoint event.
>
> So the problem here is that a copy is made just after binlog rotation, and
> Galera only copies the most recent, mostly-empty binlog file, leaving
> insufficient information for XA recovery, right?
>
Correct.
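To make the window concrete, here is a minimal Python sketch. It is not taken from the SST scripts; the function name and the file names are made up for illustration. It assumes the binlog file names are available oldest first, as SHOW BINARY LOGS lists them, and returns every file from the checkpoint file through the newest one, which is the set that crash recovery may need.

```python
def binlogs_needed_for_recovery(index_files, checkpoint_file):
    """Return the binlog files from the checkpoint file through the newest one.

    index_files: binlog file names, oldest first (assumed to come from
    SHOW BINARY LOGS or the binlog index file).
    checkpoint_file: the file named by the most recent Binlog_checkpoint event.
    """
    if checkpoint_file not in index_files:
        raise ValueError(f"checkpoint file {checkpoint_file!r} not in index")
    start = index_files.index(checkpoint_file)
    return index_files[start:]


# Hypothetical file names. Just after a rotation the checkpoint may still
# point at the previous file, so the newest file alone is not enough.
files = ["mysqld-bin.000001", "mysqld-bin.000002", "mysqld-bin.000003"]
print(binlogs_needed_for_recovery(files, "mysqld-bin.000002"))
```

Once the checkpoint has reached the newest file, the function returns only that file, which is the case the current scripts implicitly rely on.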
>
> One option to solve this is to always copy the last two binlog files. While
> it is theoretically possible to have the binlog checkpoint more than two
> files back, I think it will not occur in practice.
>
> Another option is to wait for the binlog checkpoint to reach the current
> binlog file. You can see this done in the test suite:
>
> mysql-test/include/wait_for_binlog_checkpoint.inc
>
> The binlog checkpointing happens asynchronously. I *think* it can complete
> even while FTWRL is active, but I am not 100% sure.
>
> The checkpoint happens after InnoDB has made its commits durable with
> fsync() or similar - only after that is it safe to discard the old binlog
> data and still have correct crash recovery.
>
While copying the last two binlog files would have solved this, I have worked
out a solution where the donor node waits for the binlog checkpoint event for
the last binlog file to be logged before proceeding with the file transfer:
http://lists.askmonty.org/pipermail/commits/2016-June/009483.html
By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list
to implement the waiting mechanism. But since binlog checkpoint events are
written asynchronously after xid_count falls to 0, that did not work, so I
later came up with the above patch.
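For illustration only, and not the patch itself, the same wait can be sketched as a client-side poll in Python, in the spirit of wait_for_binlog_checkpoint.inc. The helper names and the injected fetch_events callable are hypothetical. The sketch assumes that rows from SHOW BINLOG EVENTS carry Event_type and Info columns, and that the Info of a Binlog_checkpoint event is the name of the checkpoint file.

```python
import time


def checkpoint_reached(events, current_file):
    """True if any Binlog_checkpoint event in `events` names `current_file`."""
    return any(
        row["Event_type"] == "Binlog_checkpoint" and row["Info"] == current_file
        for row in events
    )


def wait_for_binlog_checkpoint(fetch_events, current_file,
                               timeout=30.0, interval=0.1):
    """Poll until the checkpoint reaches `current_file` or `timeout` expires.

    fetch_events(current_file) is a caller-supplied callable (hypothetical)
    that returns the rows of SHOW BINLOG EVENTS IN '<current_file>' as dicts.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if checkpoint_reached(fetch_events(current_file), current_file):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the query function in as a parameter keeps the sketch independent of any particular client library; a donor script would supply its own and abort the transfer if the call returns False.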
Best,
Nirbhay
>
> - Kristian.
>