← Back to team overview

maria-developers team mailing list archive

Re: MDEV-9423: FTWRL and Binlog checkpoint

 

Hi Kristian,

On Thu, Jun 23, 2016 at 3:39 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:

> Nirbhay Choubey <nirbhay@xxxxxxxxxxx> writes:
>
> > While copying the last 2 binlog files would have solved this, I have
> worked
> > out
> > a solution where the donor node waits for binlog checkpoint event for
> last
> > binlog
> > file to get logged before proceeding with file transfer.
> >
> > http://lists.askmonty.org/pipermail/commits/2016-June/009483.html
>
> Urgh, please don't do this, seems there are multiple problems with this
> patch (insufficient locking, introducing a new redundant wait mechanism,
> comparing binlog file names rather than ids, ...).
>
> > By the way, I initially tried reusing
> > is_xidlist_idle_nolock()/COND_xid_list to implement the
> > waiting mechanism. But since binlog checkpoint events are written
> > asynchronously after
> > xid_count falls to 0, that did not work. So later came up with the above
>
> I think it should work if you follow the chained locking of LOCK_xid_list
> and LOCK_log. First wait under LOCK_xid_list for the binlog_xid_count_list
> to become empty. Then release LOCK_xid_list and take and immediately
> release
> LOCK_log. mark_xid_done() will hold onto LOCK_log until the checkpoint
> event
> has been written.
>
> Note that there is already a similar wait mechanism, used by RESET
> MASTER. RESET MASTER also needs to wait for checkpoint events to be
> completed before running, so we should reuse that mechanism.
>

Right.


>
> Also, it seems reasonable that FTWRL in general could wait for checkpoint
> events so that other backup mechanisms similarly could avoid binlog files
> changing during backup. So please fix this in FTWRL, in 10.2. (If you feel
> you need to fix the galera bug in 10.1, you can implement it only for
> galera
> in 10.1).
>

That sound good to me. But, considering Percona's backup locks, it seems
more logical to
implement this in Backup locks instead, whenever they get
ported/implemented in MariaDB.

Also, in this particular case, the problem lies
in reload_acl_and_cache(REFRESH_BINARY_LOG),
(executed after FTWRL while preparing for SST) that rotates the binary log.
So, FTWRL is not
directly linked to this issue. And as you rightly pointed, I will refrain
from altering FTWRL's
behavior in 10.1 at least.


> So in more detail, here is suggested way to fix:
>
> In FTWRL (somewhere near the end, after commits are blocked), wait for
> checkpoint events to be written using a similar mechanism as RESET MASTER:
>
>   if (mysql_bin_log.is_open())
>   {
>     mysql_mutex_lock(&LOCK_xid_list);
>     for (;;)
>     {
>       if (binlog_xid_count_list.is_last(binlog_xid_count_list.head()))
>         break;
>       mysql_cond_wait(&COND_xid_list, &LOCK_xid_list);
>     }
>     mysql_mutex_unlock(&LOCK_xid_list);
>     /*
>       LOCK_xid_list and LOCK_log are chained, so the LOCK_log will only be
>       obtained after mark_xid_done() has written the last checkpoint event.
>     */
>     mysql_mutex_lock(&LOCK_log);
>     mysql_mutex_unlock(&LOCK_log);
>   }
>
> Now, since FTWRL is a bit different from RESET MASTER, we need a couple
> other changes:
>
>  - Use mysql_cond_broadcast(&COND_xid_list) instead of mysql_cond_signal()
>    in mark_xid_done() (to allow multiple waiters).
>
>  - The second (but not the first mysql_cond_broadcast() in mark_xid_done()
>    should be unconditional, so remove the if() here:
>
>       if (unlikely(reset_master_pending))
>         mysql_cond_signal(&COND_xid_list);
>
>  - Also add mysql_cond_broadcast(&COND_xid_list) in two other places that
>    the binlog_xid_count_list is modified. One in MYSQL_BIN_LOG::open():
>
>       while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
>         my_free(binlog_xid_count_list.get());
>
>    And one in reset_logs():
>
>       my_free(binlog_xid_count_list.get());
>
> This should make FTWRL wait for all pending binlog checkpoint events to be
> written. And with commits blocked, no new checkpoints should become
> pending.
>
> Does it seem reasonable to you? Let me know if some things are unclear or
> if
> you see any potential problems with it.
>

Yes, it worked. But, to solve this issue in 10.1, I have added this wait to
REFRESH_BINARY_LOG
(as explained above) only when the server is acting as a Galera node.


> By the way, how to you intend to handle the case where RESET MASTER is run
> during SST? I just checked, FTWRL does not seem to block RESET MASTER. Or
> do
> you have another mechanism to prevent RESET MASTER from running during SST?
> Thinking more, you should be holding LOCK_log while copying the binlog
> files (I'm guessing your not currently, right?)


You are right.


> This will block RESET
> MASTER,


I am now taking LOG_log during the duration of file transfer as protection
against the above commands.


> and it also makes the extra lock/unlock of LOCK_log above redundant.


Not quite. The wait logic (that includes LOCK_log, as the snippet above) is
to pause
REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/
FLUSH commands while file transfer is in progress.




> Also, FTWRL has really complex semantics. You should get Monty's opinion
> (or
> maybe Serg?) on whether there are any potentials for deadlocks to waiting
> inside FTWRL for binlog checkpoints.
>

As explained above, FTWRL remains unchanged, but will still check if
Monty/Serg
can take a look at the fix.

http://lists.askmonty.org/pipermail/commits/2016-June/009494.html

Best,
Nirbhay


>  - Kristian.
>

Follow ups

References