Re: MDEV-9423: FTWRL and Binlog checkpoint


Hi Serg,

On Tue, Jun 28, 2016 at 5:02 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Nirbhay!
> On Jun 27, Nirbhay Choubey wrote:
> > >
> > > That seems quite ugly, why not call it from the SST code, after it
> > > has called reload_acl_and_cache()? You're basically making FLUSH
> > > LOGS behave differently in Galera and non-Galera (if my
> > > understanding is correct), which might lead to subtle bugs?
> >
> > I initially thought of adding the call after reload_acl_and_cache(),
> > but there could still be a case when user performs a
> > REFRESH_BINARY_LOG before LOCK_log is acquired.
> Right, but you didn't fix it. You have
>   1> FTWRL
>   2> reload_acl_and_cache()
>     3> wait_for_last_checkpoint_event()
>   4> SET global innodb_disallow_writes=1
>   5> mysql_mutex_lock(LOCK_log)
> You've described your case correctly: "when user performs
> REFRESH_BINARY_LOG before LOCK_log is acquired". That is, you care when
> a user performs REFRESH_BINARY_LOG between 3 and 5. You don't care if
> somebody does REFRESH_BINARY_LOG between 2 and 3. So, you can as well
> move wait_for_last_checkpoint_event() out of reload_acl_and_cache().

If the wait is moved outside reload_acl_and_cache() (say 3'), a
REFRESH_BINARY_LOG that kicks in right after the wait (3') but before (5)
will trigger creation of another new binlog file, for which the last
checkpoint event (logged asynchronously by a separate thread) may not make
it in time, causing the same issue on the joiner node.

Another workable option was to move wait outside and after
and not release LOCK_log until the file transfer is complete.

2> reload_acl_and_cache()
3> wait for last checkpoint event & lock(LOCK_log)
4> SET global innodb_disallow_writes=1
... file transfer ...
5> mysql_mutex_unlock(LOCK_log)

But with LOCK_log locked in #3,
will fail for #4.

> With wait_for_last_checkpoint_event inside reload_acl_and_cache or
> outside, you still don't have anything that would prevent user from
> doing REFRESH_BINARY_LOG between 3 and 5.

It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with
wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would
ensure every REFRESH_BINARY_LOG (either from the user or from #2 above)
waits until the last checkpoint event makes it into the new binary log file.
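
To make the difference between the two placements concrete, here is a toy,
single-threaded C++ model of the orderings discussed above. It is only a
sketch and not MariaDB code: ToyBinlog and every member and function name in
it are made up for illustration, the asynchronous checkpoint thread is
reduced to an explicit background_step(), and all real locking (FTWRL,
LOCK_log, innodb_disallow_writes) is left out.

```cpp
#include <cassert>
#include <deque>

// Toy model only; none of these names are MariaDB APIs.
struct ToyBinlog {
  int current_file = 0;       // newest binlog file number
  int checkpointed_file = 0;  // newest file that holds its checkpoint event
  std::deque<int> pending;    // checkpoint events the background thread owes

  // A refresh that does not wait: rotate and return immediately.
  void rotate() { pending.push_back(++current_file); }

  // Stand-in for the separate thread logging one checkpoint event.
  void background_step() {
    if (pending.empty()) return;
    checkpointed_file = pending.front();
    pending.pop_front();
  }

  // Stand-in for the wait: returns only once no checkpoint is pending.
  void wait_for_last_checkpoint() {
    while (!pending.empty()) background_step();
    assert(checkpointed_file == current_file);
  }

  // A refresh that always waits, as when the wait lives inside the refresh.
  void rotate_and_wait() {
    rotate();
    wait_for_last_checkpoint();
  }

  // What a file copy taken at this instant would observe.
  bool snapshot_is_consistent() const {
    return checkpointed_file == current_file;
  }
};

// Wait placed outside (3'), then a user refresh sneaks in before the copy.
bool wait_outside_then_user_refresh() {
  ToyBinlog log;
  log.rotate();                         // refresh issued by step 2
  log.wait_for_last_checkpoint();       // 3': wait done once, outside
  log.rotate();                         // user refresh before step 5
  return log.snapshot_is_consistent();  // newest file lacks its checkpoint
}

// Wait performed by every refresh, whoever issues it.
bool wait_inside_every_refresh() {
  ToyBinlog log;
  log.rotate_and_wait();                // refresh issued by step 2
  log.rotate_and_wait();                // user refresh before step 5
  return log.snapshot_is_consistent();
}
```

In this model wait_outside_then_user_refresh() returns false, because the
second rotation leaves a file whose checkpoint is still pending when the
copy is taken, while wait_inside_every_refresh() returns true. The real
server is concurrent, so this only illustrates the ordering argument.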


> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx

