
maria-developers team mailing list archive

Re: MDEV-9423: FTWRL and Binlog checkpoint

 

Hi Serg,

On Tue, Jun 28, 2016 at 5:02 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Nirbhay!
>
> On Jun 27, Nirbhay Choubey wrote:
> > >
> > > That seems quite ugly, why not call it from the SST code, after it
> > > has called reload_acl_and_cache()? You're basically making FLUSH
> > > LOGS behave differently in Galera and non-Galera (if my
> > > understanding is correct), which might lead to subtle bugs?
> >
> > I initially thought of adding the call after reload_acl_and_cache(),
> > but there could still be a case when user performs a
> > REFRESH_BINARY_LOG before LOCK_log is acquired.
>
> Right, but you didn't fix it. You have
>
>   1> FTWRL
>   2> reload_acl_and_cache()
>     3> wait_for_last_checkpoint_event()
>   4> SET global innodb_disallow_writes=1
>   5> mysql_mutex_lock(LOCK_log)
>
> You've described your case correctly: "when user performs
> REFRESH_BINARY_LOG before LOCK_log is acquired". That is, you care when
> a user performs REFRESH_BINARY_LOG between 3 and 5. You don't care if
> somebody does REFRESH_BINARY_LOG between 2 and 3. So, you can as well
> move wait_for_last_checkpoint_event() out of reload_acl_and_cache().
>


If the wait is moved outside reload_acl_and_cache() (say, to 3') and a user's
REFRESH_BINARY_LOG kicks in right after the wait (3') but before (5), it will
trigger creation of another new binlog file, for which the last checkpoint
event (logged asynchronously by a separate thread) may not make it in time,
causing the same issue on the joiner node.

Another workable option was to move the wait outside and after
reload_acl_and_cache(), and not release LOCK_log until the file transfer is
complete.

1> FTWRL
2> reload_acl_and_cache()
3> wait for last checkpoint event & lock(LOCK_log)
4> SET global innodb_disallow_writes=1
... file transfer ...
5> mysql_mutex_unlock(LOCK_log)

But with LOCK_log locked in #3,
mysql_mutex_assert_not_owner(mysql_bin_log.get_log_lock())
will fail for #4.
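
To illustrate why holding LOCK_log across #4 trips the assertion, here is a
minimal sketch of an owner-tracking mutex. All names below (OwnerTrackingMutex,
step4_allowed) are hypothetical stand-ins, not the server's mysql_mutex_t or
its debug macros; the function returns false where a debug build would abort.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Toy stand-in for a debug mutex that remembers which thread holds it.
// Hypothetical: this is not the server's mysql_mutex_t.
class OwnerTrackingMutex {
public:
  void lock() {
    mutex_.lock();
    owner_.store(std::this_thread::get_id());
  }

  void unlock() {
    owner_.store(std::thread::id());  // default id means "no owner"
    mutex_.unlock();
  }

  // True when the calling thread currently holds the mutex.
  bool owned_by_caller() const {
    return owner_.load() == std::this_thread::get_id();
  }

private:
  std::mutex mutex_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Models the "assert not owner" precondition of step #4: the step is only
// allowed when the caller does NOT hold the log lock.
bool step4_allowed(const OwnerTrackingMutex &lock_log) {
  return !lock_log.owned_by_caller();
}
```

With the lock taken in #3 and held through the file transfer,
step4_allowed() is false on the same thread, which is the situation described
above.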



> With wait_for_last_checkpoint_event inside reload_acl_and_cache or
> outside, you still don't have anything that would prevent user from
> doing REFRESH_BINARY_LOG between 3 and 5.
>

It wouldn't prevent the user from doing REFRESH_BINARY_LOG, but with
wait_for_last_checkpoint_event() added to reload_acl_and_cache(), it would
ensure that every REFRESH_BINARY_LOG (either from the user or from #2 above)
waits until the last checkpoint event makes it into the new binary log file.
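
A rough sketch of that idea, as a toy model only: ToyBinlog and its methods
are hypothetical and do not mirror the server code. Rotation opens a new file
and hands the checkpoint event to a separate thread; because the wait is part
of the refresh path itself, every caller returns only after the checkpoint
has landed in the newest file.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy model of a binlog whose checkpoint event is written asynchronously.
// Hypothetical names; not the MariaDB server API.
class ToyBinlog {
public:
  ~ToyBinlog() {
    for (auto &worker : workers_) worker.join();
  }

  // Models REFRESH_BINARY_LOG with the wait built in: rotate, then block
  // until the checkpoint event is in the new file.
  void refresh_with_wait() {
    rotate();
    wait_for_last_checkpoint();
  }

  // Opens a new "file" and schedules its checkpoint event on another thread.
  void rotate() {
    std::lock_guard<std::mutex> guard(lock_);
    files_.emplace_back();
    const std::size_t index = files_.size() - 1;
    ++pending_;
    // The worker blocks on lock_ until this function returns.
    workers_.emplace_back([this, index] {
      std::lock_guard<std::mutex> inner(lock_);
      files_[index].push_back("checkpoint");
      --pending_;
      cond_.notify_all();
    });
  }

  // Blocks until no checkpoint event is outstanding.
  void wait_for_last_checkpoint() {
    std::unique_lock<std::mutex> guard(lock_);
    cond_.wait(guard, [this] { return pending_ == 0; });
  }

  bool last_file_has_checkpoint() {
    std::lock_guard<std::mutex> guard(lock_);
    return !files_.empty() && !files_.back().empty();
  }

  std::size_t file_count() {
    std::lock_guard<std::mutex> guard(lock_);
    return files_.size();
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::vector<std::vector<std::string>> files_;
  std::vector<std::thread> workers_;
  int pending_ = 0;
};
```

In this model, a caller that uses rotate() alone may observe a newest file
without its checkpoint, while refresh_with_wait() never does.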

Best,
Nirbhay


> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
>
