← Back to team overview

maria-discuss team mailing list archive

Re: Semi-sync replication hangs when changing binlog filename.

 

Hi Pavel, Kristian,


This fixes the problem for me. How do we go about getting this into a release?


Joseph.


________________________________
From: Pavel Ivanov <pivanof@xxxxxxxxxx>
Sent: Tuesday, 16 August 2016 2:28 AM
To: Kristian Nielsen; Jonas Oreland
Cc: Joseph Glanville; maria-discuss@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Maria-discuss] Semi-sync replication hangs when changing binlog filename.

Yes, Kristian. Thanks! This indeed looks like the correct patch. In
fact I see that internally we have this changed as part of the same
patch where after_flush was changed (i.e.
https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e).
[https://avatars1.githubusercontent.com/u/12231504?v=3&s=200]<https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e>

MDEV-7257: Dump Thread Enhancements · MariaDB/server@4d8b346<https://github.com/MariaDB/server/commit/4d8b346e079a27960dbe49e4d0ec4364bed8d30e>
github.com
Make the binlog dump threads not need to take LOCK_log while sending binlog events to slave. Instead, a new LOCK_binlog_end_pos is used just to coordinate tracking the current end-of-log. This is ...



I'm not sure why Jonas didn't include it in
https://jira.mariadb.org/browse/MDEV-7257.

On Mon, Aug 15, 2016 at 5:05 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
>> binlog ending at position mariadb-bin.000004:2039896, somehow the
>> function ReplSemiSyncMaster::commitTrx() gets trx_wait_binlog_name =
>> 'mariadb-bin.000005' and trx_wait_binlog_pos = 2039896. I.e. the
>> function gets the position of the transaction to wait semi-sync ack
>> for correctly, but the file name is already the one that is current
>> after rotation. Master starts waiting for that position, but the slave
>
>> Kristian, do you have any idea what's going on? Is there an
>> inappropriate lock release/re-acquire somewhere?
>
> Hm. Actually, looking into MYSQL_BIN_LOG::trx_group_commit_leader, this
> looks suspicious:
>
>     RUN_HOOK(binlog_storage, after_flush,
>         (current->thd,
>          current->cache_mngr->last_commit_pos_file,
>          current->cache_mngr->last_commit_pos_offset, synced,
>          first, last))
>
> But
>
>     RUN_HOOK(binlog_storage, after_sync,
>              (current->thd, log_file_name,
>               current->cache_mngr->last_commit_pos_offset,
>               first, last))
>
> I would have expected that `log_file_name' to be also
> current->cache_mngr->last_commit_pos_file, like in the first instance. And
> in fact, it looks like (with my limited knowledge of semi-sync) that this
> suspicious case is exactly the AFTER_SYNC which fails, while AFTER_COMMIT
> works...
>
> So maybe try the below patch?
>
> Pavel, what do you think, do you agree that this patch should be better?
>
>  - Kristian.
>
>
> diff --git a/sql/log.cc b/sql/log.cc
> index 7efec98..b77a6b3 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -7712,7 +7712,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
>        last= current->next == NULL;
>        if (!current->error &&
>            RUN_HOOK(binlog_storage, after_sync,
> -                   (current->thd, log_file_name,
> +                   (current->thd, current->cache_mngr->last_commit_pos_file,
>                      current->cache_mngr->last_commit_pos_offset,
>                      first, last)))
>        {
>

Follow ups

References