maria-discuss team mailing list archive

Re: Semi-sync replication hangs when changing binlog filename.


Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> binlog ending at position mariadb-bin.000004:2039896, somehow the
> function ReplSemiSyncMaster::commitTrx() gets trx_wait_binlog_name =
> 'mariadb-bin.000005' and trx_wait_binlog_pos = 2039896. I.e. the
> function gets the position of the transaction to wait semi-sync ack
> for correctly, but the file name is already the one that is current
> after rotation. Master starts waiting for that position, but the slave

> Kristian, do you have any idea what's going on? Is there an
> inappropriate lock release/re-acquire somewhere?

Hm. Actually, looking into MYSQL_BIN_LOG::trx_group_commit_leader, this
looks suspicious:

    RUN_HOOK(binlog_storage, after_flush,
             (current->thd,
              current->cache_mngr->last_commit_pos_file,
              current->cache_mngr->last_commit_pos_offset, synced,
              first, last))


    RUN_HOOK(binlog_storage, after_sync,
             (current->thd, log_file_name,
              first, last))

I would have expected `log_file_name' to also be
current->cache_mngr->last_commit_pos_file, as in the first instance. And
in fact, it looks like (with my limited knowledge of semi-sync) that this
suspicious case is exactly the AFTER_SYNC which fails, while AFTER_COMMIT
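
To illustrate why a coordinate that pairs the old offset with the new file
name could leave the master waiting, here is a minimal sketch. It is not the
actual semi-sync code: `BinlogCoord', `compare_coords' and
`ack_satisfies_wait' are hypothetical names, and it assumes the master orders
binlog coordinates by file name first and by offset second.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a binlog coordinate: (file name, byte offset).
struct BinlogCoord {
  const char *file;
  std::uint64_t pos;
};

// Order two coordinates: file name first, then offset within the file.
// Returns <0, 0 or >0, like strcmp. Binlog file names share a prefix and
// carry a zero-padded sequence number, so a string comparison orders them.
int compare_coords(const BinlogCoord &a, const BinlogCoord &b) {
  int c = std::strcmp(a.file, b.file);
  if (c != 0) return c;
  if (a.pos < b.pos) return -1;
  if (a.pos > b.pos) return 1;
  return 0;
}

// Assumption: a waiter is released once the slave has acknowledged a
// coordinate at or past the one the transaction is waiting for.
bool ack_satisfies_wait(const BinlogCoord &acked, const BinlogCoord &wait) {
  return compare_coords(acked, wait) >= 0;
}
```

With the values from the report, an ack for mariadb-bin.000004:2039896
satisfies a wait on mariadb-bin.000004:2039896, but not a wait on
mariadb-bin.000005:2039896. Under the assumptions above, the latter would
only be released once the slave acks at least that offset in the new file.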

So maybe try the below patch?

Pavel, what do you think, do you agree that this patch should be better?

 - Kristian.

diff --git a/sql/log.cc b/sql/log.cc
index 7efec98..b77a6b3 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -7712,7 +7712,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
       last= current->next == NULL;
       if (!current->error &&
           RUN_HOOK(binlog_storage, after_sync,
-                   (current->thd, log_file_name,
+                   (current->thd, current->cache_mngr->last_commit_pos_file,
                     first, last)))
