
maria-discuss team mailing list archive

Re: Semi-sync replication hangs when changing binlog filename.

 

Hi,

Does the problem appear if you set the timeout value to 9223372036854775807?


On Fri, Jul 29, 2016 at 3:24 AM, Joseph Glanville <jpg@xxxxxxxxx> wrote:

> Hi Pavel.
>
>
> To describe the setup a little better: the master replicates to a semi-sync
> slave, which in turn replicates to an async slave. This is to ensure that at
> any point in time both the master and the semi-sync slave have a complete
> copy of the data. If the master fails, the semi-sync slave is automatically
> promoted to master and the async slave switches to semi-sync replication.
> If the semi-sync slave fails, the async slave remasters itself to the master
> and switches to semi-sync.
>
>
> However, I don't think the 3rd node has any bearing on the hang: I built a
> test cluster without it and the hang is still easy to reproduce. I just
> restore a decent-sized dump, in this case a portion of the Wikipedia
> database, and the cluster reliably hangs when the master begins writing to
> the new binlog.
>
> The dump is here if someone wants to use it to reproduce:
> https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz
>
>
> I have created a gist with the output of `SHOW STATUS LIKE
> 'Rpl_semi_sync%'` on both master and slave of the simplified two-node
> setup. I have also included the binlogs of both the master and the slave
> and the relay log on the slave.
>
> https://gist.github.com/josephglanville/70789bc9c3744090a17070652cded68b
>
>
> Let me know if there is any other useful information I can provide.
>
>
> Joseph.
> ------------------------------
> *From:* Pavel Ivanov <pivanof@xxxxxxxxxx>
> *Sent:* Friday, 29 July 2016 4:31:26 PM
> *To:* Joseph Glanville
> *Cc:* Will Fong; maria-discuss@xxxxxxxxxxxxxxxxxxx
> *Subject:* Re: [Maria-discuss] Semi-sync replication hangs when changing
> binlog filename.
>
> This looks pretty weird. If you don't mind, more information would be
> useful to look at: the contents of mariadb-bin.000005 on the master, in
> particular what GTID and binlog position the transaction waiting for the
> semi-sync ack has (confirm that it's 0-1684280839-156 and ends at offset
> 329); the result of "show status like 'rpl_semi_sync_%'" on both master and
> slave; and the contents of relay-bin.000005 and the binlog on the slave, in
> particular whether it really executed the transaction that is currently
> hanging on the master. Out of curiosity: it looks like the slave also acts
> as a master to someone else. Can you also verify that the transaction
> hanging now on the master made it to that second-level slave?
>
> But to be honest, I don't quite understand how what you show us could
> happen, so I'm just asking to look at the info that I would look at if I
> were investigating such a problem.
>
> On Thu, Jul 28, 2016 at 10:52 PM, Joseph Glanville <jpg@xxxxxxxxx> wrote:
>
>> Hi Pavel.
>>
>> Yes, by “binlog filename changes” I mean the master begins writing to a
>> new binlog file.
>>
>> The output of all the requested commands is in this gist:
>> https://gist.github.com/josephglanville/7b96c34bb6e79ace33e56627672b98a5
>>
>> Joseph Glanville
>>
>>
>> On Fri, 29 Jul 2016 at 3:08 PM Pavel Ivanov <pivanof@xxxxxxxxxx> wrote:
>>
>>> By "binlog filename changes" you mean when master starts writing binlogs
>>> into a new file? Can you clarify how the replication stalls? What "show
>>> processlist" shows at that time on master and on slave? What does "show
>>> slave status" show on the slave?
>>>
>>> On Thu, Jul 28, 2016 at 10:03 PM, Will Fong wrote:
>>> > Hi Joseph,
>>> >
>>> > On Fri, Jul 29, 2016 at 10:11 AM, Joseph Glanville wrote:
>>> >> However whenever the binlog filename changes the replication stalls
>>> >> indefinitely.
>>> >
>>> > Interesting! I may have reproduced this, but it was only a quick test.
>>> > Let me (or someone else) dig into this more.
>>> >
>>> > Thanks for reporting this.
>>> > -will
>>> >
>>> >
>>> > --
>>> > Will Fong, Senior Support Engineer
>>> > MariaDB Corporation
>>> >
>>> > _______________________________________________
>>> > Mailing list: https://launchpad.net/~maria-discuss
>>> > Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>> > Unsubscribe : https://launchpad.net/~maria-discuss
>>> > More help   : https://help.launchpad.net/ListHelp
>>>
>>
>>
>
>
>
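
For anyone following the thread, the failover rules Joseph describes in the quoted message above (master, semi-sync slave, async slave) can be written down as a small pure function. This is only a sketch of the stated rules, not his actual automation: the `Topology` type, the `fail_over` name, and the host names are hypothetical, and the case of the async slave itself failing is not described in the thread, so it is rejected here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Topology:
    """Roles in the three-node chain: master -> semi-sync slave -> async slave."""
    master: str
    semi_sync_slave: Optional[str]
    async_slave: Optional[str]


def fail_over(topo: Topology, failed: str) -> Topology:
    """Return the topology after `failed` drops out, per the rules in the thread."""
    if failed == topo.master:
        # The semi-sync slave holds a complete copy, so it is promoted;
        # the former async slave becomes the new semi-sync slave.
        if topo.semi_sync_slave is None:
            raise RuntimeError("no semi-sync slave available to promote")
        return Topology(topo.semi_sync_slave, topo.async_slave, None)
    if failed == topo.semi_sync_slave:
        # The async slave repoints to the master and switches to semi-sync.
        return Topology(topo.master, topo.async_slave, None)
    # Anything else (including the async slave failing) is not covered
    # by the description in the thread.
    raise ValueError(f"failover for {failed!r} is not described")


if __name__ == "__main__":
    before = Topology("db1", "db2", "db3")  # hypothetical host names
    print(fail_over(before, "db1"))
    print(fail_over(before, "db2"))
```

In both described cases the cluster ends up as a master with a single semi-sync slave, which is the two-node shape in which the hang was reproduced.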

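One of the checks Pavel asks for in the quoted message is whether the slave really reached the transaction the master is waiting on (GTID 0-1684280839-156). A MariaDB GTID has the form domain-server_id-sequence, and a slave position is a comma-separated list of such GTIDs, so the comparison can be sketched as below. The function names are hypothetical, the position strings must be copied by hand from the server, and the sketch assumes sequence numbers only increase within a replication domain.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a MariaDB GTID of the form domain-server_id-sequence."""
    parts = text.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"not a MariaDB GTID: {text!r}")
    domain_id, server_id, seq_no = (int(p) for p in parts)
    return Gtid(domain_id, server_id, seq_no)


def slave_has_reached(master_gtid: str, slave_gtid_pos: str) -> bool:
    """True if the slave position covers master_gtid in the same domain.

    slave_gtid_pos is a comma-separated GTID list (one entry per domain).
    Assumption: sequence numbers are monotonic within a domain.
    """
    target = parse_gtid(master_gtid)
    for item in slave_gtid_pos.split(","):
        if not item.strip():
            continue
        pos = parse_gtid(item)
        if pos.domain_id == target.domain_id:
            return pos.seq_no >= target.seq_no
    # The slave has no position at all in the target's domain.
    return False


if __name__ == "__main__":
    hanging = "0-1684280839-156"  # GTID quoted in the thread
    # The slave position below is a made-up value for illustration.
    print(slave_has_reached(hanging, "0-1684280839-155"))
```

If this returns False for the real slave position, the transaction never arrived; if it returns True while the master still waits, the acknowledgement is the more likely place to look.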