
maria-developers team mailing list archive

Re: Documentation about GTID

 

> So it seems that once you forbid logging of duplicate sequence numbers
> (making "strict" mode the only mode, at least in this context), you can
> safely discard server ID from the GTID, no?

No. Consider your example, but now at 0-0-100 a network partition
occurs and N1 gets disconnected from N0. Each server then executes
one transaction: it will be 0-0-101 on N0 and 0-1-101 on N1. Now the
network is restored and N1 connects to N0 again. If the GTID consisted
of domain_id and sequence number only, N1 would ask to start
replication from 0-101, and N0 would happily do that and send
events starting from 0-0-102. So you'd get a silent loss of one
transaction. That shouldn't happen. With the current implementation N1
will ask to start replication from 0-1-101, N0 won't find such an event
and will give an error, which is The Right Thing To Do in such a
situation.
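
To make the partition scenario concrete, here is a minimal sketch in
Python. It is a toy model, not the server's actual code: the binlog is
just a list of (domain_id, server_id, seq_no) tuples, and the names
resume_full, resume_without_server_id and GtidNotFound are made up for
this illustration.

```python
from typing import List, Tuple

# A GTID modelled as (domain_id, server_id, seq_no). Toy representation only.
Gtid = Tuple[int, int, int]


class GtidNotFound(Exception):
    """Raised when the master cannot find the slave's position in its binlog."""


def fmt(gtid: Gtid) -> str:
    """Format a GTID in the usual domain-server-sequence notation."""
    return "-".join(str(part) for part in gtid)


def resume_full(binlog: List[Gtid], slave_pos: Gtid) -> List[Gtid]:
    """Match on the full triple; fail if the master never logged slave_pos."""
    if slave_pos not in binlog:
        raise GtidNotFound(fmt(slave_pos) + " not found in master binlog")
    return binlog[binlog.index(slave_pos) + 1:]


def resume_without_server_id(binlog: List[Gtid], slave_pos: Gtid) -> List[Gtid]:
    """Hypothetical variant: match on (domain_id, seq_no) only."""
    domain_id, _, seq_no = slave_pos
    for i, (d, _, s) in enumerate(binlog):
        if (d, s) == (domain_id, seq_no):
            return binlog[i + 1:]
    raise GtidNotFound("%d-%d not found in master binlog" % (domain_id, seq_no))


# State after the partition heals: N0 has moved on, N1 made one local commit.
n0_binlog = [(0, 0, 100), (0, 0, 101), (0, 0, 102)]
n1_position = (0, 1, 101)

# Without server_id the master resumes after its own 101, so N1 never
# receives 0-0-101 and nobody notices the divergence.
print([fmt(g) for g in resume_without_server_id(n0_binlog, n1_position)])

# With the full GTID the master cannot find 0-1-101 and reports an error.
try:
    resume_full(n0_binlog, n1_position)
except GtidNotFound as err:
    print("error:", err)
```

In this model the two-part lookup returns only 0-0-102, while the
three-part lookup fails loudly, which is the difference described above.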


Pavel


On Tue, May 7, 2013 at 7:20 PM, Alex Yurchenko
<alexey.yurchenko@xxxxxxxxxxxxx> wrote:
> Hi Pavel,
>
> Thanks for the explanation. I was following your discussion about "strict"
> mode as well. So I guess I can formulate my concerns more or less compactly:
>
> 1) Although server ID allows us to distinguish between 0-0-101 and 0-1-101
> in the example below, it does not seem to be used for anything useful other
> than logging events with duplicate sequence numbers within a given domain
> ID, which, as we know, should not be there.
>
> 2) Even the ability to distinguish between 0-0-101 and 0-1-101 does not seem
> to work consistently and is a matter of luck, since either one can get lost
> after a couple of failovers. Why bother at all?
>
> So it seems that once you forbid logging of duplicate sequence numbers
> (making "strict" mode the only mode, at least in this context), you can
> safely discard server ID from the GTID, no?
>
> Regards,
> Alex
>
>
> On 2013-05-08 02:02, Pavel Ivanov wrote:
>>
>> Helping Kristian to answer questions (see below). He can elaborate if
>> he wishes to.
>>
>> On Tue, May 7, 2013 at 12:52 PM, Alex Yurchenko
>> <alexey.yurchenko@xxxxxxxxxxxxx> wrote:
>>>
>>> On 2013-05-07 17:13, Kristian Nielsen wrote:
>>>>
>>>>
>>>> Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx> writes:
>>>>
>>>>> From the documentation the purpose of domain ID in GTID is quite
>>>>> clear. But what is the role of server ID?
>>>>
>>>>
>>>>
>>>> The role is mainly to ensure uniqueness of GTID when domain_id is not
>>>> configured correctly.
>>>
>>>
>>>
>>> Since both are configured manually, and domain ID can simply default to 0
>>> in simple setups, I'd imagine that the possibility of having server ID
>>> configured incorrectly (just missing to configure it) is way more probable
>>> than having domain ID incorrect. Actually, being an arbitrary node group ID
>>> what is "incorrect" here? ANY value just makes the node a member of the
>>> corresponding domain, so ANY domain ID value is legal. Whereas server ID can
>>> certainly be incorrect (a duplicate within the domain).
>>
>>
>> Incorrect in this case would be having multi-master replication with
>> independent replication streams having the same domain_id.
>>
>>>> Replication already requires server ID to be unique, so (server_id,
>>>> sequence number) will be globally unique as long as the sequence number
>>>> is increased locally on each server.
>>>>
>>>> Domain id is not required to be unique; in fact it will typically be
>>>> shared by master and slave. It is a common mistake to do a manual
>>>> transaction on the slave while transactions are also being done at the
>>>> same time on the master. Having server_id in the GTID prevents two
>>>> different transactions from ending up with the same GTID.
>>>
>>>
>>>
>>> So suppose we have nodes N0, N1 and N2 with IDs 0-0, 0-1, 0-2 respectively.
>>>
>>> Initially N1 and N2 both replicate from N0 and have identical DB contents.
>>>
>>> At 0-0-10 N2 goes to maintenance.
>>>
>>> After 0-0-100 someone executes local transaction on N1 and it gets logged as
>>> 0-1-101. Right?
>>
>>
>> Right.
>>
>>> So if now N0 executes another transaction, what will be its GTID?
>>> a) on N0 - 0-0-101?
>>
>>
>> Correct.
>>
>>> b) on N1 - 0-0-102 or 0-0-101?
>>> (as your documentation states
>>
>>
>> On N1 it will be the same -- 0-0-101. The purpose of GTID is that the
>> same transaction has the same GTID on each server.
>>
>>>>   The server ID is set to the server ID of the server where the event
>>>> group is first logged into the binlog. The sequence number is increased
>>>> on a server for every event group logged.
>>>
>>>
>>> So it is actually another question, sequence number is not set on the
>>> master server but always computed locally?)
>>
>>
>> For every transaction replicated from the master, the sequence number is
>> set on the master. For each transaction executed locally on the slave,
>> the sequence number is generated locally.
>>
>>> Or, does N1 detect a problem at this point? If yes, how exactly?
>>
>>
>> N1 doesn't detect a problem currently, but Kristian plans to implement
>> a "strict gtid mode" in which N1 will detect a problem.
>>
>>> How server ID is involved there?
>>
>>
>> Server ID is involved in the sense that there's no confusion when you
>> talk about the transaction with sequence number 101 in your example above.
>> There are two transactions with sequence number 101 -- 0-0-101 and
>> 0-1-101. So server ID gives distinct GTIDs to these transactions
>> despite the same sequence number.
>>
>>> Now if we can get past this point without an error at N1, and start N2 to
>>> replicate from N1, I take it it will receive and commit 0-1-101. But will it
>>> ever record it somewhere, or will its state simply be 0-0-XXX?
>>
>>
>> Yes, N2 will receive transaction 0-1-101 and it will put it in binlogs
>> as 0-1-101, the same way as it was on N1.
>>
>>> If at 0-0-110
>>> we failover N0 to replicate from N2, will it receive 0-1-101?
>>
>>
>> No, if you fail over when both N0 and N2 already have 0-0-110 then N0
>> won't receive 0-1-101, because it will start replication from the first
>> transaction after 0-0-110.
>> OTOH if at 0-0-110 you take N0 out, restore it to the pre- 0-1-101
>> state and connect it to replicate from N2, then N0 will receive 0-1-101.
>>
>>
>> Pavel
>
>
> --
> Alexey Yurchenko,
> Codership Oy, www.codership.com
> Skype: alexey.yurchenko, Phone: +358-400-516-011

