maria-developers team mailing list archive

Re: Review of patch for MDEV-4820

 

Kristian,

Could you say whether you are working on these? Is there an ETA?
This is blocking us from pushing MariaDB into testing in the
near-production environment, and I'm hesitant to implement the fixes
myself because I suspect you would do them completely differently.

Thank you,
Pavel

On Mon, Aug 19, 2013 at 6:49 AM, Pavel Ivanov <pivanof@xxxxxxxxxx> wrote:
>> Ok. Actually, I think we should expose the real binlog state (what is stored
>> in the Gtid_list event at the start of the binlog). So something like a
>> variable
>>
>>   @@GLOBAL.gtid_binlog_state
>>
>> Example value: '0-1-100,0-2-101'
>>
>> And you get an error if you set it unless the binlog is empty.
>>
>> Would this be what you need?
>
> Yep, sounds like what we need.
>
> Thanks,
> Pavel
>
>
> On Mon, Aug 19, 2013 at 4:28 AM, Kristian Nielsen
> <knielsen@xxxxxxxxxxxxxxx> wrote:
>> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>>
>>> I think to fix this bug we should stop using gtid_slave_pos as
>>> indication of the current db state. We should make it possible to
>>
>> Agree.
>>
>>> change gtid_binlog_pos when there are no events in binlogs. And when
>>
>> Ok. Actually, I think we should expose the real binlog state (what is stored
>> in the Gtid_list event at the start of the binlog). So something like a
>> variable
>>
>>   @@GLOBAL.gtid_binlog_state
>>
>> Example value: '0-1-100,0-2-101'
>>
>> And you get an error if you set it unless the binlog is empty.
>>
>> Would this be what you need?
>>
>>> it kind of makes sense more than using gtid_slave_pos. But probably
>>> this will break the detection of slaves trying to connect using GTID
>>> before the start of binlogs...
>>
>> I do not think it will break that (but we will see).
>>
>>> 5. A bug from a completely different area, but also GTID-related. Take
>>> database from previous MySQL version (I've tested on the database from
>>> 5.1), start MariaDB on it, run mysql_upgrade and then try to set
>>> gtid_slave_pos to something. At this point I've got error "unable to
>>> load slave state from gtid_slave_pos table". This error was apparently
>>> remembered from MariaDB's start and reading of gtid_slave_pos table
>>> wasn't retried after mysql_upgrade actually created it.
>>
>> Ok, I will take a look. I think there is an existing bug report on that. IIRC
>> there is some locking issue (the variable can be accessed from a place where
>> table locks cannot be taken to read gtid_slave_pos table), but I will see what
>> can be done.
>>
>>> 1. When master doesn't have binlogs and gtid_slave_pos is ahead of the
>>> GTID that slave tries to connect with you give error "The binlog on
>>> the master is missing the GTID ... requested by the slave (even though
>>> both a prior and a subsequent number does exist), and GTID strict mode
>>> is enabled". I find this error message very confusing: presence of a
>>> subsequent GTID in such situation is questionable, but there is no
>>> prior GTID in master's binlog for sure.
>>
>> Hm, this sounds like a bug. Do you have a testcase?
>>
>> But with @@GLOBAL.gtid_binlog_state implemented and set correctly, you will
>> get instead the correct error message, that the position that the slave
>> requests to connect at has been purged from the master's binlog.
>>
>>> 2. The error message "An attempt was made to binlog GTID ... which
>>> would create an out-of-order sequence number with existing GTID ...,
>>> and gtid strict mode is enabled" is confusing too, because it's issued
>>> not when slave actually tries to write event to binlog. Apparently the
>>> error condition is checked when slave considers executing the event
>>> that was just received from master. And if this event contains changes
>>> only to tables matching replicate-wild-ignore-table filter then this
>>> event won't be ever binlog'ed on slave in non-strict mode. So there's
>>> no "attempt to binlog" involved and error wording becomes not quite
>>> understandable.
>>
>> Right, I see. Thanks!
>>
>> One problem here is that when using non-transactional statements (DDL or
>> MyISAM), we _do_ need to check this _before_ executing the event, because
>> we cannot roll back after the event.
>>
>> But I agree of course that this is a bug. I will try to find a way to
>> fix it. Maybe the check can be delayed until the first event that we are
>> actually going to execute (not filter).
>>
>>
>>> 3. There's error message "Specified GTID ... conflicts with the binary
>>> log which contains a more recent GTID .... If
>>> MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override
>>> the new value of @@gtid_slave_pos". It looks like it's issued
>>> inconsistently. I had in binlog empty Gtid_list, then 0-1-26, 0-1-27,
>>> 0-1-28, 0-2-29 and 0-2-30. And both gtid_slave_pos and gtid_binlog_pos
>>> were set to '0-2-30'. In this situation I was able to set
>>> gtid_slave_pos to '0-1-29' successfully and get "slave has diverged"
>>> error after START SLAVE. Then I was able to set gtid_slave_pos to
>>> '0-2-29' and get error "Attempt was made to binlog out-of-order" after
>>> START SLAVE.
>>> I'd think that, at least in strict mode, MariaDB shouldn't allow setting
>>> gtid_slave_pos to a value that is clearly in the past.
>>
>> Right, thanks, I will check. (I can understand that 0-1-29 did not give error,
>> though you are probably right that it should; but that 0-2-29 did not give
>> error is surprising).
>>
>>> 4. Now a real bug. Start three servers S1, S2 and S3 without binlogs.
>>> Set gtid_slave_pos to the same value on all of them. Connect S2 to
>>> replicate from S1. Execute a few transactions on S1. Perform a
>>> failover, making S1 replicate from S2. Now connect S3 to replicate
>>> from S2. At this point S3 should be able to replicate successfully
>>> because it has the same db state as S2 had in the beginning (S3 has
>>> the same gtid_slave_pos as S2 had initially), and S2 has all binlogs
>>> to move from current position on S3 to the current position on S2. But
>>> yet S3 gets error that starting GTID doesn't exist in S2's binlogs.
>>
>> This should also be fixed by setting @@GLOBAL.gtid_binlog_state.
>>
>>  - Kristian.

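For illustration, the two rules stated for the proposed
@@GLOBAL.gtid_binlog_state (a comma-separated value such as
'0-1-100,0-2-101', and an error on setting it unless the binlog is empty)
can be modelled roughly as below. This is only a sketch: Gtid,
parse_gtid_list and BinlogModel are hypothetical names made up for the
example, and nothing here reflects how the server actually implements the
variable. It assumes each GTID is written as domain-server-sequence.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    """One GTID, assumed to be written as domain-server-sequence."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_list(text: str) -> List[Gtid]:
    """Parse a comma-separated GTID list such as '0-1-100,0-2-101'."""
    gtids = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string or a stray comma
        parts = item.split("-")
        if len(parts) != 3:
            raise ValueError(f"malformed GTID: {item!r}")
        gtids.append(Gtid(*(int(p) for p in parts)))
    return gtids


class BinlogModel:
    """Toy stand-in for a server's binlog (hypothetical, not MariaDB code)."""

    def __init__(self) -> None:
        self.state: List[Gtid] = []   # models the Gtid_list at binlog start
        self.events: List[Gtid] = []  # models events already binlogged

    def set_state(self, text: str) -> None:
        # Rule from the thread: setting the state is an error
        # unless the binlog is empty.
        if self.events:
            raise RuntimeError("binlog is not empty; refusing to set state")
        self.state = parse_gtid_list(text)
```

In this model, calling set_state('0-1-100,0-2-101') on a fresh BinlogModel
succeeds, while the same call after any event has been appended raises an
error.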

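Point 3 above asks that, at least in strict mode, a gtid_slave_pos value
that is clearly in the past be rejected. One simplified reading of "in the
past" is sketched below: a candidate position is behind the binlog if its
sequence number is lower than the highest sequence number already binlogged
in the same domain. The function names are hypothetical and the rule is an
assumption made for illustration; the server's real check may differ.

```python
from typing import Dict, Iterable, Tuple

# (domain_id, server_id, seq_no); the tuple layout is an assumption
Gtid = Tuple[int, int, int]


def highest_seq_per_domain(binlog: Iterable[Gtid]) -> Dict[int, int]:
    """Return the highest sequence number seen in each replication domain."""
    highest: Dict[int, int] = {}
    for domain_id, _server_id, seq_no in binlog:
        if seq_no > highest.get(domain_id, -1):
            highest[domain_id] = seq_no
    return highest


def is_behind_binlog(candidate: Gtid, binlog: Iterable[Gtid]) -> bool:
    """True if the candidate's seq_no is below the domain's highest seq_no."""
    domain_id, _server_id, seq_no = candidate
    highest = highest_seq_per_domain(binlog)
    return domain_id in highest and seq_no < highest[domain_id]


# The binlog contents described in point 3 of the thread.
binlog = [(0, 1, 26), (0, 1, 27), (0, 1, 28), (0, 2, 29), (0, 2, 30)]

# Under this simplified rule both values from the report count as "in the
# past", because 29 is below the highest sequence number (30) in domain 0.
print(is_behind_binlog((0, 1, 29), binlog))
print(is_behind_binlog((0, 2, 29), binlog))
```

Note that this rule does not distinguish a position that exists in the
binlog (0-2-29) from one that never did (0-1-29); the thread treats the
latter as divergence, which this sketch does not model.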