
maria-developers team mailing list archive

Re: Review of patch for MDEV-4820

 

> Ok. Actually, I think we should expose the real binlog state (what is stored
> in the Gtid_list event at the start of the binlog). So something like a
> variable
>
>   @@GLOBAL.gtid_binlog_state
>
> Example value: '0-1-100,0-2-101'
>
> And you get an error if you set it unless the binlog is empty.
>
> Would this be what you need?

Yep, sounds like what we need.

Thanks,
Pavel
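To make the proposal concrete, here is a minimal Python sketch of what @@GLOBAL.gtid_binlog_state would hold and of the "only settable while the binlog is empty" rule. It assumes, as the example value '0-1-100,0-2-101' suggests, one entry per (domain_id, server_id) pair. The names Gtid, BinlogState and parse_gtid_list are hypothetical and do not describe MariaDB's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_list(value: str) -> list[Gtid]:
    """Parse a comma-separated GTID list such as '0-1-100,0-2-101'."""
    gtids = []
    for item in value.split(","):
        item = item.strip()
        if item:
            domain, server, seq = (int(part) for part in item.split("-"))
            gtids.append(Gtid(domain, server, seq))
    return gtids


class BinlogState:
    """Toy model of the binlog state: last seq_no per (domain_id, server_id)."""

    def __init__(self) -> None:
        self.last: dict[tuple[int, int], int] = {}
        self.events_logged = 0  # GTIDs written to the binlog so far

    def set_state(self, value: str) -> None:
        # The proposed rule: setting the state is an error unless the binlog is empty.
        if self.events_logged:
            raise RuntimeError("binlog is not empty; cannot set gtid_binlog_state")
        self.last = {(g.domain_id, g.server_id): g.seq_no for g in parse_gtid_list(value)}

    def log(self, gtid: Gtid) -> None:
        # Each newly binlogged GTID replaces the entry for its (domain, server) pair.
        self.last[(gtid.domain_id, gtid.server_id)] = gtid.seq_no
        self.events_logged += 1

    def render(self) -> str:
        return ",".join(f"{d}-{s}-{n}" for (d, s), n in sorted(self.last.items()))


state = BinlogState()
state.set_state("0-1-100,0-2-101")
state.log(Gtid(0, 2, 102))
print(state.render())  # 0-1-100,0-2-102
```

Once anything has been logged, a further set_state call raises, matching "you get an error if you set it unless the binlog is empty".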


On Mon, Aug 19, 2013 at 4:28 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
>> I think to fix this bug we should stop using gtid_slave_pos as an
>> indication of the current db state. We should make it possible to
>
> Agree.
>
>> change gtid_binlog_pos when there are no events in the binlogs. And when
>
> Ok. Actually, I think we should expose the real binlog state (what is stored
> in the Gtid_list event at the start of the binlog). So something like a
> variable
>
>   @@GLOBAL.gtid_binlog_state
>
> Example value: '0-1-100,0-2-101'
>
> And you get an error if you set it unless the binlog is empty.
>
> Would this be what you need?
>
>> it makes somewhat more sense than using gtid_slave_pos. But this will
>> probably break the detection of slaves trying to connect using a GTID
>> from before the start of the binlogs...
>
> I do not think it will break that (but we will see).
>
>> 5. A bug from a completely different area, but also GTID-related. Take a
>> database from a previous MySQL version (I've tested on a database from
>> 5.1), start MariaDB on it, run mysql_upgrade and then try to set
>> gtid_slave_pos to something. At this point I got the error "unable to
>> load slave state from gtid_slave_pos table". This error was apparently
>> remembered from MariaDB's startup, and reading of the gtid_slave_pos table
>> wasn't retried after mysql_upgrade actually created it.
>
> Ok, I will take a look. I think there is an existing bug report on that. IIRC
> there is some locking issue (the variable can be accessed from a place where
> table locks cannot be taken to read the gtid_slave_pos table), but I will see what
> can be done.
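A minimal Python sketch of the behaviour Pavel expected: load the slave state lazily and, on failure, store nothing, so that a later access (after mysql_upgrade has created the table) retries instead of repeating a remembered error. SlaveStateLoader and read_gtid_slave_pos are hypothetical names, and the sketch deliberately ignores the table-locking issue Kristian mentions.

```python
from typing import Callable, Optional

Row = tuple[int, int, int]  # (domain_id, server_id, seq_no)


class SlaveStateLoader:
    """Load the slave GTID state lazily; a failed load is retried, not cached."""

    def __init__(self, read_table: Callable[[], list[Row]]) -> None:
        self._read_table = read_table
        self._state: Optional[list[Row]] = None  # None until a load succeeds

    def get(self) -> list[Row]:
        if self._state is None:
            # If the table is missing, the exception propagates and nothing is
            # stored, so the next call (e.g. after mysql_upgrade) tries again.
            self._state = self._read_table()
        return self._state


# Simulated storage: the table only exists once "mysql_upgrade" has run.
tables: dict[str, list[Row]] = {}


def read_gtid_slave_pos() -> list[Row]:
    if "gtid_slave_pos" not in tables:
        raise LookupError("unable to load slave state from gtid_slave_pos table")
    return tables["gtid_slave_pos"]


loader = SlaveStateLoader(read_gtid_slave_pos)
try:
    loader.get()
except LookupError as err:
    print("startup:", err)
tables["gtid_slave_pos"] = [(0, 1, 5)]  # mysql_upgrade creates the table
print(loader.get())  # [(0, 1, 5)]
```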
>
>> 1. When the master doesn't have binlogs and gtid_slave_pos is ahead of the
>> GTID that the slave tries to connect with, you give the error "The binlog on
>> the master is missing the GTID ... requested by the slave (even though
>> both a prior and a subsequent number does exist), and GTID strict mode
>> is enabled". I find this error message very confusing: the presence of a
>> subsequent GTID in such a situation is questionable, but there is certainly
>> no prior GTID in the master's binlog.
>
> Hm, this sounds like a bug. Do you have a testcase?
>
> But with @@GLOBAL.gtid_binlog_state implemented and set correctly, you will
> get instead the correct error message, that the position that the slave
> requests to connect at has been purged from the master's binlog.
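The difference between the two error messages can be sketched as a per-domain classification in Python. This is a simplified, hypothetical model (one domain, seq_no only, server_id ignored), not the master's actual code: without a binlog state the master has nothing to compare the request against, while with @@GLOBAL.gtid_binlog_state set it can tell that the requested position was purged.

```python
from typing import Optional, Sequence


def check_slave_start(requested_seq: int,
                      start_seq: Optional[int],
                      logged_seqs: Sequence[int]) -> str:
    """Classify a slave's requested start position for a single domain.

    requested_seq: seq_no the slave wants to continue after
    start_seq:     seq_no for this domain in the Gtid_list event at the start
                   of the oldest binlog, or None if the domain is not listed
    logged_seqs:   seq_nos present in the master's binlog files for this domain
    """
    if requested_seq == start_seq or requested_seq in logged_seqs:
        return "ok"       # in this model the master can stream from here
    if start_seq is not None and requested_seq < start_seq:
        return "purged"   # the position existed but was purged from the binlog
    return "missing"      # no usable information: the confusing strict-mode error


# Master with no binlog events and no binlog state: only "missing" is possible.
print(check_slave_start(20, None, []))   # missing
# Same master after @@GLOBAL.gtid_binlog_state was set to cover seq_no 30.
print(check_slave_start(20, 30, []))     # purged
```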
>
>> 2. The error message "An attempt was made to binlog GTID ... which
>> would create an out-of-order sequence number with existing GTID ...,
>> and gtid strict mode is enabled" is confusing too, because it's not issued
>> when the slave actually tries to write the event to its binlog. Apparently
>> the error condition is checked when the slave considers executing an event
>> it has just received from the master. If this event contains changes
>> only to tables matching the replicate-wild-ignore-table filter, then it
>> will never be binlogged on the slave in non-strict mode. So there's
>> no "attempt to binlog" involved, and the error wording is hard to
>> understand.
>
> Right, I see. Thanks!
>
> One problem here is that for non-transactional changes (DDL or MyISAM), we
> _do_ need to check this _before_ executing the event, because we cannot roll
> back after the event.
>
> But I agree of course that this is a bug. I will try to find a way to
> fix it. Maybe the check can be delayed until the first event that we are actually
> going to execute (not filter).
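Kristian's idea of delaying the check until the first event that is actually executed could look roughly like this Python sketch. It models one domain, treats each event as the name of the table it changes, and uses a hypothetical predicate in place of replicate-wild-ignore-table; none of these names come from the server source.

```python
from typing import Callable, Sequence


class OutOfOrderGtid(Exception):
    pass


def apply_event_group(gtid_seq: int,
                      last_binlogged_seq: int,
                      tables: Sequence[str],
                      is_ignored: Callable[[str], bool]) -> list[str]:
    """Apply one event group in a single domain, deferring the strict-mode check.

    The out-of-order check runs just before the first change that is actually
    executed, so a group touching only ignored tables never raises. The check
    still happens before anything runs, which matters for non-transactional
    changes (DDL, MyISAM) that cannot be rolled back.
    """
    executed: list[str] = []
    for table in tables:
        if is_ignored(table):
            continue  # filtered out: never executed, never binlogged
        if not executed and gtid_seq <= last_binlogged_seq:
            raise OutOfOrderGtid(
                f"seq_no {gtid_seq} is not after binlogged seq_no {last_binlogged_seq}")
        executed.append(table)  # stand-in for executing the change
    return executed


def ignore_scratch(table: str) -> bool:
    # Hypothetical stand-in for replicate-wild-ignore-table=scratch.%
    return table.startswith("scratch.")


print(apply_event_group(29, 30, ["scratch.t1"], ignore_scratch))  # []
print(apply_event_group(31, 30, ["app.t1", "scratch.t2"], ignore_scratch))  # ['app.t1']
```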
>
>
>> 3. There's an error message "Specified GTID ... conflicts with the binary
>> log which contains a more recent GTID .... If
>> MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override
>> the new value of @@gtid_slave_pos". It seems to be issued
>> inconsistently. My binlog had an empty Gtid_list, then 0-1-26, 0-1-27,
>> 0-1-28, 0-2-29 and 0-2-30. Both gtid_slave_pos and gtid_binlog_pos
>> were set to '0-2-30'. In this situation I was able to set
>> gtid_slave_pos to '0-1-29' successfully and got a "slave has diverged"
>> error after START SLAVE. Then I was able to set gtid_slave_pos to
>> '0-2-29' and got the error "Attempt was made to binlog out-of-order" after
>> START SLAVE.
>> I'd think that at least in strict mode MariaDB shouldn't allow setting
>> gtid_slave_pos to a value that is clearly in the past.
>
> Right, thanks, I will check. (I can understand that 0-1-29 did not give an
> error, though you are probably right that it should; but that 0-2-29 did not
> give an error is surprising).
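The validation Pavel asks for (refuse a gtid_slave_pos value that is clearly in the past, at least in strict mode) could be sketched in Python as below. It assumes seq_no values within a domain are comparable across server_ids, as they are in strict mode, and only borrows the wording of the quoted error message; parse_pos and slave_pos_conflicts are hypothetical helpers.

```python
def parse_pos(value: str) -> dict[int, tuple[int, int]]:
    """Map domain_id -> (server_id, seq_no) for a value such as '0-2-30'."""
    pos: dict[int, tuple[int, int]] = {}
    for item in value.split(","):
        item = item.strip()
        if item:
            domain, server, seq = (int(part) for part in item.split("-"))
            pos[domain] = (server, seq)
    return pos


def slave_pos_conflicts(new_value: str, binlog_pos: str) -> list[str]:
    """Return one error per domain where the binlog already holds a newer seq_no."""
    binlog = parse_pos(binlog_pos)
    errors = []
    for domain, (server, seq) in sorted(parse_pos(new_value).items()):
        if domain in binlog and binlog[domain][1] > seq:
            b_server, b_seq = binlog[domain]
            errors.append(f"Specified GTID {domain}-{server}-{seq} conflicts with "
                          f"the binary log which contains a more recent GTID "
                          f"{domain}-{b_server}-{b_seq}")
    return errors


# Pavel's case: gtid_binlog_pos is '0-2-30'; both attempted values are in the past.
for candidate in ("0-1-29", "0-2-29", "0-2-30"):
    print(candidate, "rejected" if slave_pos_conflicts(candidate, "0-2-30") else "accepted")
```

Under these assumptions both '0-1-29' and '0-2-29' are rejected, while '0-2-30' is accepted.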
>
>> 4. Now a real bug. Start three servers S1, S2 and S3 without binlogs.
>> Set gtid_slave_pos to the same value on all of them. Connect S2 to
>> replicate from S1. Execute a few transactions on S1. Perform a
>> failover: make S1 replicate from S2. Now connect S3 to replicate
>> from S2. At this point S3 should be able to replicate successfully,
>> because it has the same db state as S2 had in the beginning (S3 has
>> the same gtid_slave_pos as S2 had initially), and S2 has all the binlogs
>> needed to move from the current position on S3 to the current position
>> on S2. Yet S3 gets an error that the starting GTID doesn't exist in S2's
>> binlogs.
>
> This should also be fixed by setting @@GLOBAL.gtid_binlog_state.
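Scenario 4 can be reduced to a toy Python model to show why initializing the binlog state helps. The model is hypothetical: one domain, GTIDs as (server_id, seq_no), a made-up common starting position 0-1-10, S2 assumed to write replicated transactions to its own binlog, and S1's later switch to replicating from S2 left out because it does not affect whether S3 can connect.

```python
from dataclasses import dataclass, field

Gtid = tuple[int, int]  # (server_id, seq_no) within a single domain


@dataclass
class Server:
    start_state: set[Gtid] = field(default_factory=set)  # Gtid_list at binlog start
    binlog: list[Gtid] = field(default_factory=list)     # GTIDs logged after it

    def set_binlog_state(self, gtids: list[Gtid]) -> None:
        if self.binlog:
            raise RuntimeError("binlog is not empty")
        self.start_state = set(gtids)

    def can_serve(self, slave_pos: Gtid) -> bool:
        # A slave can connect if its position is the binlog's starting state
        # or a GTID that appears in the binlog itself.
        return slave_pos in self.start_state or slave_pos in self.binlog


def s3_can_connect_to_s2(initialize_binlog_state: bool) -> bool:
    start: Gtid = (1, 10)  # hypothetical common gtid_slave_pos on S1, S2, S3
    s1, s2, s3 = Server(), Server(), Server()
    if initialize_binlog_state:
        for server in (s1, s2, s3):
            server.set_binlog_state([start])
    # S2 replicates transactions 11..13 from S1 and logs them to its own binlog.
    for seq in (11, 12, 13):
        s1.binlog.append((1, seq))
        s2.binlog.append((1, seq))
    # After the failover, S3 connects to S2 from the common starting position.
    return s2.can_serve(start)


print(s3_can_connect_to_s2(False))  # False: S2's binlog has no trace of 0-1-10
print(s3_can_connect_to_s2(True))   # True: S2's Gtid_list starts at 0-1-10
```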
>
>  - Kristian.

