maria-developers team mailing list archive

Re: Review of patch for MDEV-4820

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> I think to fix this bug we should stop using gtid_slave_pos as
> indication of the current db state. We should make it possible to

Agree.

> change gtid_binlog_pos when there's no events in binlogs. And when

Ok. Actually, I think we should expose the real binlog state (what is stored
in the Gtid_list event at the start of the binlog). So something like a
variable

  @@GLOBAL.gtid_binlog_state

Example value: '0-1-100,0-2-101'

Setting it would give an error unless the binlog is empty.

Would this be what you need?
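
To make the proposal concrete, here is a minimal Python sketch of how such a value could be parsed and how the "only settable while the binlog is empty" rule might behave. It relies on GTIDs having the form domain-server_id-seq_no. The names Gtid, parse_gtid_state and BinlogModel are hypothetical and exist only for illustration; this is a toy model, not the server implementation.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid_state(value: str) -> List[Gtid]:
    """Parse a comma-separated GTID list such as '0-1-100,0-2-101'."""
    gtids = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string (empty state)
        parts = item.split("-")
        if len(parts) != 3:
            raise ValueError(f"malformed GTID: {item!r}")
        gtids.append(Gtid(*(int(p) for p in parts)))
    return gtids


class BinlogModel:
    """Hypothetical toy model of a binlog and its exposed state."""

    def __init__(self) -> None:
        self.events: List[Gtid] = []  # GTIDs already written to the binlog
        self.state: List[Gtid] = []   # what the Gtid_list event would hold

    def set_state(self, value: str) -> None:
        # Assumption from the proposal: setting is refused unless
        # the binlog is empty.
        if self.events:
            raise RuntimeError("binlog is not empty; cannot set state")
        self.state = parse_gtid_state(value)
```

With an empty model, set_state('0-1-100,0-2-101') stores two entries; once an event has been appended to events, the same call raises an error.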

> it kind of makes sense more than using gtid_slave_pos. But probably
> this will break the detection of slaves trying to connect using GTID
> before the start of binlogs...

I do not think it will break that (but we will see).

> 5. Completely from different area but also GTID related bug. Take
> database from previous MySQL version (I've tested on the database from
> 5.1), start MariaDB on it, run mysql_upgrade and then try to set
> gtid_slave_pos to something. At this point I've got error "unable to
> load slave state from gtid_slave_pos table". This error was apparently
> remembered from MariaDB's start and reading of gtid_slave_pos table
> wasn't retried after mysql_upgrade actually created it.

Ok, I will take a look. I think there is an existing bug report on that. IIRC
there is some locking issue (the variable can be accessed from a place where
table locks cannot be taken to read gtid_slave_pos table), but I will see what
can be done.

> 1. When master doesn't have binlogs and gtid_slave_pos is ahead of the
> GTID that slave tries to connect with you give error "The binlog on
> the master is missing the GTID ... requested by the slave (even though
> both a prior and a subsequent number does exist), and GTID strict mode
> is enabled". I find this error message very confusing: presence of a
> subsequent GTID in such situation is questionable, but there is no
> prior GTID in master's binlog for sure.

Hm, this sounds like a bug. Do you have a testcase?

But with @@GLOBAL.gtid_binlog_state implemented and set correctly, you will
instead get the correct error message: that the position the slave requests
to connect at has been purged from the master's binlog.

> 2. The error message "An attempt was made to binlog GTID ... which
> would create an out-of-order sequence number with existing GTID ...,
> and gtid strict mode is enabled" is confusing too, because it's issued
> not when slave actually tries to write event to binlog. Apparently the
> error condition is checked when slave considers executing the event
> that was just received from master. And if this event contains changes
> only to tables matching replicate-wild-ignore-table filter then this
> event won't be ever binlog'ed on slave in non-strict mode. So there's
> no "attempt to binlog" involved and error wording becomes not quite
> understandable.

Right, I see. Thanks!

One problem here is that for non-transactional changes (DDL or MyISAM), we
_do_ need to check this _before_ executing the event, because we cannot roll
back afterwards.

But I agree of course that this is a bug. I will try to find a way to fix
it. Maybe the check can be delayed until the first event that we are actually
going to execute (not filter out).
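
A rough Python sketch of that idea, under simplifying assumptions: each GTID stands for one whole transaction that is either fully filtered or fully applied, and the ordering check runs just before an unfiltered transaction is applied. The function apply_events and its parameters are hypothetical names used only for illustration, not server code.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def apply_events(
    events: Iterable[Gtid],
    last_seq_no: Dict[int, int],
    is_filtered: Callable[[Gtid], bool],
) -> List[Gtid]:
    """Apply events in order, checking strict-mode ordering only for
    events that will really be executed (and hence binlogged)."""
    applied = []
    for gtid in events:
        if is_filtered(gtid):
            # Filtered events are never executed or binlogged here,
            # so no "attempt to binlog" check is made for them.
            continue
        domain_id, _server_id, seq_no = gtid
        # The check happens before execution, since non-transactional
        # changes cannot be rolled back afterwards.
        if seq_no <= last_seq_no.get(domain_id, 0):
            raise RuntimeError(f"out-of-order GTID {gtid}")
        last_seq_no[domain_id] = seq_no
        applied.append(gtid)
    return applied
```

In this model a filtered event with an old sequence number is skipped silently, while the same event without the filter raises the out-of-order error before anything is applied.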


> 3. There's error message "Specified GTID ... conflicts with the binary
> log which contains a more recent GTID .... If
> MASTER_GTID_POS=CURRENT_POS is used, the binlog position will override
> the new value of @@gtid_slave_pos". It looks like it's issued
> inconsistently. I had in binlog empty Gtid_list, then 0-1-26, 0-1-27,
> 0-1-28, 0-2-29 and 0-2-30. And both gtid_slave_pos and gtid_binlog_pos
> were set to '0-2-30'. In this situation I was able to set
> gtid_slave_pos to '0-1-29' successfully and get "slave has diverged"
> error after START SLAVE. Then I was able to set gtid_slave_pos to
> '0-2-29' and get error "Attempt was made to binlog out-of-order" after
> START SLAVE.
> I'd think that at least in strict mode MariaDB shouldn't allow to set
> gtid_slave_pos to a value that is clearly in the past.

Right, thanks, I will check. (I can understand that 0-1-29 did not give an
error, though you are probably right that it should; but that 0-2-29 did not
give an error is surprising).
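
As an illustration of the check being asked for, here is a small Python sketch using the binlog contents from the report. It assumes that in strict mode sequence numbers increase within a domain regardless of server id, so a proposed position counts as "in the past" when the binlog already holds a higher sequence number in the same domain. The function is_behind_binlog is a hypothetical name, not an existing server function.

```python
from typing import Iterable, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def is_behind_binlog(proposed: Gtid, binlog: Iterable[Gtid]) -> bool:
    """Return True if the binlog already contains a higher seq_no in the
    same domain as the proposed gtid_slave_pos entry."""
    domain_id, _server_id, seq_no = proposed
    newest = max(
        (s for d, _srv, s in binlog if d == domain_id),
        default=None,
    )
    return newest is not None and seq_no < newest


# Binlog contents from the report above.
BINLOG = [(0, 1, 26), (0, 1, 27), (0, 1, 28), (0, 2, 29), (0, 2, 30)]
```

Under that assumption, both 0-1-29 and 0-2-29 are behind the binlog (which ends at 0-2-30) and would be rejected, while 0-2-30 itself would not.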

> 4. Now real bug. Start three servers S1, S2 and S3 without binlogs.
> Set gtid_slave_pos to the same value on all of them. Connect S2 to
> replicate from S1. Execute a few transactions on S1. Perform a
> failover, make S1 to replicate from S2. Now connect S3 to replicate
> from S2. At this point S3 should be able to replicate successfully
> because it has the same db state as S2 had in the beginning (S3 has
> the same gtid_slave_pos as S2 had initially), and S2 has all binlogs
> to move from current position on S3 to the current position on S2. But
> yet S3 gets error that starting GTID doesn't exist in S2's binlogs.

This should also be fixed by setting @@GLOBAL.gtid_binlog_state.

 - Kristian.

