maria-developers team mailing list archive

Thread
Date

Re: GTID replication and relay_log.info

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Pavel Ivanov <pivanof@xxxxxxxxxx>
Date: Mon, 27 May 2013 15:22:29 -0700
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87ppwc4wjx.fsf@frigg.knielsen-hq.org>

Thanks for the pointers, Kristian. It turned out that it was an incorrect
merge of GTID-related changes into 10.0.1 tree. In particular I didn't
quite understand the meaning of LINE_FOR_FIRST_MYSQL_5_6 in sql/rpl_mi.cc
and thus left it at the value of 19. Because of that every time when
master.info was read slave didn't see the value of using_gtid variable and
didn't purge relay logs. Now I've got this fixed for us.

BTW, while debugging I've noticed that master.info file wasn't properly
truncated when it was re-written. I always saw some left-over text after
the line "using_gtid=1". After finish of rpl.rpl_gtid_crash it could be
even several lines of extra text with one more line "using_gtid=1" at the
end. So probably you will want to fix this at some point...


Pavel


On Mon, May 27, 2013 at 2:11 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx>wrote:

> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
> > See logs in attachment. It looks clear to me that relay logs get replayed
> > after crash and don't get deleted. I'm not sure though if the reason of
> > that can be seen in the logs.
>
> No, I agree it's not clear from the logs one way or the other, so good that
> you mention this.
>
> > Can you point me to the code where relay logs are deleted? I'll try to
> > check why it's not called...
>
> It is this, in slave.cc start_slave_threads():
>
>   if (mi->using_gtid != Master_info::USE_GTID_NO &&
>       !mi->slave_running && !mi->rli.slave_running)
>   {
>     purge_relay_logs(&mi->rli, NULL, 0, &errmsg);
>     mi->master_log_name[0]= 0;
>     mi->master_log_pos= 0;
>   }
>
> When both slave IO and SQL threads are stopped, we purge the relay logs
> before
> starting. And surely they must be stopped when we first start them after a
> crash ...
>
> And then in get_master_version_and_clock(), if we deleted the relay logs
> then
> we start from the GTID position:
>
>   if (mi->using_gtid != Master_info::USE_GTID_NO &&
> !mi->master_log_name[0])
>   {
>     ...
>
> I would suggest printouts to the error log around those two places in the
> code
> and check that both are executed as expected at startup. Maybe either if()
> condition becomes false for some reason, or purge_relay_logs() fails to
> purge,
> but I do not see how from the code at the moment.
>
> It would also help if I had implemented MDEV-4491
> (https://mariadb.atlassian.net/browse/MDEV-4491), to give better
> printouts in
> the error log about where and how the slave is actually connecting to the
> master. I will try to get this done soon.
>
> One thing did spring to mind as I was looking at this code, the XtraDB
> --innodb_recovery_update_relay_log option. This can overwrite relay-log
> information during crash recovery, so might be able to interfere. But it is
> off by default and does printouts to error log that are not seen in your
> logs,
> so it seems this is not related ...
>
> Thanks for helping look into this. I actually plan to change this code to
> something more robust, but it would be good to find the real problem here,
> otherwise I may end up just hiding the bug, not fixing it ...
>
>  - Kristian.
>

Follow ups

Re: GTID replication and relay_log.info
From: Kristian Nielsen, 2013-05-27

References

GTID replication and relay_log.info
From: Pavel Ivanov, 2013-05-21
Re: GTID replication and relay_log.info
From: Kristian Nielsen, 2013-05-21
Re: GTID replication and relay_log.info
From: Pavel Ivanov, 2013-05-26
Re: GTID replication and relay_log.info
From: Kristian Nielsen, 2013-05-26
Re: GTID replication and relay_log.info
From: Pavel Ivanov, 2013-05-26
Re: GTID replication and relay_log.info
From: Pavel Ivanov, 2013-05-27
Re: GTID replication and relay_log.info
From: Kristian Nielsen, 2013-05-27