maria-discuss team mailing list archive

Thread
Date
Re: Can't find log after Galera's SST

To: Andrew Garner <muzazzi@xxxxxxxxx>
From: Igor Mazur <igor.kozlov@xxxxxxxxx>
Date: Thu, 14 Apr 2016 12:01:15 +0300
Cc: MariaDB discuss <maria-discuss@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CABcaCLwjR2HiXireBKOcjRUWD1NQa5iQOhrASnLSb4kK4sQK9A@mail.gmail.com>
Great! Thank you! I've just found this issue at Jira.

Will try to understand more. What is strange - sometimes it synchronizes,
but mostly - don't.

It might be related to the last version of MariaDB 10.1.13
But, I remember, that I had some similar issue last summer (there was MariaDB
10.1.8) - but at that time, simple restart helped and I forgot about an
issue.

2016-04-14 11:28 GMT+03:00 Andrew Garner <muzazzi@xxxxxxxxx>:

> Sorry for the noise.
>
> XtraBackup does flush binary logs and does copy the last binary log
> into the backup when gtids are detected (i.e. always on mariadb).
> That's what's rotating your logs and it does only copy the last log
> (post-rotate).   I don't think that's configurable at all.
>
> I was able to reproduce this in my environment with identical results
> to yours.   MariaDB 10.1.13 / Xtrabackup 2.3.4 on Debian 8.  I just
> started a busy loop inserting a row constantly into a table and forced
> SST:
>
> [Note] Recovering after a crash using /var/log/mysql/mysql-bin
> [ERROR] Binlog file '/var/log/mysql/mysql-bin.000149' not found in
> binlog index, needed for recovery. Aborting.
> [ERROR] Crash recovery failed. Either correct the problem (if it's,
> for example, out of memory error and restart, or delete (or rename)
> binary log and start mysqld with
> --tc-heuristic-recover={commit|rollback}
> [ERROR] Can't init tc log
> ERROR] Aborting
>
> Actual binlog changes each time while i'm writing to the cluster.
>
> I did find that commenting out the bits from wsrep_sst_xtrabackup-v2
> that copies the binlog back to /var/log/mysql/ avoids this error - I
> guess the presence of a binlog here starts the recovery process.  I'm
> not sure what the ramifications are of skipping that recovery.
>
> This issue looks like it's already reported here:
> https://jira.mariadb.org/browse/MDEV-9423
>
> I will add my comments there and hope we can find a solution.
>
>
> On Wed, Apr 13, 2016 at 11:50 PM, Igor Mazur <igor.kozlov@xxxxxxxxx>
> wrote:
> > Sorry, I've forgotten to say that I use xtrabackup.
> >
> > 2016-04-14 2:04 GMT+03:00 Andrew Garner <muzazzi@xxxxxxxxx>:
> >>
> >> What SST method are you using? Based on this I expect you are using
> rsync.
> >>
> >> I imagine someone knows more detailed answers here, but from my
> >> understanding:
> >>
> >> Internally the bin/wsrep_sst_rsync script indicates it needs "flush
> >> tables" to the server and the server runs this "sst_flush_tables()"
> >> function that effectively runs:
> >>
> >> FLUSH TABLES WITH READ LOCK
> >> + reload_acl_and_cache(thd, REFRESH_ENGINE_LOG | REFRESH_BINARY_LOG,
> ...)
> >>
> >> Which I think is equivalent to FLUSH ENGINE LOGS / FLUSH BINARY LOGS.
> >> I think around this point it also sets innodb_disallow_writes to stop
> >> InnoDB's background I/O (so rsync can be consistent).
> >>
> >> At that point the bin/wsrep_sst_rsync script graps the last
> >> $BINLOG_N_FILES binary logs, which I believe is hardcoded to '1' and
> >> seems to be evaluated after this flush condition.   But internally I
> >> guess InnoDB can have a prepared transaction related to a previous
> >> binlog.
> >>
> >> I've wrestled with similar problems writing backup recovery routines
> >> for MariaDB + snapshot backups (typically via LVM snapshots), where I
> >> typically have to go 2 logs back and it seems to eliminate this kind
> >> of failure.  You might be able to bump up BINLOG_N_FILES in
> >> /usr/bin/wsrep_sst_rsync (or wherever it is located) and see if that
> >> resolves your race condition.
> >>
> >> Alternatively I think percona-xtrabackup has solved this problem[1] a
> >> while back as well, so a different wsrep-sst-method may be an option.
> >>  Although xtrabackup has its own problems - in particularly I've found
> >> that xtrabackup currently doesn't handle lost+found directories too
> >> well[2], so you either have to avoid having the datadir directly under
> >> the mountpoint or use some other filesystem (such as XFS) that doesn't
> >> create lost+found.  I suppose if you're using encryption, xtrabackup
> >> is also not currently an option. :)
> >>
> >> ~Andrew
> >>
> >> [1] https://bugs.launchpad.net/percona-xtrabackup/+bug/1254227
> >> [2] https://bugs.launchpad.net/percona-xtrabackup/+bug/1272329
> >>
> >> On Wed, Apr 13, 2016 at 4:27 PM, Igor Mazur <igor.kozlov@xxxxxxxxx>
> wrote:
> >> > Hi,
> >> >
> >> > Almost everytime when I join node to Galera cluster (MariaDB 10.1.13)
> >> > I'm
> >> > getting an error after SST:
> >> >
> >> > 2016-04-13 21:09:37 139932281419712 [Note] Recovering after a crash
> >> > using
> >> > /var/log/mysql/mariadb-bin
> >> > 2016-04-13 21:09:37 139932281419712 [ERROR] Binlog file
> >> > '/var/log/mysql/mariadb-bin.000023' not found in binlog index, needed
> >> > for
> >> > recovery. Aborting.
> >> > 2016-04-13 21:09:37 139932281419712 [ERROR] Crash recovery failed.
> >> > Either
> >> > correct the problem (if it's, for example, out of memory error) and
> >> > restart,
> >> > or delete (or rename) binary log and start mysqld with
> >> > --tc-heuristic-recover={commit|rollback}
> >> > 2016-04-13 21:09:37 139932281419712 [ERROR] Can't init tc log
> >> > 2016-04-13 21:09:37 139932281419712 [ERROR] Aborting
> >> >
> >> > As I see, almost every SST rotates a log file and downloads last (in
> >> > example
> >> > above I have mariadb-bin.000024), and not previous. But for recovering
> >> > it
> >> > needs previous.
> >> > Sometimes I'm lucky and it restores.
> >> >
> >> > Anyone knows why it rotates a log and is it possible to force it to
> load
> >> > more than last log file?
> >> >
> >> > Igor Mazur
> >> >
> >> > _______________________________________________
> >> > Mailing list: https://launchpad.net/~maria-discuss
> >> > Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> >> > Unsubscribe : https://launchpad.net/~maria-discuss
> >> > More help   : https://help.launchpad.net/ListHelp
> >> >
> >
> >
>
References

Can't find log after Galera's SST
From: Igor Mazur, 2016-04-13
Re: Can't find log after Galera's SST
From: Andrew Garner, 2016-04-13
Re: Can't find log after Galera's SST
From: Igor Mazur, 2016-04-14
Re: Can't find log after Galera's SST
From: Andrew Garner, 2016-04-14