maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09770
Re: MDEV-9423: FTWRL and Binlog checkpoint
-
To:
Nirbhay Choubey <nirbhay@xxxxxxxxxxx>
-
From:
Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
-
Date:
Sat, 25 Jun 2016 09:57:36 +0200
-
Cc:
MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
-
In-reply-to:
<CACAc7VknHdd6UGK40ZTqTp0rnEPzakNObp=a=uNo9M_Ks+_OZQ@mail.gmail.com> (Nirbhay Choubey's message of "Fri, 24 Jun 2016 11:29:50 -0400")
-
User-agent:
Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)
Nirbhay Choubey <nirbhay@xxxxxxxxxxx> writes:
>> Also, it seems reasonable that FTWRL in general could wait for checkpoint
>> events so that other backup mechanisms similarly could avoid binlog files
> That sound good to me. But, considering Percona's backup locks, it seems
> more logical to
> implement this in Backup locks instead, whenever they get
> ported/implemented in MariaDB.
Right. As I was thinking about the problem, it occured to me that this
wasn't really a Galera-specific thing, my suggestion seemed a valid general
wait-for-checkpoint mechanism.
So we should put the code that waits for checkpoint in its own function
(as you already did, MYSQL_BIN_LOG::wait_for_last_checkpoint_event()). But I
agree, we can wait with actually exposing it (in FTWRL, backup locks,
whatever) until when/if that becomes relevant/priority.
I would just note that this wait does not really do anything unless there is
something else (like FTWRL in your case) that prevents new commits,
otherwise a new checkpoint could become pending at any time after
wait_for_last_checkpoint_event() returns.
> Also, in this particular case, the problem lies
> in reload_acl_and_cache(REFRESH_BINARY_LOG),
> (executed after FTWRL while preparing for SST) that rotates the binary log.
Hm, I see. So you're always copying an empty binlog file? I'm wondering why
you don't simply don't copy any binlogs and just start the new server with
--tc-heuristic-recover=ROLLBACK ... maybe copying binlogs was just
considered easier? Anyway, I don't have the bigger picture, so can't have
much of an informed opinion here.
> Yes, it worked. But, to solve this issue in 10.1, I have added this wait to
> REFRESH_BINARY_LOG
> (as explained above) only when the server is acting as a Galera node.
That seems quite ugly, why not call it from the SST code, after it has
called reload_acl_and_cache()? You're basically making FLUSH LOGS behave
differently in Galera and non-Galera (if my understanding is correct), which
might lead to subtle bugs? But again, I don't have the bigger picture, and
the whole wsrep patch is garbage all over the server anyway, so I suppose it
doesn't matter much to me, as long as it's #ifdef WSREP.
>> and it also makes the extra lock/unlock of LOCK_log above redundant.
>
>
> Not quite. The wait logic (that includes LOCK_log, as the snippet above) is
> to pause
> REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/
> FLUSH commands while file transfer is in progress.
Sure, it's fine to have both, probably makes the code clearer anyway.
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -3690,7 +3690,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
> new_xid_list_entry->binlog_id= current_binlog_id;
> /* Remove any initial entries with no pending XIDs. */
> while ((b= binlog_xid_count_list.head()) && b->xid_count == 0)
> + {
> my_free(binlog_xid_count_list.get());
> + mysql_cond_broadcast(&COND_xid_list);
> + }
> binlog_xid_count_list.push_back(new_xid_list_entry);
> mysql_mutex_unlock(&LOCK_xid_list);
There is no need to mysql_cond_broadcast() multiple times. Use just a single
broadcast outside the loop (before or after, doesn't make a difference).
- Kristian.
Follow ups
References