maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #06646
Need Review: MDEV-5509: Seconds_behind_master incorrect in parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0
-
To:
maria-developers@xxxxxxxxxxxxxxxxxxx
-
From:
Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
-
Date:
Wed, 08 Jan 2014 11:01:59 +0100
-
In-reply-to:
<E1W0pwK-0006MY-VE@frigg.knielsen-hq.org> (knielsen@knielsen-hq.org's message of "Wed, 08 Jan 2014 11:00:44 +0100")
-
User-agent:
Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
Could someone review this patch for MDEV-5509?
I am in particular worried that this might change some subtle behaviour of
Seconds_Behind_Master in unexpected ways. Of course I tried my best to make
sure such could not occur, but the semantics is really tricky and poorly
documented, and I do not think we have good coverage in the test suite.
So I would like to have one other developer check it carefully as well and see
if I've missed something.
- Kristian.
knielsen@xxxxxxxxxxxxxxx writes:
> At http://bazaar.launchpad.net/~maria-captains/maria/10.0
>
> ------------------------------------------------------------
> revno: 3959
> revision-id: knielsen@xxxxxxxxxxxxxxx-20140108100044-iszrbhddvxkfne71
> parent: knielsen@xxxxxxxxxxxxxxx-20140107105703-sn31yvf80batk1yx
> committer: knielsen@xxxxxxxxxxxxxxx
> branch nick: mariadb-10.0-base
> timestamp: Wed 2014-01-08 11:00:44 +0100
> message:
> MDEV-5509: Seconds_behind_master incorrect in parallel replication
>
> The problem was a race between the SQL driver thread and the worker threads.
> The SQL driver thread would set rli->last_master_timestamp to zero to
> mark that it has caught up with the master, while the worker threads would
> set it to the timestamp of the executed event. This can happen out-of-order
> in parallel replication, causing the "caught up" status to be overwritten
> and Seconds_Behind_Master to wrongly grow when the slave is idle.
>
> To fix, introduce a separate flag rli->sql_thread_caught_up to mark that the
> SQL driver thread is caught up. This avoids issues with worker threads
> overwriting the SQL driver thread status. In parallel replication, we then
> make SHOW SLAVE STATUS check in addition that all worker threads are idle
> before showing Seconds_Behind_Master as 0 due to slave idle.
> === added file 'mysql-test/suite/rpl/r/rpl_parallel2.result'
> --- a/mysql-test/suite/rpl/r/rpl_parallel2.result 1970-01-01 00:00:00 +0000
> +++ b/mysql-test/suite/rpl/r/rpl_parallel2.result 2014-01-08 10:00:44 +0000
> @@ -0,0 +1,18 @@
> +include/rpl_init.inc [topology=1->2]
> +*** MDEV-5509: Incorrect value for Seconds_Behind_Master if parallel replication ***
> +SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
> +include/stop_slave.inc
> +SET GLOBAL slave_parallel_threads=5;
> +include/start_slave.inc
> +CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
> +CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave");
> +INSERT INTO t1 VALUES (1,sleep(2));
> +Warnings:
> +Note 1592 Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave.
> +Seconds_Behind_Master should be zero here because the slave is fully caught up and idle.
> +Seconds_Behind_Master = '0'
> +include/stop_slave.inc
> +SET GLOBAL slave_parallel_threads=@old_parallel_threads;
> +include/start_slave.inc
> +DROP TABLE t1;
> +include/rpl_end.inc
>
> === added file 'mysql-test/suite/rpl/t/rpl_parallel2.test'
> --- a/mysql-test/suite/rpl/t/rpl_parallel2.test 1970-01-01 00:00:00 +0000
> +++ b/mysql-test/suite/rpl/t/rpl_parallel2.test 2014-01-08 10:00:44 +0000
> @@ -0,0 +1,41 @@
> +--source include/have_binlog_format_statement.inc
> +--let $rpl_topology=1->2
> +--source include/rpl_init.inc
> +
> +--echo *** MDEV-5509: Incorrect value for Seconds_Behind_Master if parallel replication ***
> +
> +--connection server_2
> +SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
> +--source include/stop_slave.inc
> +SET GLOBAL slave_parallel_threads=5;
> +--source include/start_slave.inc
> +
> +--connection server_1
> +CREATE TABLE t1 (a INT PRIMARY KEY, b INT);
> +CALL mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave");
> +--save_master_pos
> +
> +--connection server_2
> +--sync_with_master
> +
> +--connection server_1
> +INSERT INTO t1 VALUES (1,sleep(2));
> +--save_master_pos
> +
> +--connection server_2
> +--sync_with_master
> +
> +--echo Seconds_Behind_Master should be zero here because the slave is fully caught up and idle.
> +--let $status_items= Seconds_Behind_Master
> +--source include/show_slave_status.inc
> +
> +
> +--connection server_2
> +--source include/stop_slave.inc
> +SET GLOBAL slave_parallel_threads=@old_parallel_threads;
> +--source include/start_slave.inc
> +
> +--connection server_1
> +DROP TABLE t1;
> +
> +--source include/rpl_end.inc
>
> === modified file 'sql/rpl_parallel.cc'
> --- a/sql/rpl_parallel.cc 2014-01-03 11:20:53 +0000
> +++ b/sql/rpl_parallel.cc 2014-01-08 10:00:44 +0000
> @@ -766,6 +766,27 @@ rpl_parallel::wait_for_done()
> }
>
>
> +bool
> +rpl_parallel::workers_idle()
> +{
> + struct rpl_parallel_entry *e;
> + uint32 i, max_i;
> +
> + max_i= domain_hash.records;
> + for (i= 0; i < max_i; ++i)
> + {
> + bool active;
> + e= (struct rpl_parallel_entry *)my_hash_element(&domain_hash, i);
> + mysql_mutex_lock(&e->LOCK_parallel_entry);
> + active= e->current_sub_id > e->last_committed_sub_id;
> + mysql_mutex_unlock(&e->LOCK_parallel_entry);
> + if (active)
> + break;
> + }
> + return (i == max_i);
> +}
> +
> +
> /*
> do_event() is executed by the sql_driver_thd thread.
> It's main purpose is to find a thread that can execute the query.
>
> === modified file 'sql/rpl_parallel.h'
> --- a/sql/rpl_parallel.h 2013-11-05 11:01:26 +0000
> +++ b/sql/rpl_parallel.h 2014-01-08 10:00:44 +0000
> @@ -117,6 +117,7 @@ struct rpl_parallel {
> void reset();
> rpl_parallel_entry *find(uint32 domain_id);
> void wait_for_done();
> + bool workers_idle();
> bool do_event(rpl_group_info *serial_rgi, Log_event *ev,
> ulonglong event_size);
> };
>
> === modified file 'sql/rpl_rli.cc'
> --- a/sql/rpl_rli.cc 2013-12-17 09:50:34 +0000
> +++ b/sql/rpl_rli.cc 2014-01-08 10:00:44 +0000
> @@ -56,7 +56,7 @@ Relay_log_info::Relay_log_info(bool is_s
> is_fake(FALSE),
> #endif
> group_master_log_pos(0), log_space_total(0), ignore_log_space_limit(0),
> - last_master_timestamp(0), slave_skip_counter(0),
> + last_master_timestamp(0), sql_thread_caught_up(true), slave_skip_counter(0),
> abort_pos_wait(0), slave_run_id(0), sql_driver_thd(),
> inited(0), abort_slave(0), slave_running(0), until_condition(UNTIL_NONE),
> until_log_pos(0), retried_trans(0), executed_entries(0),
> @@ -1287,9 +1287,14 @@ void Relay_log_info::stmt_done(my_off_t
> (probably ok - except in some very rare cases, only consequence
> is that value may take some time to display in
> Seconds_Behind_Master - not critical).
> +
> + In parallel replication, we take care to not set last_master_timestamp
> + backwards, in case of out-of-order calls here.
> */
> if (!(event_creation_time == 0 &&
> - IF_DBUG(debug_not_change_ts_if_art_event > 0, 1)))
> + IF_DBUG(debug_not_change_ts_if_art_event > 0, 1)) &&
> + !(rgi->is_parallel_exec && event_creation_time <= last_master_timestamp)
> + )
> last_master_timestamp= event_creation_time;
> }
> DBUG_VOID_RETURN;
>
> === modified file 'sql/rpl_rli.h'
> --- a/sql/rpl_rli.h 2013-11-08 14:14:18 +0000
> +++ b/sql/rpl_rli.h 2014-01-08 10:00:44 +0000
> @@ -221,6 +221,12 @@ class Relay_log_info : public Slave_repo
> bool sql_force_rotate_relay;
>
> time_t last_master_timestamp;
> + /*
> + The SQL driver thread sets this true while it is waiting at the end of the
> + relay log for more events to arrive. SHOW SLAVE STATUS uses this to report
> + Seconds_Behind_Master as zero while the SQL thread is so waiting.
> + */
> + bool sql_thread_caught_up;
>
> void clear_until_condition();
>
>
> === modified file 'sql/slave.cc'
> --- a/sql/slave.cc 2013-12-16 12:48:32 +0000
> +++ b/sql/slave.cc 2014-01-08 10:00:44 +0000
> @@ -2615,8 +2615,24 @@ static bool send_show_master_info_data(T
> if ((mi->slave_running == MYSQL_SLAVE_RUN_CONNECT) &&
> mi->rli.slave_running)
> {
> - long time_diff= ((long)(time(0) - mi->rli.last_master_timestamp)
> - - mi->clock_diff_with_master);
> + long time_diff;
> + bool idle;
> + time_t stamp= mi->rli.last_master_timestamp;
> +
> + if (!stamp)
> + idle= true;
> + else
> + {
> + idle= mi->rli.sql_thread_caught_up;
> + if (opt_slave_parallel_threads > 0 && idle &&
> + !mi->rli.parallel.workers_idle())
> + idle= false;
> + }
> + if (idle)
> + time_diff= 0;
> + else
> + {
> + time_diff= ((long)(time(0) - stamp) - mi->clock_diff_with_master);
> /*
> Apparently on some systems time_diff can be <0. Here are possible
> reasons related to MySQL:
> @@ -2632,13 +2648,15 @@ static bool send_show_master_info_data(T
> slave is 2. At SHOW SLAVE STATUS time, assume that the difference
> between timestamp of slave and rli->last_master_timestamp is 0
> (i.e. they are in the same second), then we get 0-(2-1)=-1 as a result.
> - This confuses users, so we don't go below 0: hence the max().
> + This confuses users, so we don't go below 0.
>
> last_master_timestamp == 0 (an "impossible" timestamp 1970) is a
> special marker to say "consider we have caught up".
> */
> - protocol->store((longlong)(mi->rli.last_master_timestamp ?
> - max(0, time_diff) : 0));
> + if (time_diff < 0)
> + time_diff= 0;
> + }
> + protocol->store((longlong)time_diff);
> }
> else
> {
> @@ -6096,6 +6114,7 @@ static Log_event* next_event(rpl_group_i
>
> if (hot_log)
> mysql_mutex_unlock(log_lock);
> + rli->sql_thread_caught_up= false;
> DBUG_RETURN(ev);
> }
> if (opt_reckless_slave) // For mysql-test
> @@ -6133,12 +6152,10 @@ static Log_event* next_event(rpl_group_i
> Seconds_Behind_Master would be zero only when master has no
> more updates in binlog for slave. The heartbeat can be sent
> in a (small) fraction of slave_net_timeout. Until it's done
> - rli->last_master_timestamp is temporarely (for time of
> - waiting for the following event) reset whenever EOF is
> - reached.
> + rli->sql_thread_caught_up is temporarely (for time of waiting for
> + the following event) set whenever EOF is reached.
> */
> - time_t save_timestamp= rli->last_master_timestamp;
> - rli->last_master_timestamp= 0;
> + rli->sql_thread_caught_up= true;
>
> DBUG_ASSERT(rli->relay_log.get_open_count() ==
> rli->cur_log_old_open_count);
> @@ -6264,7 +6281,7 @@ static Log_event* next_event(rpl_group_i
> rli->relay_log.wait_for_update_relay_log(rli->sql_driver_thd);
> // re-acquire data lock since we released it earlier
> mysql_mutex_lock(&rli->data_lock);
> - rli->last_master_timestamp= save_timestamp;
> + rli->sql_thread_caught_up= false;
> continue;
> }
> /*
>
> _______________________________________________
> commits mailing list
> commits@xxxxxxxxxxx
> https://lists.askmonty.org/cgi-bin/mailman/listinfo/commits