maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09547
Problem that some variables can be changed without stopping the slave IO thread
Monty,
This patch of yours introduced a change where before both slave threads must
be stopped to change @@replicate_events_marked_for_skip, but after only the
SQL thread is checked:
-----------------------------------------------------------------------
commit 572560f38c248d5020f0e63aeb3f8905cd568208
Author: Michael Widenius <monty@xxxxxxxxxxxx>
Date: Wed Oct 3 01:44:54 2012 +0300
Changed SHOW_FUNC variabels that don't return SHOW_ARRAY to SHOW_SIMPLE_FUNC.
+ Master_info_index::give_error_if_slave_running()
+
+ @return
+ TRUE If some slave is running. An error is printed
+ FALSE No slave is running
+*/
+
+bool Master_info_index::give_error_if_slave_running()
+{
+ DBUG_ENTER("warn_if_slave_running");
+ mysql_mutex_assert_owner(&LOCK_active_mi);
+
+ for (uint i= 0; i< master_info_hash.records; ++i)
+ {
+ Master_info *mi;
+ mi= (Master_info *) my_hash_element(&master_info_hash, i);
+ if (mi->rli.slave_running != MYSQL_SLAVE_NOT_RUN)
+ {
+ my_error(ER_SLAVE_MUST_STOP, MYF(0), (int) mi->connection_name.length,
+ mi->connection_name.str);
+ DBUG_RETURN(TRUE);
+ }
+ }
+ DBUG_RETURN(FALSE);
+}
bool
Sys_var_replicate_events_marked_for_skip::global_update(THD *thd, set_var *var)
...
- /* Slave threads must be stopped to change the variable. */
+ mysql_mutex_unlock(&LOCK_global_system_variables);
mysql_mutex_lock(&LOCK_active_mi);
- lock_slave_threads(active_mi);
- init_thread_mask(&thread_mask, active_mi, 0 /*not inverse*/);
- if (thread_mask) // We refuse if any slave thread is running
- {
- my_error(ER_SLAVE_MUST_STOP, MYF(0));
- result= true;
- }
- else
+ if (!master_info_index->give_error_if_slave_running())
-----------------------------------------------------------------------
So instead of checking both IO and SQL thread, now it checks only SQL thread
(mi->rli.slave_running). Why was this change made? Was it by mistake?
Unfortunately, I did not notice this before, so I subsequently used the same
code (give_error_if_slave_running()) for the following variables:
gtid_slave_pos
slave_parallel_threads
slave_domain_parallel_threads
replicate_events_marked_for_skip
gtid_ignore_duplicates
This means that all of the above variables can be changed while the IO
thread is still running. This was never the intention! There are some
complex interactions between the master, the IO thread, and the SQL thread,
particularly with respect to the slave's GTID position, and parts of the
code are specifically written on the assumption that certain things cannot
change without first stopping all replication threads. (I do not have the
full overview of exactly what can break, I think that would require a
careful review of large parts of the code, but I fear that it can corrupt
not just the replication GTID position and possibly even internal server
structures).
So I am wondering what to do ... it is easy to fix
give_error_if_slave_running() to check for both threads, but this is a
change in behaviour (even if it should match documentation) that could
easily break administration scripts, eg. after a 10.0 automatic security-fix
upgrade. I also know that eg. Jean François has written about how he uses
the feature of being able to not stop the IO thread in some cases.
On the other hand, this can cause serious problems for users, as they have
no real way of knowing not to attempt to change these variables without
stopping IO thread first, when there is no error given. MDEV-9138 describes
a complex scenario where slave position gets corrupted from this.
Any suggestions?
Thanks,
- Kristian.