maria-developers team mailing list archive
Message #05919
Some bugs in the dingqi parallel replication branch
Hi guys,
I have been working with the branch https://code.launchpad.net/~knielsen/maria/dingqi-parallel-replication for some days and found the bugs listed below. I hope this is helpful to you.
1, When a table filter is enabled on the slave, this bug can crash the server.
How to reproduce:
On the slave, set replicate-wild-ignore-table = test.t5 in the config file.
On the master, run these statements:
CREATE TABLE test.t3 (a INT AUTO_INCREMENT PRIMARY KEY, b DECIMAL(20,20), c INT);
SET INSERT_ID=1;
SET @c=2;
SET @@rand_seed1=10000000, @@rand_seed2=1000000;
INSERT INTO t3 VALUES (NULL, RAND(), @c);
Code that leads to this bug:
In execute_single_transaction():
case RAND_EVENT:
need_remove_from_trans= true;
if(!rli->is_deferred_event(ev))
delete ev;
break;
Reason:
The Rand event object is deleted in execute_single_transaction(),
but its pointer is used later in slave_execute_deferred_events().
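The ownership problem can be sketched in isolation. This is only a toy model, not the server code: EventModel, DeferredQueueModel and finish_event() are hypothetical names, and the sketch assumes that handing ownership to the deferred queue is an acceptable way to keep the event alive until it is executed later.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a log event; not the real Log_event class.
struct EventModel {
  int type;       // e.g. a RAND-like event
  bool deferred;  // true if the event was queued for later execution
};

// Hypothetical stand-in for the list of deferred events kept by the slave.
struct DeferredQueueModel {
  std::vector<std::unique_ptr<EventModel>> events;
  void add(std::unique_ptr<EventModel> ev) { events.push_back(std::move(ev)); }
};

// Either the queue takes ownership (deferred event) or the event is freed
// when this function returns. Returns true if the event was kept alive.
bool finish_event(std::unique_ptr<EventModel> ev, DeferredQueueModel &queue) {
  assert(ev != nullptr);
  if (ev->deferred) {
    queue.add(std::move(ev));  // the queue owns it now; nobody else can delete it
    return true;
  }
  return false;  // ev is destroyed here, and no other pointer to it remains
}
```

With a single owner, an event cannot be freed in one place and used through a stale pointer in another.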
2, The SQL thread can read and apply some log events more than once.
How to reproduce:
It is a little hard to reproduce. If you set max_relay_log_size=100M and keep the SQL thread close behind the IO thread, the bug may appear.
Code that leads to this bug:
In reopen_relay_log():
rli->event_relay_log_pos= max(rli->event_relay_log_pos, BIN_LOG_HEADER_SIZE);
my_b_seek(cur_log,rli->event_relay_log_pos);
Reason:
When the SQL thread is reading a hot log that the IO thread has just closed, the SQL thread has to reopen the log and set the read offset to rli->event_relay_log_pos. However, rli->event_relay_log_pos can be given a new value by another thread, because many threads apply log events, so rli->event_relay_log_pos can be less than rli->future_event_relay_log_pos.
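To make the effect concrete, here is a small model of the two positions. It is a sketch under assumptions: ReaderModel and events_read_twice() are hypothetical, positions are counted in whole events rather than bytes, and the model assumes the applied position never runs ahead of the dispatched one.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of the reader state. The two field names are borrowed
// from the report; this is not the real Relay_log_info.
struct ReaderModel {
  std::size_t event_relay_log_pos;         // advanced by workers after applying
  std::size_t future_event_relay_log_pos;  // advanced by the reader after dispatching
};

// Number of events that get dispatched a second time if the reader
// reopens the log and seeks to seek_pos.
std::size_t events_read_twice(const ReaderModel &r, std::size_t seek_pos) {
  assert(r.event_relay_log_pos <= r.future_event_relay_log_pos);
  if (seek_pos >= r.future_event_relay_log_pos) return 0;
  return r.future_event_relay_log_pos - seek_pos;
}
```

For example, with event_relay_log_pos at 3 and future_event_relay_log_pos at 7, seeking to event_relay_log_pos re-reads 4 events that were already handed to worker threads.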
3, When the slave inserts a duplicate record into a table with a primary key, the SQL thread does not report the error in the output of "show slave status" and replication does not stop.
How to reproduce:
Just change master_log_pos so that the slave reads duplicate records from the master.
Code that leads to this bug:
In execute_single_transaction():
retry_transaction:
ev= trans->event_list_head;
... ...
if (ret && rli->trans_retries < slave_trans_retries)
{ ...
goto retry_transaction;
}
Reason:
As I said in another email: Rows_log_event::do_apply_event() is executed twice, but it returns a different result the second time because m_curr_row==m_rows_end.
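The way a retry can swallow the error is easier to see in a toy model. Everything here is hypothetical (RowsEventModel, apply_with_retry(), the reset_cursor switch), and it assumes the row cursor moves past a row even when that row fails and is not rewound before the retry. Rollback of the failed attempt is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model of a rows event; NOT the real Rows_log_event.
struct RowsEventModel {
  std::vector<int> rows;  // primary key values to insert
  std::size_t curr_row;   // plays the role of m_curr_row

  // Applies the remaining rows; returns 1 on a duplicate key, 0 on success.
  int apply(std::set<int> &table) {
    while (curr_row < rows.size()) {
      int key = rows[curr_row];
      ++curr_row;  // assumption: the cursor advances even if this row fails
      if (!table.insert(key).second) return 1;  // duplicate key error
    }
    return 0;  // nothing left to apply counts as success
  }
};

// Retry loop in the spirit of "goto retry_transaction".
int apply_with_retry(RowsEventModel &ev, std::set<int> &table,
                     int max_retries, bool reset_cursor) {
  assert(max_retries >= 0);
  int ret = ev.apply(table);
  for (int retries = 0; ret != 0 && retries < max_retries; ++retries) {
    if (reset_cursor) ev.curr_row = 0;  // rewind before applying again
    ret = ev.apply(table);
  }
  return ret;
}
```

In this model, a single duplicate row fails on the first attempt, but without the rewind the retry finds no rows left and reports success, so the error is never surfaced.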
4, Operations such as "show slave status" and "stop slave" can be blocked for a long time.
How to reproduce:
Just run "show slave status" again and again.
Code that leads to this bug:
In queue_event():
case FORMAT_DESCRIPTION_EVENT:
...
wait_for_all_dml_done(&mi->rli, true);
and in process_io_rotate()
wait_for_all_dml_done(&mi->rli, true);
Reason:
The IO thread can wait in wait_for_all_dml_done() while holding rpl_mi->data_lock, so operations like "show slave status" can be blocked waiting for rpl_mi->data_lock.
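One common pattern for this kind of problem is to drop the outer lock for the duration of the wait. The sketch below only illustrates that pattern with standard C++ primitives. SlaveStateModel and wait_for_dml_without_data_lock() are hypothetical, and whether data_lock can safely be released at those points in the real code is an assumption that would need checking.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the shared state; not the real Master_info.
struct SlaveStateModel {
  std::mutex data_lock;  // stands in for rpl_mi->data_lock
  std::mutex dml_lock;   // protects pending_dml
  std::condition_variable dml_done;
  int pending_dml = 0;   // DML events still being applied by workers
};

// Waits until all pending DML is done. The caller passes in its lock on
// data_lock; it is released while waiting and re-acquired before returning,
// so readers such as a status query are not blocked during the wait.
void wait_for_dml_without_data_lock(SlaveStateModel &s,
                                    std::unique_lock<std::mutex> &held) {
  assert(held.owns_lock());
  held.unlock();
  {
    std::unique_lock<std::mutex> lk(s.dml_lock);
    s.dml_done.wait(lk, [&s] { return s.pending_dml == 0; });
  }
  held.lock();
}
```

Any state protected by data_lock may have changed while it was released, so the caller has to re-check it after the wait.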
5, "START SLAVE UNTIL" makes replication stop at the wrong place.
How to reproduce:
Suppose the log events in the relay log look like this:
BEGIN; ------->pos1
LOG_EVENT1;
LOG_EVENT2;
COMMIT; ------>pos2
BEGIN; ------>pos3
LOG_EVENT3; --->stop_pos
LOG_EVENT4;
COMMIT; ------>pos4
If we run START SLAVE UNTIL relay_log_pos=stop_pos, replication should stop at pos4, but it stops at pos2.
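The expected stop position can be written down as a small function. This is a sketch with made-up byte positions: RelayEventModel and until_stop_pos() are hypothetical, and the model assumes every event belongs to a BEGIN ... COMMIT group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one relay log event.
struct RelayEventModel {
  unsigned long pos;  // position of the event in the relay log
  bool is_commit;     // true for the COMMIT that ends a transaction
};

// Returns the position of the first COMMIT at or after until_pos, i.e. the
// end of the transaction that contains until_pos. If there is no such
// COMMIT in the log, until_pos is returned unchanged.
unsigned long until_stop_pos(const std::vector<RelayEventModel> &log,
                             unsigned long until_pos) {
  unsigned long prev = 0;
  for (const RelayEventModel &ev : log) {
    assert(ev.pos >= prev);  // events must be in log order
    prev = ev.pos;
    if (ev.pos >= until_pos && ev.is_commit) return ev.pos;
  }
  return until_pos;
}
```

If the eight events above sit at positions 100, 110, ..., 170, then pos2 is 130, stop_pos is 150 and pos4 is 170. Calling until_stop_pos(log, 150) gives 170 (pos4), while the branch currently stops at 130 (pos2).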
6, log_event->thd is wrong.
Suppose a log_event was read in thread_1, so log_event->thd==thread_1, but the log_event may be dispatched to another thread (say thread_2). The log_event is then applied in thread_2 while log_event->thd==thread_1. This problem can make applying the log event fail in MySQL, but in MariaDB it seems OK.
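A minimal sketch of the idea of re-binding the event to the thread that actually applies it. ThdModel, EventModel and apply_on_worker() are hypothetical and much simpler than the real THD and Log_event classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for THD and Log_event.
struct ThdModel {
  int id;
};
struct EventModel {
  ThdModel *thd;  // thread context the event believes it runs in
};

// Applies an event on a worker thread. The event is first re-bound to the
// worker, so anything that consults ev.thd sees the applying thread and
// not the thread that read the event. Returns the id of the bound thread.
int apply_on_worker(EventModel &ev, ThdModel &worker) {
  ev.thd = &worker;
  assert(ev.thd == &worker);
  return ev.thd->id;
}
```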
2013-07-19
nanyi607rao