maria-discuss team mailing list archive
Message #03817
Re: Known limitation with TokuDB in Read Free Replication & parallel replication ?
Hello All,
I have been running sysbench OLTP on a MariaDB 10.1 master-slave
topology. I have not seen any replication errors when slave_parallel_mode
is conservative.
However, when I set slave_parallel_mode to optimistic and
slave_parallel_threads = 2, I get a lock timeout replication error with TokuDB.
Just before the lock timeout error fires (which requires a TokuDB lock
timeout to occur), I see one of the replication threads waiting for a
lock held by the other replication thread. gdb shows the first thread
waiting on a lock inside TokuDB. The other thread is stalled while
committing its transaction in wait_for_prior_commit_2 <-
wait_for_prior_commit <- THD::wait_for_prior_commit <-
TC_LOG_MMAP::log_and_order <- ha_commit_trans.
Is TokuDB supposed to call the thd_report_wait_for() API just before a
thread is about to wait on a TokuDB lock?
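
To make the question concrete, here is a minimal toy model of the stall
described above, written in Python rather than server code. It is only a
sketch under stated assumptions: Txn, ServerLayer, ToyEngine and simulate
are hypothetical names, the lock table is not TokuDB's lock tree, and
report_wait_for only mimics the idea behind thd_report_wait_for() (tell the
upper layer who is waiting for whom so it can pick a victim). It is not the
actual MariaDB implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    commit_seq: int          # position in the master's commit order
    killed: bool = False     # True once chosen as deadlock victim


@dataclass
class ServerLayer:
    """Hypothetical stand-in for the layer that enforces commit order."""
    victims: list = field(default_factory=list)

    def report_wait_for(self, waiter, holder):
        # An earlier transaction waiting on a later one cannot be resolved
        # by waiting, because the later one must commit second. Pick the
        # later transaction as victim so it rolls back and retries.
        if waiter.commit_seq < holder.commit_seq and not holder.killed:
            holder.killed = True
            self.victims.append(holder.name)


class ToyEngine:
    """Toy row-lock table (an assumption, not TokuDB's real lock tree)."""

    def __init__(self, server, reports_waits):
        self.server = server
        self.reports_waits = reports_waits
        self.owners = {}     # row key -> Txn currently holding the lock

    def lock_row(self, txn, key):
        holder = self.owners.get(key)
        if holder is None or holder is txn:
            self.owners[key] = txn
            return "granted"
        if self.reports_waits:
            self.server.report_wait_for(txn, holder)
        if holder.killed:
            # The victim rolls back, releasing every lock it held.
            self.owners = {k: t for k, t in self.owners.items()
                           if t is not holder}
            self.owners[key] = txn
            return "granted"
        if holder.commit_seq < txn.commit_seq:
            return "wait"    # holder commits first, then releases the lock
        return "lock wait timeout"   # holder is itself stuck behind txn


def simulate(reports_waits):
    server = ServerLayer()
    engine = ToyEngine(server, reports_waits)
    t1, t2 = Txn("T1", 1), Txn("T2", 2)
    engine.lock_row(t2, "a=2")             # T2 runs ahead and locks the row
    outcome = engine.lock_row(t1, "a=2")   # T1 then needs the same row
    return outcome, server.victims
```

In this model, simulate(False) ends in "lock wait timeout" with no victim,
which matches what I see in gdb: T1 waits on the row lock while T2 waits for
T1 to commit, and neither layer sees the whole cycle. With simulate(True) the
wait is reported, T2 is chosen as the victim, and T1 gets the lock.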
On Sun, Aug 7, 2016 at 7:50 PM, jocelyn fournier <jocelyn.fournier@xxxxxxxxx> wrote:
> Hi Kristian,
>
>
> Just FYI I confirm the "Lock wait timeout exceeded; try restarting
> transaction" behaviour you described.
>
> I've duplicated & modified the rpl_parallel_optimistic.test and run it
> into storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test :
>
> ./mtr --suite=tokudb_rpl <1:33:48
> Logging: ./mtr --suite=tokudb_rpl
> vardir: /home/joce/mariadb-10.1.16/mysql-test/var
> Checking leftover processes...
> Removing old var directory...
> Creating var directory '/home/joce/mariadb-10.1.16/mysql-test/var'...
> Checking supported features...
> MariaDB Version 10.1.16-MariaDB-debug
> - SSL connections supported
> - binaries are debug compiled
> Using suites: tokudb_rpl
> Collecting tests...
> Installing system database...
> ==============================================================================
>
> TEST RESULT TIME (ms) or COMMENT
> --------------------------------------------------------------------------
>
> worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
> worker[1] mysql-test-run: WARNING: running this script as _root_ will
> cause some tests to be skipped
> tokudb_rpl.rpl_parallel_optimistic 'innodb_plugin,mix' [ fail ]
> Test ended at 2016-08-08 01:26:34
>
> CURRENT_TEST: tokudb_rpl.rpl_parallel_optimistic
> mysqltest: In included file "./include/sync_with_master_gtid.inc":
> included from /home/joce/mariadb-10.1.16/storage/tokudb/mysql-test/tokudb_rpl/t/rpl_parallel_optimistic.test at line 59:
> At line 50: Failed to sync with master
>
> The result from queries just before the failure was:
> < snip >
> DELETE FROM t1 WHERE a=2;
> INSERT INTO t1 VALUES (2,5);
> DELETE FROM t1 WHERE a=3;
> INSERT INTO t1 VALUES(3,2);
> DELETE FROM t1 WHERE a=1;
> INSERT INTO t1 VALUES(1,2);
> DELETE FROM t1 WHERE a=3;
> INSERT INTO t1 VALUES(3,3);
> DELETE FROM t1 WHERE a=2;
> INSERT INTO t1 VALUES (2,6);
> include/save_master_gtid.inc
> SELECT * FROM t1 ORDER BY a;
> a b
> 1 2
> 2 6
> 3 3
> include/start_slave.inc
> include/sync_with_master_gtid.inc
> Timeout in master_gtid_wait('0-1-20', 120), current slave GTID position
> is: 0-1-3.
> Slave state : Waiting for master to send event 127.0.0.1 root 16000
> 1 master-bin.000001 3468 slave-relay-bin.000002 796
> master-bin.000001 Yes No 1205 Lock wait
> timeout exceeded; try restarting transaction 0 772 3790 None
> 0 No No 0 1205 Lock wait
> timeout exceeded; try restarting transaction 1 Slave_Pos 0-1-20
> optimistic
>
>
> I've no explanation so far for the DUPLICATE KEY error I've seen.
>
>
> Jocelyn
>
>
> On 15/07/2016 at 17:09, Kristian Nielsen wrote:
>
>> jocelyn fournier <jocelyn.fournier@xxxxxxxxx> writes:
>>
>>> Thanks for the quick answer! I wonder if it would be possible to
>>> automatically disable optimistic parallel replication for an
>>> engine if it does not implement it?
>>>
>> That would probably be good - though it would be better to just implement
>> the necessary API, it's a very small change (basically TokuDB just needs to
>> inform the upper layer of any lock waits that take place inside).
>>
>> However, looking more at your description, you got a "key not found"
>> error. Not implementing the thd_report_wait_for() could lead to deadlocks,
>> but it shouldn't cause key not found. In fact, in optimistic mode, all
>> errors are treated as "deadlock" errors, the query is rolled back, and
>> run again, this time not in parallel.
>>
>> So I'm wondering if there is something else going on. If transactions T1 and
>> T2 run in parallel, it's possible that they have a row conflict. But if T2
>> deleted a row expected by T1, I would expect T1 to wait on a row lock held
>> by T2, not get a duplicate key error. And if T1 has not yet inserted a row
>> expected by T2, then T2 would be rolled back and retried after T1 has
>> committed. The first can cause deadlock, but neither case seems to cause
>> duplicate error.
>>
>> Maybe TokuDB is doing something special with locks around replication, or
>> something else goes wrong. I guess TokuDB just hasn't been tested much with
>> parallel replication.
>>
>> Does it work ok when running in conservative parallel mode?
>>
>> - Kristian.
>>
>
>