
maria-discuss team mailing list archive

Re: Known limitation with TokuDB in Read Free Replication & parallel replication ?


On 15/07/2016 at 17:27, jocelyn fournier wrote:

On 15/07/2016 at 17:09, Kristian Nielsen wrote:
jocelyn fournier <jocelyn.fournier@xxxxxxxxx> writes:

Thanks for the quick answer! I wonder if it would be possible to
automatically disable the optimistic parallel replication for an
engine if it does not implement it?
That would probably be good, though it would be better to just implement
the necessary API. It's a very small change (basically TokuDB just needs to
inform the upper layer of any lock waits that take place inside).
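
To illustrate why reporting lock waits matters, here is a minimal toy model in Python. It is only a sketch under assumptions: the class and method names (`ParallelApplier`, `report_wait_for`) are hypothetical and are not the MariaDB or TokuDB API. It assumes the upper layer knows the required commit order and, when an earlier transaction is reported as waiting on a later one, marks the later one for rollback and retry.

```python
# Toy model only: names here are hypothetical, not TokuDB or MariaDB code.

class ParallelApplier:
    """Tracks the commit order of transactions applied in parallel."""

    def __init__(self, commit_order):
        # Position in the binlog decides who must commit first.
        self.position = {trx: i for i, trx in enumerate(commit_order)}
        self.killed = []

    def report_wait_for(self, waiter, holder):
        """Called by the engine when `waiter` blocks on a lock held by `holder`.

        If the holder must commit after the waiter, neither can make
        progress, so the later transaction is marked for rollback and retry.
        """
        if self.position[holder] > self.position[waiter]:
            self.killed.append(holder)
            return True
        return False


applier = ParallelApplier(["T1", "T2"])
conflict = applier.report_wait_for("T1", "T2")     # T1 waits on T2's row lock
no_conflict = applier.report_wait_for("T2", "T1")  # ordinary wait, nothing to do
print(conflict, no_conflict, applier.killed)
```

Without such a report, the upper layer in this model would never learn that T1 is stuck behind T2, which is the deadlock scenario described here.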

However, looking more at your description, you got a "key not found"
error. Not implementing the thd_report_wait_for() could lead to deadlocks,
but it shouldn't cause key not found. In fact, in optimistic mode, all
errors are treated as "deadlock" errors, the query is rolled back, and
run again, this time not in parallel.
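
The retry behaviour described above can be sketched roughly as follows. This is a simplified Python model, not MariaDB code: `apply_optimistically`, `ApplyError` and `flaky_apply` are hypothetical names, and the loop only imitates parallel execution by passing a `parallel` flag.

```python
# Toy model only: hypothetical names, not MariaDB code.

class ApplyError(Exception):
    """Any error raised while applying a transaction."""


def apply_optimistically(transactions, apply):
    """Apply each transaction speculatively; on any error, roll it back
    and run it again on its own, in commit order."""
    log = []
    for trx in transactions:
        try:
            apply(trx, parallel=True)
            log.append((trx, "parallel"))
        except ApplyError:
            # Every error is treated like a deadlock: roll back, retry serially.
            log.append((trx, "rolled back"))
            apply(trx, parallel=False)
            log.append((trx, "serial retry"))
    return log


def flaky_apply(trx, parallel):
    # Hypothetical conflict: T2 fails only while running in parallel.
    if trx == "T2" and parallel:
        raise ApplyError("row not found")


result = apply_optimistically(["T1", "T2"], flaky_apply)
print(result)
```

In this sketch an error raised during the serial retry is not caught, so it would propagate to the caller as a real failure.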

So I'm wondering if there is something else going on. If transactions T1
and T2 run in parallel, it's possible that they have a row conflict. But if
T2 deleted a row expected by T1, I would expect T1 to wait on a row lock
held by T2, not get a duplicate key error. And if T1 has not yet inserted a
row expected by T2, then T2 would be rolled back and retried after T1 has
committed. The first can cause deadlock, but neither case seems to cause
a duplicate error.

Maybe TokuDB is doing something special with locks around replication, or something else goes wrong. I guess TokuDB just hasn't been tested much with
parallel replication.

Does it work ok when running in conservative parallel mode?
So far conservative parallel mode seems to work properly as well.
My first thought was that this issue was caused by Read Free Replication not locking rows in the expected way, although it's advertised to be 100% compatible with parallel replication. I'll try the optimistic mode without Read Free Replication to check if it could be related.

Well, same issue without RFR:

Could not execute Delete_rows_v1 event on table sc_2.sc_product_genre; Can't find record in 'sc_product_genre', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.008420, end_log_pos 77519956

(and it didn't manage to recover from this one; I had to skip it).

Actually, it has definitely corrupted the state of the slave; I have to rebuild the replication from a backup.