maria-developers team mailing list archive
Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?
Rich Prohaska <prohaska7@xxxxxxxxx> writes:
> On Tue, Aug 23, 2016 at 1:45 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:
>> In the original parallel replication patch, thd_report_wait_for() did not
>> call back directly into tokudb_kill_query(). The kill happened
>> asynchronously, in a separate background thread. Then there is no problem
>> with TokuDB (or InnoDB) holding locks over the call to
>> I am considering re-introducing that original code - this might simplify
>> TokuDB implementation (and would also simplify InnoDB implementation).
>> I was never very happy about the way thd_report_wait_for() works currently.
Ok, so I implemented this, available in this branch:
With this code, I am no longer able to reproduce any hangs with the tests
I've been running so far.
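
To illustrate the idea of doing the kill asynchronously, here is a minimal
sketch. It is not the code in the branch: AsyncKiller, request_kill(),
stop_and_collect() and run_demo() are made-up names, and a plain std::mutex
stands in for whatever lock the storage engine holds. The point is only that
the reporting thread queues the victim and returns, while a separate thread
later takes the engine lock to act on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: none of these names come from MariaDB or TokuDB.
class AsyncKiller {
public:
    explicit AsyncKiller(std::mutex &engine_lock)
        : engine_lock_(engine_lock), worker_([this] { run(); }) {}

    // Called by the thread that detected the wait. It may hold engine_lock_,
    // because only the short-lived queue_lock_ is taken here.
    void request_kill(int victim_id) {
        {
            std::lock_guard<std::mutex> g(queue_lock_);
            assert(!stopping_);  // no requests after shutdown
            pending_.push_back(victim_id);
        }
        cv_.notify_one();
    }

    // Drains the queue, stops the worker, and returns the ids it handled.
    std::vector<int> stop_and_collect() {
        {
            std::lock_guard<std::mutex> g(queue_lock_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
        return handled_;
    }

    ~AsyncKiller() { stop_and_collect(); }

private:
    void run() {
        std::unique_lock<std::mutex> q(queue_lock_);
        for (;;) {
            cv_.wait(q, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and nothing left
            int victim = pending_.front();
            pending_.pop_front();
            q.unlock();
            {
                // Stand-in for the kill callback, which needs the engine lock.
                std::lock_guard<std::mutex> e(engine_lock_);
                handled_.push_back(victim);
            }
            q.lock();
        }
    }

    std::mutex &engine_lock_;
    std::mutex queue_lock_;
    std::condition_variable cv_;
    std::deque<int> pending_;
    std::vector<int> handled_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

std::vector<int> run_demo() {
    std::mutex engine_lock;
    AsyncKiller killer(engine_lock);
    {
        std::lock_guard<std::mutex> held(engine_lock);  // reporter holds the lock
        killer.request_kill(7);
        killer.request_kill(9);
    }
    return killer.stop_and_collect();
}
```

In run_demo() the requests are queued while engine_lock is held, and the
worker only gets to handle them once the reporter has released it. A
synchronous callback taking the same lock at that point would block on
itself instead.
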
> BTW, the current_lock_tree_mutex logic is broken. The underlying tokuft
> lock manager has one manager object (and its mutex) and many lock tree
> objects (each with its own mutex). Since the thd_report_wait_for function
> is called when holding the lock tree mutex (not the manager mutex), it can
> be called in parallel; thus there is a race on the current_lock_tree_mutex
Yes, and this is the main problem solved by doing the kill asynchronously.
And from my tests, it looks like this was actually what was causing the
problems - crash, assertion, and hangs. (I haven't determined this
conclusively, but it seems at least plausible).
It really was always an ugly hack around the locking problem with
thd_report_wait_for() (InnoDB had a similar issue), so it seems good to get
rid of it.
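
The race on a single shared slot can be sketched as follows. This is a
hypothetical model only, not the TokuDB code: LockTree, shared_slot and the
step counter are made-up names, and std::atomic is used just to keep the
sketch well-defined and to force the interleaving. Each thread holds its own
tree's mutex, so neither excludes the other, and the second reporter
overwrites what the first one stored.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical model: each lock tree has its own mutex.
struct LockTree {
    std::mutex mutex;
};

// One process-wide slot, modelled on the current_lock_tree_mutex idea.
// The atomic avoids undefined behaviour in this sketch; it does not stop
// one caller from overwriting another caller's value.
std::atomic<std::mutex *> shared_slot{nullptr};

// Returns true if thread A still finds its own tree's mutex in the slot
// after thread B has reported a wait on a different tree.
bool slot_survives_second_reporter() {
    LockTree tree_a, tree_b;
    std::atomic<int> step{0};
    bool a_sees_own = false;

    std::thread a([&] {
        std::lock_guard<std::mutex> g(tree_a.mutex);
        shared_slot.store(&tree_a.mutex);
        assert(shared_slot.load() == &tree_a.mutex);  // B has not written yet
        step.store(1);
        while (step.load() != 2) std::this_thread::yield();
        a_sees_own = (shared_slot.load() == &tree_a.mutex);
    });
    std::thread b([&] {
        while (step.load() != 1) std::this_thread::yield();
        // A different mutex, so holding it gives no exclusion against A.
        std::lock_guard<std::mutex> g(tree_b.mutex);
        shared_slot.store(&tree_b.mutex);
        step.store(2);
    });
    a.join();
    b.join();
    return a_sees_own;
}
```

With this forced ordering the function returns false: thread A, still
inside its own tree's critical section, reads back tree B's mutex.
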
I still have the should_retry_lock_requests disabled in
retry_all_lock_requests() - otherwise I get hangs. I haven't investigated
this deeply yet.
I also want to go back over the entire set of patches and see what needs
cleaning up, and think about how this could go into 10.2 and possibly 10.1.