
maria-developers team mailing list archive

Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?

 

Rich Prohaska <prohaska7@xxxxxxxxx> writes:

> On Tue, Aug 23, 2016 at 1:45 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
> wrote:

>> In the original parallel replication patch, thd_report_wait_for() did not
>> call back directly into tokudb_kill_query(). The kill happened
>> asynchronously, in a separate background thread. Then there is no problem
>> with TokuDB (or InnoDB) holding locks over the call to
>> thd_report_wait_for().
>>
>> I am considering re-introducing that original code - this might simplify
>> TokuDB implementation (and would also simplify InnoDB implementation).
>> I was never very happy about the way thd_report_wait_for() works currently.

Ok, so I implemented this, available in this branch:

  https://github.com/knielsen/server/commits/toku_opr3

With this code, I am no longer able to reproduce any hangs with the tests
I've been running so far.
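
To illustrate the general idea of doing the kill asynchronously, here is a
minimal sketch. It is not the code in the branch above: AsyncKiller,
request_kill(), stop(), killed() and demo_async_kill() are hypothetical names
invented for this example, and the "kill" is only simulated by recording the
thread id. The point it shows is that the caller merely enqueues a request
while holding its own lock, and a separate background thread does the work
later with none of the caller's locks held.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical background "killer". None of these names come from the
// MariaDB or TokuDB sources; this only sketches the pattern.
class AsyncKiller {
public:
    AsyncKiller() : worker_([this] { run(); }) {}
    ~AsyncKiller() { stop(); }

    // Only takes the small queue mutex, so it can be called while the
    // caller still holds an engine-internal lock.
    void request_kill(std::uint64_t thread_id) {
        {
            std::lock_guard<std::mutex> g(mu_);
            assert(!stopping_);  // requests after stop() would be lost
            pending_.push_back(thread_id);
        }
        cv_.notify_one();
    }

    // Process everything still queued, then join the worker.
    // Safe to call more than once.
    void stop() {
        {
            std::lock_guard<std::mutex> g(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Ids handled so far; call after stop() for a stable result.
    std::vector<std::uint64_t> killed() {
        std::lock_guard<std::mutex> g(mu_);
        return killed_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            cv_.wait(lk, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and nothing left
            std::uint64_t id = pending_.front();
            pending_.pop_front();
            lk.unlock();
            // A real implementation would perform the kill here, with no
            // caller locks and not even the queue mutex held.
            lk.lock();
            killed_.push_back(id);
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::uint64_t> pending_;
    std::vector<std::uint64_t> killed_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts after the other members exist
};

// Enqueue two kills while holding a stand-in for a lock tree mutex.
std::vector<std::uint64_t> demo_async_kill() {
    AsyncKiller killer;
    std::mutex lock_tree_mutex;  // stand-in for an engine-internal lock
    {
        std::lock_guard<std::mutex> held(lock_tree_mutex);
        killer.request_kill(7);
        killer.request_kill(9);
    }
    killer.stop();
    return killer.killed();
}
```

Because request_kill() never calls back into the storage engine, it does not
matter which engine locks the caller holds at that point.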

> BTW, the current_lock_tree_mutex logic is broken.  The underlying tokuft
> lock manager has one manager object (and its mutex) and many lock tree
> objects (each with its own mutex).   Since the thd_report_wait_for function
> is called when holding the lock tree mutex (not the manager mutex), it can
> be called in parallel; thus there is a race on the current_lock_tree_mutex
> logic.

Yes, and this is the main problem solved by doing the kill asynchronously.
And from my tests, it looks like this was actually what was causing the
problems - crash, assertion, and hangs. (I haven't determined this
conclusively, but it seems at least plausible).
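
The shape of that race can be shown with a small sketch. LockTree and
overlapping_holders() are hypothetical names made up for this example, not
the tokuft types. It only demonstrates that two threads can each hold the
mutex of a different lock tree at the same time, so a per-tree mutex cannot
protect a single shared variable that both threads update.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical miniature of the layout: one mutex per lock tree.
struct LockTree {
    std::mutex mutex;
};

// Each thread takes the mutex of its own tree and then waits until the
// other thread is also inside. The function can only return if both
// threads held their (different) mutexes at the same time.
int overlapping_holders() {
    LockTree a, b;
    std::atomic<int> inside{0};

    auto body = [&inside](LockTree& tree) {
        std::lock_guard<std::mutex> held(tree.mutex);
        ++inside;
        // Anything touched here that is shared between trees is not
        // protected by tree.mutex: the other thread is in here as well.
        while (inside.load() < 2) std::this_thread::yield();
    };

    std::thread t1(body, std::ref(a));
    std::thread t2(body, std::ref(b));
    t1.join();
    t2.join();
    return inside.load();
}
```

If both threads were given the same LockTree, the second one would block on
the mutex and the wait would never finish, which is the mutual exclusion that
the shared variable would have needed.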

It really was always an ugly hack around the locking problem with
thd_report_wait_for() (InnoDB had a similar issue), so it seems good to get
rid of it.

I still have the should_retry_lock_requests disabled in
retry_all_lock_requests() - otherwise I get hangs. I haven't investigated
this deeply yet.

I also want to go back over the entire set of patches and see what needs
cleaning up, and think about how this could go into 10.2 and possibly 10.1.

 - Kristian.

