
maria-developers team mailing list archive

Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?

 

Hello Kristian,
I am working on a second variant of the kill query design that will only
kill the pending lock request, if any, for the thd being killed.  The
previous design had a problem when killing the query that triggered the
wait-for call.
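
Roughly the shape I have in mind (just a sketch to show the idea; the type
and function names below are made up, not the actual TokuFT code):

    // kill_query path: wake only the killed thd's pending lock request, if any.
    #include <condition_variable>
    #include <mutex>
    #include <unordered_map>

    struct pending_lock_request {
        std::mutex mutex;               // protects 'cancelled' and the wait
        std::condition_variable cond;   // the blocked thread sleeps on this
        bool cancelled = false;         // set by kill_query, checked by the waiter
    };

    class pending_lock_requests {
        std::mutex m_mutex;
        // at most one pending lock request per waiting thd, keyed by its id
        std::unordered_map<unsigned long, pending_lock_request *> m_by_thd_id;

    public:
        void add(unsigned long thd_id, pending_lock_request *req) {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_by_thd_id[thd_id] = req;
        }
        void remove(unsigned long thd_id) {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_by_thd_id.erase(thd_id);
        }
        // called from kill_query: cancel only the killed thd's lock request
        void cancel(unsigned long thd_id) {
            std::lock_guard<std::mutex> guard(m_mutex);
            auto it = m_by_thd_id.find(thd_id);
            if (it == m_by_thd_id.end())
                return;                        // this thd is not waiting on a lock
            std::lock_guard<std::mutex> req_guard(it->second->mutex);
            it->second->cancelled = true;      // the waiter returns an error
            it->second->cond.notify_one();     // wake just this one waiter
        }
    };

Compared to waking every waiter, this keeps a kill from disturbing lock
waits that have nothing to do with the killed query.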

On Aug 15, 2016 7:26 PM, "Rich Prohaska" <prohaska7@xxxxxxxxx> wrote:

Hello Kristian,
I have a prototype of the TokuFT code that will cause ALL lock waiters to
call their killed callback here:
https://github.com/prohaska7/tokuft/tree/kill_lockers
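
Stripped down, the idea is simply this (again invented names, not the code
in the branch above): the kill path signals every pending lock request, and
each waiter re-runs its killed callback when it wakes up, so only the killed
transaction actually gives up its wait.

    #include <condition_variable>
    #include <mutex>
    #include <vector>

    struct lock_waiter {
        std::mutex mutex;              // protects the waiter's state
        std::condition_variable cond;  // the blocked thread sleeps on this
    };

    // Called whenever a query is killed: wake every pending lock request so
    // that each waiter gets a chance to call its killed callback.  Waiters
    // that were not killed simply go back to sleep.
    void wake_all_lock_waiters(std::mutex &list_mutex,
                               std::vector<lock_waiter *> &waiters) {
        std::lock_guard<std::mutex> list_guard(list_mutex);
        for (lock_waiter *w : waiters) {
            std::lock_guard<std::mutex> waiter_guard(w->mutex);
            w->cond.notify_all();
        }
    }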

On Mon, Aug 15, 2016 at 11:51 AM, Rich Prohaska <prohaska7@xxxxxxxxx> wrote:

> Hello Kristian,
> See attached snapshot of slave threads and tokudb locks.  Thread 15 is
> waiting for a tokudb lock held by thread 16, which is waiting for a tokudb
> lock held by thread 14.  Thread 14 is waiting for a prior transaction to
> complete, presumably either thread 15 or 16.  So, we have a deadlock that
> tokudb cannot detect because the ordering constraint is not available to
> tokudb.  I assume that the optimistic scheduler killed thread 16, but since
> tokudb does not implement the kill_query function, the deadlock is only
> resolved when the tokudb lock timer pops.
>
> On Mon, Aug 15, 2016 at 8:16 AM, Rich Prohaska <prohaska7@xxxxxxxxx>
> wrote:
>
>> Hello Kristian,
>> The simplest kill_query implementation for tokudb would just signal all
>> of the pending lock requests' condition variables.  This would cause the
>> killed callback to be called.  A performance refinement, if necessary,
>> would allow thread A (executing the kill_query function) to identify and
>> signal a condition variable for a blocked thread B.
>>
>> On Mon, Aug 15, 2016 at 5:42 AM, Kristian Nielsen <
>> knielsen@xxxxxxxxxxxxxxx> wrote:
>>
>>> Rich Prohaska <prohaska7@xxxxxxxxx> writes:
>>>
>>> > tokudb lock timeouts are resolving the replication stall.  unfortunately,
>>> > the tokudb lock timeout is 4 seconds, so the throughput is almost zero.
>>>
>>> Yes. Sorry for not making it clear that my proof-of-concept patch was
>>> incomplete...
>>>
>>> >> > I suspect that the poor slave replication performance for optimistic
>>> >> > replication occurs because TokuDB does not implement the kill_query
>>> >> > handlerton function.  kill_handlerton gets called to resolve lock wait
>>>
>>> >> Possibly, but I'm not sure it's that important. The kill will be
>>> >> effective as soon as the wait is over.
>>>
>>> No, you're absolutely right, after testing (and thinking) some more, I
>>> realise that indeed the kill_query functionality is important.
>>>
>>> A possible scenario is, given transactions T1, T2, and T3 in that order:
>>>
>>> T3 acquires a lock on row R3, T2 similarly acquires R2.
>>> Now T3 tries to acquire R2, but has to wait for T2 to release it.
>>> Later T1 tries to acquire R3, also has to wait.
>>>
>>> At this point, we kill T3, since it is holding a lock (R3) needed by an
>>> earlier transaction T1. However, T3 will not notice the kill until its own
>>> wait (on R2 held by T2) times out. T2 cannot release the lock because it is
>>> waiting for T1 to commit first. So we have a deadlock :-/
>>>
>>> With InnoDB, the kill causes T3 to wake up immediately and roll back, so
>>> that T1 can proceed without much delay.
>>>
>>> Ok, so something more is needed here. I see there is a killed_callback()
>>> which seems to check for the kill, so I'm hoping that can be used with a
>>> suitable wakeup of the offending lock_request (or all requests,
>>> perhaps). But as I'm completely new to TokuDB, I still need some more time
>>> to read the code and try to understand how everything fits together...
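>>>
>>> Roughly the direction I have in mind (a completely untested sketch with
>>> made-up names, not the real lock_request code): the waiter does a timed
>>> condition-variable wait and runs the killed callback each time it wakes
>>> up, so a kill plus a wakeup aborts the wait immediately instead of only
>>> when the lock timeout expires:
>>>
>>>     #include <chrono>
>>>     #include <condition_variable>
>>>     #include <functional>
>>>     #include <mutex>
>>>
>>>     enum class wait_result { granted, killed, timed_out };
>>>
>>>     struct lock_wait_state {
>>>         std::mutex mutex;                   // protects 'granted'
>>>         std::condition_variable cond;       // signalled on grant or kill
>>>         bool granted = false;               // set when the lock is granted
>>>         std::function<bool()> killed_callback;  // true if the txn was killed
>>>     };
>>>
>>>     wait_result wait_for_lock(lock_wait_state &w,
>>>                               std::chrono::milliseconds timeout) {
>>>         std::unique_lock<std::mutex> lk(w.mutex);
>>>         auto deadline = std::chrono::steady_clock::now() + timeout;
>>>         while (!w.granted) {
>>>             if (w.killed_callback && w.killed_callback())
>>>                 return wait_result::killed;     // abort and roll back
>>>             if (w.cond.wait_until(lk, deadline) == std::cv_status::timeout
>>>                 && !w.granted)
>>>                 return wait_result::timed_out;  // the old timeout path
>>>         }
>>>         return wait_result::granted;
>>>     }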
>>>
>>> > TokuFT implements pessimistic locking and 2 phase locking algorithms.  This
>>> > wiki describes locking and concurrency in a little more detail:
>>> > https://github.com/percona/tokudb-engine/wiki/Transactions-and-Concurrency.
>>>
>>> Thanks, this was quite helpful.
>>>
>>> > Yes, I think they are false positives since the thd_report_wait_for API is
>>> > called but it does NOT call the THD::awake function.
>>>
>>> Ah. Then it's probably normal, caused by the group-commit optimisation. In
>>> conservative mode, if two transactions T1 and T2 did not group commit on the
>>> master, then they cannot be started in parallel on the slave. But T2 can
>>> start as soon as T1 has reached COMMIT. Thus, if T2 happens to conflict with
>>> T1, there is a small window where T2 may need to wait on T1 until T1 has
>>> completed its commit.
>>>
>>> Thanks,
>>>
>>>  - Kristian.
>>>
>>
>>
>
