maria-developers team mailing list archive
Message #09867
Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?
Hello Kristian,
I suspect that the poor slave replication performance under optimistic
parallel replication occurs because TokuDB does not implement the kill_query
handlerton function. The kill_query handlerton function gets called to
resolve lock-wait situations that occur when parallel-replicating a small
sysbench table. InnoDB implements kill_query; TokuDB does not.
On Fri, Aug 12, 2016 at 12:47 PM, Rich Prohaska <prohaska7@xxxxxxxxx> wrote:
> Hello Kristian,
> I am running your opt2 branch with a small sysbench OLTP test (1 table,
> 1000 rows, 8 threads). The good news is that the slave stalls due to lock
> timeouts are gone. The bad news is that the slave performance is suspect.
>
> When the slave is in conservative mode with 2 threads, the TokuDB wait-for
> callback is being called (I put in a printf), which implies a parallel
> lock conflict. I assumed that conservative mode implies parallel execution
> only of transactions that were group-committed together, which I assumed
> would imply that these transactions were conflict-free. Obviously that is
> not the case.
>
> When the slave is in optimistic mode with 8 threads, I see very high slave
> query execution times in the processlist.
>
> +----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
> | Id | User        | Host      | db   | Command | Time | State                                         | Info             | Progress |
> +----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
> |  6 | root        | localhost | NULL | Query   |    0 | init                                          | show processlist |    0.000 |
> | 16 | system user |           | NULL | Connect |  383 | Waiting for master to send event              | NULL             |    0.000 |
> | 17 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 18 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 19 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 20 | system user |           | NULL | Connect |    3 | Delete_rows_log_event::find_row(-1)           | NULL             |    0.000 |
> | 21 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 22 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 23 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 24 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
> | 25 | system user |           | NULL | Connect |  382 | Waiting for room in worker thread event queue | NULL             |    0.000 |
>
> It appears that there is some MULTIPLE SECOND STALL somewhere. gdb shows
> that the threads are either
> (1) waiting in the tokudb lock manager, or
> (2) waiting in the wait_for_commit::wait_for_prior_commit2 function.
>
>
> On Fri, Aug 12, 2016 at 8:50 AM, Kristian Nielsen <
> knielsen@xxxxxxxxxxxxxxx> wrote:
>
>> [Moving the discussion to maria-developers@, hope that is ok/makes
>> sense...]
>>
>> Ok, so here is a proof-of-concept patch for this, which seems to make
>> TokuDB
>> work with optimistic parallel replication.
>>
>> The core of the patch is this line in lock_request.cc
>>
>> lock_wait_callback(callback_data, m_txnid, conflicts.get(i));
>>
>> which ends up doing this:
>>
>> thd_report_wait_for (requesting_thd, blocking_thd);
>>
>> All the rest of the patch is just getting the right information around
>> between the different parts of the code.
>>
>> I put this on top of Jocelyn Fournier's tokudb_rpl.rpl_parallel_optimistic
>> patches, and pushed it on my github:
>>
>> https://github.com/knielsen/server/tree/toku_opr2
>>
>> With this patch, the test case passes! So that's promising.
>>
>> Some things still left to do for this to be a good patch:
>>
>> - I think the callback needs to trigger also for an already-waiting
>> transaction, in case another transaction arrives later to contend for
>> the same lock but happens to get it earlier. I can look into this.
>>
>> - This patch needs linear time (in the number of active transactions) per
>> callback to find the THD from the TXNID; maybe that could be optimised.
>>
>> - Probably the new callback etc. needs some cleanup to better match
>> TokuDB code organisation and style.
>>
>> - And testing, of course. I'll definitely need some help there, as I'm
>> not familiar with how to run TokuDB efficiently.
>>
>> Any thoughts or comments?
>>
>> - Kristian.
>>
>>
>