maria-developers team mailing list archive
Message #09862
Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?
Hello Kristian,
I am running your opt2 branch with a small sysbench OLTP test (1 table,
1000 rows, 8 threads). The good news is that the slave stalls due to lock
timeouts are gone. The bad news is that the slave performance is suspect.
When the slave is in conservative mode with 2 threads, the TokuDB wait-for
callback is being called (I put in a "printf"), which implies a parallel
lock conflict. I assumed that conservative mode only executes in parallel
those transactions that were group committed together, and that such
transactions would be conflict free. That is obviously not the case.
When the slave is in optimistic mode with 8 threads, I see very high slave
query execution times in the processlist:
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                         | Info             | Progress |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
|  6 | root        | localhost | NULL | Query   |    0 | init                                          | show processlist |    0.000 |
| 16 | system user |           | NULL | Connect |  383 | Waiting for master to send event              | NULL             |    0.000 |
| 17 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 18 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 19 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 20 | system user |           | NULL | Connect |    3 | Delete_rows_log_event::find_row(-1)           | NULL             |    0.000 |
| 21 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 22 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 23 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 24 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit       | NULL             |    0.000 |
| 25 | system user |           | NULL | Connect |  382 | Waiting for room in worker thread event queue | NULL             |    0.000 |
+----+-------------+-----------+------+---------+------+-----------------------------------------------+------------------+----------+
It appears that there is a multi-second stall somewhere. gdb shows
that the threads are either
(1) waiting in the TokuDB lock manager, or
(2) waiting in the wait_for_commit::wait_for_prior_commit2 function.
On Fri, Aug 12, 2016 at 8:50 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:
> [Moving the discussion to maria-developers@, hope that is ok/makes
> sense...]
>
> Ok, so here is a proof-of-concept patch for this, which seems to make
> TokuDB work with optimistic parallel replication.
>
> The core of the patch is this line in lock_request.cc
>
> lock_wait_callback(callback_data, m_txnid, conflicts.get(i));
>
> which ends up doing this:
>
> thd_report_wait_for (requesting_thd, blocking_thd);
>
> All the rest of the patch is just getting the right information around
> between the different parts of the code.
>
> I put this on top of Jocelyn Fournier's tokudb_rpl.rpl_parallel_optimistic
> patches, and pushed it on my github:
>
> https://github.com/knielsen/server/tree/toku_opr2
>
> With this patch, the test case passes! So that's promising.
>
> Some things still left to do for this to be a good patch:
>
> - I think the callback needs to trigger also for an already waiting
> transaction, in case another transaction arrives later to contend for
> the same lock, but happens to get the lock earlier. I can look into this.
>
> - This patch needs linear time (in number of active transactions) per
> callback to find the THD from the TXNID, maybe that could be optimised.
>
> - Probably the new callback etc. needs some cleanup to better match TokuDB
> code organisation and style.
>
> - And testing, of course. I'll definitely need some help there, as I'm not
> familiar with how to run TokuDB efficiently.
>
> Any thoughts or comments?
>
> - Kristian.
>
>
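
The quoted patch description reports each lock conflict through a callback and notes that finding the THD from the TXNID currently takes linear time. The following is a minimal C++ sketch of that flow with a hash-map lookup swapped in for the linear scan. It is not the code from the patch: TxnId, Thd, TxnRegistry, WaitForLog and report_conflicts are hypothetical stand-ins, and WaitForLog::report only records the pair where the real code would call thd_report_wait_for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real TokuDB or MariaDB types.
using TxnId = std::uint64_t;
struct Thd { int id; };

// Records (waiter, blocker) pairs. Stands in for thd_report_wait_for().
struct WaitForLog {
    std::vector<std::pair<int, int>> edges;
    void report(const Thd &requesting, const Thd &blocking) {
        edges.emplace_back(requesting.id, blocking.id);
    }
};

// Assumed TXNID -> THD index, giving average O(1) lookup per callback
// instead of a scan over all active transactions.
class TxnRegistry {
public:
    void add(TxnId txn, Thd *thd) {
        assert(thd != nullptr);
        by_txn_[txn] = thd;
    }
    void remove(TxnId txn) { by_txn_.erase(txn); }
    Thd *find(TxnId txn) const {
        auto it = by_txn_.find(txn);
        return it == by_txn_.end() ? nullptr : it->second;
    }
private:
    std::unordered_map<TxnId, Thd *> by_txn_;
};

// Mirrors the loop around lock_wait_callback(): one report per
// conflicting transaction. Returns how many conflicts were reported;
// transactions unknown to the registry are skipped.
std::size_t report_conflicts(const TxnRegistry &registry, WaitForLog &log,
                             TxnId requesting,
                             const std::vector<TxnId> &conflicts) {
    Thd *waiter = registry.find(requesting);
    if (waiter == nullptr) return 0;
    std::size_t reported = 0;
    for (TxnId blocking : conflicts) {
        if (Thd *blocker = registry.find(blocking)) {
            log.report(*waiter, *blocker);
            ++reported;
        }
    }
    return reported;
}
```

The sketch leaves out locking around the registry. A real implementation would have to protect it against concurrent transaction start and commit, which may cost more than the lookup itself saves.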