maria-developers team mailing list archive

Re: [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?

Hello Kristian,
I am running your opt2 branch with a small sysbench oltp test (1 table,
1000 rows, 8 threads).  The good news is that the slave stalls due to lock
timeouts are gone.  The bad news is that the slave performance is suspect.

When the slave is in conservative mode with 2 threads, the TokuDB wait-for
callback is being called (I put in a "printf"), which implies a parallel
lock conflict.  I assumed that conservative mode implies parallel execution
of transactions that were group committed together, which in turn would
imply that these transactions are conflict free.  That is obviously not the
case.

When the slave is in optimistic mode with 8 threads, I see very high slave
query execution times in the processlist:

+----+-------------+-----------+------+---------+------+------------------------------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                          | Info             | Progress |
+----+-------------+-----------+------+---------+------+------------------------------------------------+------------------+----------+
|  6 | root        | localhost | NULL | Query   |    0 | init                                           | show processlist |    0.000 |
| 16 | system user |           | NULL | Connect |  383 | Waiting for master to send event               | NULL             |    0.000 |
| 17 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 18 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 19 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 20 | system user |           | NULL | Connect |    3 | Delete_rows_log_event::find_row(-1)            | NULL             |    0.000 |
| 21 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 22 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 23 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 24 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
| 25 | system user |           | NULL | Connect |  382 | Waiting for room in worker thread event queue  | NULL             |    0.000 |
+----+-------------+-----------+------+---------+------+------------------------------------------------+------------------+----------+

It appears that there is a multi-second stall somewhere.  gdb shows that
the threads are either
(1) waiting in the TokuDB lock manager, or
(2) waiting in the wait_for_commit::wait_for_prior_commit2 function.

On Fri, Aug 12, 2016 at 8:50 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:

> [Moving the discussion to maria-developers@, hope that is ok/makes
> sense...]
>
> Ok, so here is a proof-of-concept patch for this, which seems to make
> TokuDB work with optimistic parallel replication.
>
> The core of the patch is this line in lock_request.cc
>
>     lock_wait_callback(callback_data, m_txnid, conflicts.get(i));
>
> which ends up doing this:
>
>     thd_report_wait_for (requesting_thd, blocking_thd);
>
> All the rest of the patch is just getting the right information around
> between the different parts of the code.
>
> I put this on top of Jocelyn Fournier's tokudb_rpl.rpl_parallel_optimistic
> patches, and pushed it on my github:
>
>   https://github.com/knielsen/server/tree/toku_opr2
>
> With this patch, the test case passes! So that's promising.
>
> Some things still left to do for this to be a good patch:
>
>  - I think the callback needs to trigger also for an already waiting
>    transaction, in case another transaction arrives later to contend for
>    the same lock, but happens to get the lock earlier. I can look into this.
>
>  - This patch needs linear time (in number of active transactions) per
>    callback to find the THD from the TXNID, maybe that could be optimised.
>
>  - Probably the new callback etc. needs some cleanup to better match TokuDB
>    code organisation and style.
>
>  - And testing, of course. I'll definitely need some help there, as I'm not
>    familiar with how to run TokuDB efficiently.
>
> Any thoughts or comments?
>
>  - Kristian.
>
>
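
On the linear-time TXNID-to-THD lookup in the to-do list above: one possible
direction is a hash map that is kept up to date as transactions begin and
end, so the callback finds both sides in constant time on average. The
following is only a sketch under stated assumptions. TxnId, Thd, TxnRegistry
and on_lock_wait are hypothetical names made up for illustration and are not
taken from the patch; the report callable merely stands in for
thd_report_wait_for().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins: the real TXNID and THD types live in TokuDB and the server.
using TxnId = std::uint64_t;
struct Thd { int id; };

// Maps each active transaction id to its owning connection, so the
// lock-wait callback does not have to scan all active transactions.
class TxnRegistry {
public:
    // Register a transaction when it begins.
    void add(TxnId txn, Thd* thd) {
        std::lock_guard<std::mutex> guard(mu_);
        map_[txn] = thd;
    }
    // Unregister a transaction when it commits or aborts.
    void remove(TxnId txn) {
        std::lock_guard<std::mutex> guard(mu_);
        map_.erase(txn);
    }
    // Average O(1) lookup; returns nullptr if the transaction is unknown.
    Thd* find(TxnId txn) const {
        std::lock_guard<std::mutex> guard(mu_);
        auto it = map_.find(txn);
        return it == map_.end() ? nullptr : it->second;
    }
private:
    mutable std::mutex mu_;
    std::unordered_map<TxnId, Thd*> map_;
};

// Stand-in for the server hook that records "waiter waits for blocker".
using ReportFn = std::function<void(Thd* waiter, Thd* blocker)>;

// Called once per conflicting transaction when a lock request has to wait.
// Returns true if a wait-for edge was reported.
inline bool on_lock_wait(const TxnRegistry& reg, TxnId waiter, TxnId blocker,
                         const ReportFn& report) {
    assert(static_cast<bool>(report));
    Thd* w = reg.find(waiter);
    Thd* b = reg.find(blocker);
    if (w == nullptr || b == nullptr) {
        return false;  // one side already finished, or is not tracked
    }
    report(w, b);
    return true;
}
```

A single mutex around the map is the simplest choice here; whether it would
become a bottleneck of its own under many worker threads is something only
measurement on a real slave could tell.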
