maria-developers team mailing list archive

Thread
Date

Re: [Commits] Rev 4281: MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail. in http://bazaar.launchpad.net/~maria-captains/maria/10.0

To: Jan Lindström <jan.lindstrom@xxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Wed, 09 Jul 2014 11:21:30 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CABFC8Dnxp2vdo_NN5=nSS7zhKi+ZVcn5hh--wr9yGy1AF-Ybdg@mail.gmail.com> ("Jan Lindström"'s message of "Tue, 8 Jul 2014 20:34:50 +0300")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Jan Lindström <jan.lindstrom@xxxxxxxxxx> writes:

> I have mostly InnoDB style comments, but some questions below:

Thanks for comments!

Sorry about style violations, and thanks for pointing then out. I'll try to
fix all of it.

Answers to other questions below:

> Why you need to define these, why #include "mysql/plugin.h" file is not
> enough ?
>
> +extern "C" void thd_report_wait_for(const MYSQL_THD thd, MYSQL_THD
> other_thd);
> +extern "C" int thd_need_wait_for(const MYSQL_THD thd);
> +extern "C"
> +int thd_need_ordering_with(const MYSQL_THD thd, const MYSQL_THD other_thd);

See the discussions here:

    https://lists.launchpad.net/maria-developers/msg07494.html

The conclusion with Serg was that doing a proper public API in 10.0 was too
big a change. We will do it in 10.1 (MDEV-6429). plugin.h should not be
modified in a GA version and with an API that has not been thought through
wrt. all engines. Also, the API should be a service, not in plugin.h.

> Why you need to introduce a new global structure for this, can't this be
> part
> of some already existing system structure e.g. lock_sys and why you do not
> use ut_list ?
>
> +/* Buffer to collect THDs to report waits for. */
> +struct thd_wait_reports {
> +    struct thd_wait_reports *next;
> +    ulint used;
> +    trx_t *waitees[64];
> +};

So the idea here is to collect a bunch of trx pointers during the graph
traversal, and pass each of them to thd_report_wait_for(). I wanted to pass
them after completing the graph traversal to avoid issues if
thd_report_wait_for() needs to kill one of them; this goes to
lock_cancel_waiting_and_release() eventually which I assume might modify the
graph being traversed.

This is time-critical code (it runs under global lock_sys->mutex), so it is
important to avoid memory allocations. So I use an array allocated on the
stack, this is really short-lived data, it only lives for the duration of
lock_deadlock_check_and_resolve(). In all normal cases the single array should
be sufficient, but the code needs to handle the overflow also, of course.

So in case of overflow, I malloc further arrays and link them into the
first. I suppose another solution would be to abort in case of overflow and
roll back the transaction like in case of ctx->too_deep==true, however I don't
like to introduce such arbitrary limits when it's not necessary.

I looked into using ib_list, but this seems to always do mallocs, which I want
to avoid. I also looked at ut0vec and dyn0dyn, but they also seem to always
use malloc and thus do not seem appropriate. So I did not find anything
existing I could use, unfortunately, but then I know little of the InnoDB
code.

Do you have a suggestion for an existing data structure to use that can take
an inital buffer on the stack and malloc extra memory only if needed?

Thanks for comments!

 - Kristian.