maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09888
Re: Patch for parallel replication DBUG_ASSERT check on DROP TABLE et al
Monty,
Apparently you pushed this patch into 10.0, even though I explained that it
is incorrect, and why. That's not cool, and you can even see it failing in
Buildbot now.
Can you please fix it ASAP?
- Kristian.
Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:
> Monty writes:
>
>> Can you please check if this looks correct?
>
>> diff --git a/sql/mdl.cc b/sql/mdl.cc
>> index 28d2006..3b1f9f2 100644
>> --- a/sql/mdl.cc
>> +++ b/sql/mdl.cc
>> @@ -17,6 +17,7 @@
>> #include "sql_class.h"
>> #include "debug_sync.h"
>> #include "sql_array.h"
>> +#include "rpl_rli.h"
>> #include <hash.h>
>> #include <mysqld_error.h>
>> #include <mysql/plugin.h>
>> @@ -442,6 +443,7 @@ class MDL_lock
>> virtual void notify_conflicting_locks(MDL_context *ctx) = 0;
>>
>> virtual bitmap_t hog_lock_types_bitmap() const = 0;
>> + bool check_if_conflicting_replication_locks(MDL_context *ctx);
>
> This should probably be #ifdef NDEBUG, since you're only using it in debug
> build?
>
>>
>> /** List of granted tickets for this lock. */
>> Ticket_list m_granted;
>> @@ -2290,6 +2292,44 @@ void MDL_scoped_lock::notify_conflicting_locks(MDL_context *ctx)
>> }
>> }
>>
>> +/**
>> + Check if there is any conflicting lock that could cause this thread
>> + to wait for another thread which is not ready to commit.
>> + This is always an error, as the upper level of parallel replication
>> + should not allow a scheduling of a conflicting DDL until all earlier
>> + transactions has commited.
>> +
>> + This function is only called for a slave using parallel replication
>> + and trying to get an exclusive lock for the table.
>> +*/
>> +
>> +bool MDL_lock::check_if_conflicting_replication_locks(MDL_context *ctx)
>> +{
>> + Ticket_iterator it(m_granted);
>> + MDL_ticket *conflicting_ticket;
>> +
>> + while ((conflicting_ticket= it++))
>> + {
>> + if (conflicting_ticket->get_ctx() != ctx)
>> + {
>> + MDL_context *conflicting_ctx= conflicting_ticket->get_ctx();
>> +
>> + /*
>> + If the conflicting thread is another parallel replication
>> + thread for the same master and it's not in commit stage, then
>> + the current transaction has started too early and something is
>> + seriously wrong.
>> + */
>> + if (conflicting_ctx->get_thd()->rgi_slave &&
>> + conflicting_ctx->get_thd()->rgi_slave->rli ==
>> + ctx->get_thd()->rgi_slave->rli &&
>> + !conflicting_ctx->get_thd()->rgi_slave->did_mark_start_commit)
>> + return 1; // Fatal error
>
> No, this is not correct. For example, the threads might be in different
> domains, or using different multi-source connections. You need a check like
> in thd_report_wait_for(). _If_ T1 and T2 are in the same parallel
> replication domain, _and_ T1 goes before T2, _and_ there is a DDL lock
> conflict but T1 has not started to commit, then it indicates a potential
> problem.
>
> Also, the comments are too vague, "Something is seriously wrong" is not a
> good description. What is wrong? Please describe the issue being checked
> more precisely. What is the role of each of the two threads involved? What
> is the situation being checked? Why is it wrong?
>
> Also, use some temporaries for conflicting_ctx->get_thd()->rgi_slave etc, so
> the condition becomes more readable.
>
>> + }
>> + }
>> + return 0;
>> +}
>> +
>>
>> /**
>> Acquire one lock with waiting for conflicting locks to go away if needed.
>> @@ -2355,6 +2395,11 @@ MDL_context::acquire_lock(MDL_request *mdl_request, ulong lock_wait_timeout)
>> if (lock->needs_notification(ticket) && lock_wait_timeout)
>> lock->notify_conflicting_locks(this);
>>
>> + DBUG_ASSERT(mdl_request->type != MDL_INTENTION_EXCLUSIVE ||
>> + !(get_thd()->rgi_slave &&
>> + get_thd()->rgi_slave->is_parallel_exec &&
>> + lock->check_if_conflicting_replication_locks(this)));
>> +
>> mysql_prlock_unlock(&lock->m_rwlock);
>>
>> will_wait_for(ticket);
>> diff --git a/sql/mdl.h b/sql/mdl.h
>> index c4d792a..13de602 100644
>> --- a/sql/mdl.h
>> +++ b/sql/mdl.h
>> @@ -910,7 +910,6 @@ class MDL_context
>> */
>> MDL_wait_for_subgraph *m_waiting_for;
>> private:
>> - THD *get_thd() const { return m_owner->get_thd(); }
>> MDL_ticket *find_ticket(MDL_request *mdl_req,
>> enum_mdl_duration *duration);
>> void release_locks_stored_before(enum_mdl_duration duration, MDL_ticket *sentinel);
>> @@ -919,6 +918,7 @@ class MDL_context
>> MDL_ticket **out_ticket);
>>
>> public:
>> + THD *get_thd() const { return m_owner->get_thd(); }
>> void find_deadlock();
>>
>> ulong get_thread_id() const { return thd_get_thread_id(get_thd()); }
>
> - Kristian.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help : https://help.launchpad.net/ListHelp
Follow ups
References