maria-developers team mailing list archive

Thread
Date

Re: TRUE LOCK=NONE for slave

To: Sachin Setiya <sachin.setiya@xxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Thu, 29 Dec 2016 11:10:43 +0100
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAA4mGETZJh_Zm9rFE5bFod6Ap5QrttxWeYAFZ7K8sAdnoeu03A@mail.gmail.com> (Sachin Setiya's message of "Thu, 29 Dec 2016 15:08:02 +0530")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

Sachin Setiya <sachin.setiya@xxxxxxxxxxx> writes:

> I am thinking of implementing true lock=none for slaves. Currently

Can you explain what you mean by "true LOCK=NONE"? It is not clear from your
description. I think you mean something that will allow to run an ALTER
TABLE on the slave in parallel with the same statement on the master?

> Alter command is only executed on slave when master commits it.
> I(thanks marko for suggesting this idea) am thinking of  writing
> BEGIN_DDL_EVENT in binlog after master hits ha_prepare_inplace_alter_table.
> Then master will write QUERY_EVENT on binlog with actual alter query .
> On commit/rollback master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.

There will be a number of things to consider with this. Previously, all
transactions in the binlog have been atomic; this introduces related events
that can be arbitrarily separated in the binlog.

For example, suppose the slave stops in the middle between BEGIN_DDL_EVENT
and COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. What is then the slave's binlog
position / GTID position?

You should probably make a list of all relevant issues, along with possible
solutions.

> On slave there will be pool of threads(configurable global variable), which

Hopefully, the exiting pool of slave worker threads (currently used for
parallel replication) can be used for this as well? Doesn't seem right to
introduce a new kind of thread pool...

> will apply these DDLs. On receiving BEGIN_DDL_EVENT slave thread will pass the
> QUERY_EVENT to one of the worker thread. Worker thread will execute until
> ha_inplace_alter_table. Actual commit_inplace_alter will be called by
> sql thread.

Won't you need some mechanism to prevent following events to access the
table before ALTER TABLE in a worker thread can acquire the metadata lock?

There are some nasty issues to be aware of with potential deadlocks related
to FLUSH TABLES WITH GLOBAL READ LOCK and such, when multiple threads are
trying to coordinate waiting and locking; there were already some quite hard
issues with this for parallel replication.

> If sql thread recieve some kind of rollback event , then it will somehow signal
> worker thread to stop executing alter. If none of the worker threads
> are available
> then event will be en-queued, then If we received rollback event the
> we will simply
> discard event from queue, If we received commit event then SQL thread
> will synchronously
> process DDL event.

Another mechanism that currently exists for improving replication speed of
ALTER TABLE (and other long-running queries) is out-of-order parallel
replication, using gtid_domain_id. This allows a DBA to run the ALTER TABLE
in a separate GTID domain, which will allow the ALTER to run on the slave in
parallel with any other queries. But unlike your suggestion, it does not
allow the ALTER on the slave to run in parallel with the one on the master,
and it is not fully automatic; the DBA needs to SET SESSION gtid_domain_id
and ensure that no conflicting transactions are run for the duration of the
ALTER.

Hope this helps,

 - Kristian.

References

TRUE LOCK=NONE for slave
From: Sachin Setiya, 2016-12-29