maria-developers team mailing list archive
Re: Syntax for parallel replication
On Oct 06, Kristian Nielsen wrote:
> - Parallel replication is still a somewhat experimental feature, so
> it seems too risky to enable it by default. Also, it doesn't really
> seem possible for the server to automatically set the best number of
> threads to use, with current implementation (or possibly any
Increase parallelization when replication just works, and penalize it
when retries happen? With an upper limit similar to (or derived from)
innodb-concurrency-tickets. Just a thought.
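
(One way to read the suggestion above is as a small feedback controller. The sketch below is only an illustration under stated assumptions: the class name, the batch-based accounting, and the "add one thread on success, halve on retry" policy are hypothetical and are not taken from the server or from MDEV-6676. The upper limit is passed in as a plain number rather than derived from any server variable.)

```python
class ParallelismTuner:
    """Hypothetical controller: grow the worker count while replication
    applies cleanly, shrink it when retries are observed."""

    def __init__(self, max_threads, start=1):
        if max_threads < 1:
            raise ValueError("max_threads must be at least 1")
        # Assumed upper limit; the email suggests deriving it from an
        # existing setting, which this sketch does not attempt.
        self.max_threads = max_threads
        self.threads = min(max(start, 1), max_threads)

    def record_batch(self, retries):
        """Adjust the thread count after a batch of applied transactions.

        `retries` is the number of rollback-and-retry events seen in the
        batch (an assumed metric).
        """
        if retries > 0:
            # Penalize: halve the worker count, never below 1 (serial apply).
            self.threads = max(1, self.threads // 2)
        else:
            # Reward: add one worker, capped at the upper limit.
            self.threads = min(self.max_threads, self.threads + 1)
        return self.threads


tuner = ParallelismTuner(max_threads=8)
for retries in [0, 0, 0, 2, 0]:
    tuner.record_batch(retries)
print(tuner.threads)  # 3: grew 1 -> 4, halved to 2 on retries, then 3
```

(With such a policy, a workload that conflicts constantly would settle at one thread, that is, effectively serial apply.)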
> - When replicating with non-transactional updates, or in non-gtid
> mode, slave state is not crash safe. This is true in non-parallel
> replication also, but in parallel replication, the problem seems
> amplified, as there may be multiple transactions in progress at the
> time of a crash, complicating possible manual recovery. This also
> suggests that parallel replication must be configurable.
Hm. From reading the MDEV, I've got an idea that you won't replicate
non-transactional updates concurrently (as they cannot be rolled back,
so your base assumption doesn't work). Was it wrong - will you replicate
non-transactional updates concurrently?
> - When using domain-based parallel replication, the user is
> responsible for ensuring that independent domains are non-conflicting
> and can be replicated out-of-order wrt. each other. So if replication
> domains are used, but this property is not guaranteed, then
> domain-based parallel replication needs to be configurable, or
> parallel replication cannot be used at all.
As you like.
I'd simply say that in domain-based parallel replication, the user is
responsible for domain independence. If he has misconfigured domains,
the server is not at fault, and we should not bother covering this use case.
> - The new speculative replication feature in MDEV-6676 is not always
> guaranteed to be a win - in some workloads, where there are many
> conflicts between successive transactions, excessive rollback could
> cause it to be less efficient than not using it. Again, this suggests
> it needs to be configurable.
Though if the concurrency will be auto-tuned as I mentioned above, it'll
auto-disable itself in this case. With no user intervention.
> So given this, I came up with the following idea for syntax:
> CHANGE MASTER TO PARALLEL_MODE=(domain,groupcommit,transactional,waiting)
> Each of the four keywords in the parenthesis is optional.
> "domain" enables domain-based parallelisation, where each replication domain
> is treated independently.
> "groupcommit" enables the non-speculative mode, where only transactions that
> group-committed together on the master are applied in parallel on the slave.
> "transactional" enables the speculative mode, where all transactional DML is
> optimistically tried in parallel, and then in case of conflict a rollback and
> retry is done.
> "groupcommit" and "transactional" are mutually exclusive, at most one of them
> can be specified.
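
(The rules quoted above, four optional keywords with "groupcommit" and "transactional" mutually exclusive, can be sketched as a small validator. This is a hypothetical illustration of the proposed option set, not server code: the function name and the error messages are assumptions, and "waiting" is accepted only as a keyword because its meaning is not described in the quoted text.)

```python
# Keywords from the proposed PARALLEL_MODE=(...) syntax.
ALLOWED = ("domain", "groupcommit", "transactional", "waiting")


def parse_parallel_mode(text):
    """Parse a string such as "(domain,groupcommit)" into a frozenset.

    Hypothetical helper: every keyword is optional, unknown keywords are
    rejected, and groupcommit/transactional may not be combined.
    """
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    modes = set()
    for item in inner.split(","):
        name = item.strip().lower()
        if not name:
            continue  # tolerate an empty list or stray commas
        if name not in ALLOWED:
            raise ValueError(f"unknown parallel mode: {name!r}")
        modes.add(name)
    if {"groupcommit", "transactional"} <= modes:
        raise ValueError("groupcommit and transactional are mutually exclusive")
    return frozenset(modes)


print(sorted(parse_parallel_mode("(domain,transactional)")))
# ['domain', 'transactional']
```

(An empty list, "()", parses to an empty set here; whether that should mean "no parallel replication" is a design choice the quoted proposal does not spell out.)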
Assorted thoughts in no specific order:
1. I'd rename "groupcommit" to something less technical, like "master",
or "following_master", or "following", (or whatever)
2. How does it work with multi-source? The usual "CHANGE MASTER name TO" ?
3. How to specify the degree of parallelization - the number of threads?
Still --slave-parallel-threads=N ? Your syntax doesn't seem to cover it.
4. Command line? None? CHANGE MASTER specifies replication coordinates,
and they change on every restart, that's why there's no command-line
option for them. They're stored in master-info.
But your "TO PARALLEL_MODE" only configures how to apply events,
seems like something that should rather be in the my.cnf.
In view of 4) above, did you consider using system variables? Like
and, the usual, --connection_name.slave-parallel-mode=... for
multi-source. This variable can be of SET or FLAGSET type, so it could
be set to a combination of values.