maria-developers team mailing list archive

Re: [Commits] Rev 4376: MDEV-6676: Speculative parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Jonas Oreland <jonaso@xxxxxxxxxx>
Date: Wed, 10 Sep 2014 11:15:16 +0200
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <87ppf3vo8a.fsf@frigg.knielsen-hq.org>

ack

/Jonas

On Wed, Sep 10, 2014 at 11:06 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:

> Jonas Oreland <jonaso@xxxxxxxxxx> writes:
>
> Hi Jonas, I actually was planning to discuss this with you, as it is
> based on some of the ideas you mentioned earlier on parallel
> replication...
>
> >>   Intermediate commit. Patch is far from complete, but this small
> >>   patch was nevertheless sufficient to be able to run sysbench-0.4
> >>   OLTP with full parallelisation.
> >>
> >
> > "full parallelisation": does that mean that X threads on the master
> > make the slave achieve k*X higher throughput?
>
> Hm, actually, it's not related to threads on the _master_ at all.
> Rather, it is potentially a throughput of k*Y where Y is the number of
> worker threads on the _slave_, up to some limit of scalability, of
> course.
>
> Suppose in the binlog we have transactions T1, T2, T3, T4. With this
> patch, we are going to try to replicate _all_ of them in parallel (up
> to a maximum of Y).
>
> If the transactions are non-conflicting, then great, everything will
> work fine and we will still commit them in the correct order, so
> applications will not see any difference.
>
> But suppose, e.g., T3 modifies the same row as T1, and T3 manages to
> touch the row first. In this case, T1 will need to wait for T3. This is
> detected as a deadlock (because T3 must in turn wait for T1 to commit
> first, to preserve the commit order). So we roll back T3, allowing T1
> to continue, and later re-try T3.
>
> So it is safe to try to run everything in parallel, at least for
> transactional events that can be safely rolled back.
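
The conflict handling described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not the patch's actual code: `Txn`, `speculative_apply`, and `touch_order` are hypothetical names, each transaction takes all of its row locks at once, and worker scheduling is reduced to a fixed order in which transactions first touch their rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    seq: int          # position in the binlog, i.e. the required commit order
    rows: frozenset   # rows the transaction modifies


def speculative_apply(txns, touch_order):
    """Toy model: start everything in parallel, commit in binlog order.

    touch_order lists seq numbers in the order the (hypothetical) workers
    reach their rows. Returns (commit_order, rolled_back).
    """
    by_seq = {t.seq: t for t in txns}
    lock_owner = {}  # row -> seq of the transaction holding the row lock

    # Phase 1: every worker tries to grab its rows; a blocked one just waits.
    for seq in touch_order:
        rows = by_seq[seq].rows
        if all(r not in lock_owner for r in rows):
            for r in rows:
                lock_owner[r] = seq

    # Phase 2: commit strictly in binlog order.
    committed, rolled_back = [], []
    for txn in sorted(txns, key=lambda t: t.seq):
        # A later transaction holding a row we need is a deadlock victim:
        # it cannot commit before us, and we cannot proceed without the row.
        victims = {lock_owner[r] for r in txn.rows
                   if lock_owner.get(r, txn.seq) != txn.seq}
        for v in sorted(victims):
            for r in by_seq[v].rows:
                del lock_owner[r]       # rollback releases the victim's locks
            rolled_back.append(v)       # the victim is retried at its own turn
        for r in txn.rows:
            lock_owner[r] = txn.seq     # acquire (no-op if already held)
        committed.append(txn.seq)
        for r in txn.rows:
            del lock_owner[r]           # commit releases the locks
    return committed, rolled_back


txns = [Txn(1, frozenset({"a"})), Txn(2, frozenset({"b"})),
        Txn(3, frozenset({"a"})), Txn(4, frozenset({"c"}))]
print(speculative_apply(txns, [3, 1, 2, 4]))  # ([1, 2, 3, 4], [3])
```

With T3 touching row "a" before T1, T3 is rolled back once and retried, yet the commit order stays T1, T2, T3, T4. If T1 touches the row first, T3 simply waits and nothing is rolled back.
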
>
> The only catch seems to be if there are a lot of potential conflicts in the
> application load. Then we could end up with too many rollbacks, causing
> throughput to decrease rather than increase.
>
> The next step is to add some flags to the GTID event on the master, and use
> those flags to control what to run in parallel on the slave:
>
>  - If DDL or non-transactional tables are involved, set a flag to not
>    run this event group in parallel with those that come before or
>    after.
>
>  - Remember on the master if a transaction had to do a lock wait on
>    another transaction; in this case it seems likely that a similar
>    wait could be needed on the slave, so do not start this transaction
>    in parallel with any earlier ones.
>
>  - Maybe we can have a flag for "large" transactions that modify many
>    rows; we could choose not to run those in parallel with earlier
>    transactions, to avoid the need for expensive rollback of lots of
>    rows.
>
>  - Allow the user to set some @@rpl_not_parallel variable, to explicitly
>    annotate transactions that are known to be likely to conflict, and hence
>    not worth it to try to run in parallel.
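
The flag ideas in the list above could be turned into a slave-side scheduling decision roughly as follows. This is a hedged sketch only: the flag names, the `GtidFlag` type, the `apply_mode` function, and the three mode strings are all hypothetical and do not reflect the real GTID event layout; treating the user annotation the same as a lock wait is also an assumption.

```python
from enum import Flag, auto


class GtidFlag(Flag):
    """Hypothetical per-event-group flags set on the master."""
    NONE = 0
    DDL_OR_NON_TRANS = auto()    # DDL or non-transactional tables involved
    WAITED_ON_MASTER = auto()    # had a lock wait on another transaction
    LARGE = auto()               # modifies many rows, rollback is expensive
    USER_NOT_PARALLEL = auto()   # user annotated it as likely to conflict


def apply_mode(flags):
    """Pick how the slave should schedule one event group."""
    if flags & GtidFlag.DDL_OR_NON_TRANS:
        # Cannot be rolled back safely: keep it apart from the event
        # groups both before and after it.
        return "barrier"
    if flags & (GtidFlag.WAITED_ON_MASTER | GtidFlag.LARGE
                | GtidFlag.USER_NOT_PARALLEL):
        # Likely to conflict or costly to retry: do not start it in
        # parallel with earlier transactions.
        return "wait_for_prior"
    # Everything else is attempted speculatively in parallel.
    return "speculative"


print(apply_mode(GtidFlag.NONE))                # speculative
print(apply_mode(GtidFlag.LARGE))               # wait_for_prior
print(apply_mode(GtidFlag.DDL_OR_NON_TRANS))    # barrier
```

Checking the DDL flag first means the strictest restriction wins when several flags are set on the same event group.
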
>
> This should be simple to do. Later we could also think about adding
> checks on the slave to further control what to do in parallel; however,
> I have not thought much about this.
>
> This patch seems to have a lot of potential to finally get a good
> solution to the single-threaded slave problem. But testing against
> real-life workloads will be needed to understand how to balance the
> speculative parallelisation against avoiding excessive rollbacks.
>
>  - Kristian.
>
