maria-developers team mailing list archive
Message #07690
Re: [Commits] Rev 4376: MDEV-6676: Speculative parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0
ack
/Jonas
On Wed, Sep 10, 2014 at 11:06 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:
> Jonas Oreland <jonaso@xxxxxxxxxx> writes:
>
> Hi Jonas, I actually was planning to discuss this with you, as it is based on
> some of the ideas you mentioned earlier on parallel replication...
>
> >> Intermediate commit. Patch is far from complete, but this small patch was
> >> nevertheless sufficient to be able to run sysbench-0.4 OLTP with full
> >> parallelisation.
> >>
> >
> > "full parallelisation": does that mean that X threads on the master make
> > the slave achieve k*X higher throughput?
>
> Hm, actually, it's not related to threads on the _master_ at all. Rather, it
> is potentially a throughput of k*Y where Y is the number of worker threads on
> the _slave_, up to some limit of scalability, of course.
>
> Suppose in the binlog we have transactions T1, T2, T3, T4. With this patch, we
> are going to try to replicate _all_ of them in parallel (up to a maximum of Y).
>
> If the transactions are non-conflicting, then great, everything will work fine
> and we will still commit them in the correct order, so applications will not
> see any difference.
>
> But suppose e.g. T3 modifies the same row as T1, and T3 manages to touch the
> row first. In this case, T1 will need to wait for T3. This is detected as a
> deadlock (because T3 must in turn wait for T1 to commit first, to preserve
> the commit order). So we roll back T3, allowing T1 to continue, and later
> re-try T3.
>
> So it is safe to try to run everything in parallel, at least for transactional
> events that can be safely rolled back.
>
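The T1/T3 conflict described above can be traced with a small simulation. This is only a rough Python sketch under simplifying assumptions, not the code in the patch: the names `Txn`, `speculative_apply` and `arrival` are hypothetical, each transaction takes all of its row locks in one step, and the order in which workers reach their rows is passed in explicitly rather than produced by real threads.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    seq: int          # position in the binlog, which is also the commit order
    rows: frozenset   # rows the transaction modifies


def speculative_apply(txns, arrival):
    """Simulate speculative apply with in-order commit.

    txns:    transactions in binlog order
    arrival: order (list of seq) in which workers happen to reach their rows
    Returns (commit order, list of seqs that were rolled back and retried).
    """
    by_seq = {t.seq: t for t in txns}
    commit_order = sorted(by_seq)
    owner = {}                 # row -> seq of the transaction holding its lock
    ready = set()              # transactions holding all locks, awaiting commit
    queue = deque(arrival)     # transactions that still need their locks
    committed, rollbacks = [], []

    def release(seq):
        for row in by_seq[seq].rows:
            if owner.get(row) == seq:
                del owner[row]

    while len(committed) < len(txns):
        # Phase 1: every queued transaction gets one attempt at its locks.
        for _ in range(len(queue)):
            seq = queue.popleft()
            rows = by_seq[seq].rows
            holders = {owner[r] for r in rows if r in owner}
            if any(h < seq for h in holders):
                # Ordinary lock wait on an earlier transaction: try again later.
                queue.append(seq)
                continue
            for victim in holders:
                # A later transaction holds the row but must commit after us:
                # that is the deadlock, so roll the later one back and retry it.
                release(victim)
                ready.discard(victim)
                rollbacks.append(victim)
                queue.append(victim)
            for row in rows:
                owner[row] = seq
            ready.add(seq)
        # Phase 2: commit strictly in binlog order.
        while len(committed) < len(txns) and commit_order[len(committed)] in ready:
            seq = commit_order[len(committed)]
            release(seq)
            ready.remove(seq)
            committed.append(seq)
    return committed, rollbacks


if __name__ == "__main__":
    txns = [
        Txn(1, frozenset({"a"})),
        Txn(2, frozenset({"b"})),
        Txn(3, frozenset({"a"})),   # conflicts with T1
        Txn(4, frozenset({"c"})),
    ]
    # T3 reaches row "a" before T1 does.
    print(speculative_apply(txns, arrival=[3, 1, 2, 4]))
```

In this model T3 is rolled back once and retried after T1 commits, and the commit order stays T1, T2, T3, T4 regardless of the arrival order.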
> The only catch seems to be if there are a lot of potential conflicts in the
> application load. Then we could end up with too many rollbacks, causing
> throughput to decrease rather than increase.
>
> The next step is to add some flags to the GTID event on the master, and use
> those flags to control what to run in parallel on the slave:
>
> - If DDL or non-transactional tables are involved, set a flag to not run this
>   event group in parallel with those that come before or after.
>
> - Remember on the master if a transaction had to do a lock wait on another
>   transaction; in this case it seems likely that a similar wait could be
>   needed on the slave, so do not start this transaction in parallel with any
>   earlier ones.
>
> - Maybe we can have a flag for "large" transactions that modify many rows; we
>   could choose not to run those in parallel with earlier transactions, to
>   avoid the need for expensive rollback of lots of rows.
>
> - Allow the user to set some @@rpl_not_parallel variable, to explicitly
>   annotate transactions that are known to be likely to conflict, and hence
>   not worth it to try to run in parallel.
>
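One way to picture how such flags might steer the slave is a small planning function. The following Python sketch is purely illustrative: the `GtidFlags` fields, `may_run_with_earlier`, `must_block_later` and `plan_batches` are hypothetical names, not the actual GTID event format or MariaDB API, and treating the user annotation the same way as a lock wait is an assumption. It also simplifies "do not run in parallel with earlier ones" into a batch boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtidFlags:
    """Hypothetical per-event-group hints recorded on the master."""
    is_ddl: bool = False
    non_transactional: bool = False
    waited_on_master: bool = False    # had a lock wait on the master
    large: bool = False               # modifies many rows
    user_not_parallel: bool = False   # user annotation (assumed semantics)


def may_run_with_earlier(f: GtidFlags) -> bool:
    """True if the event group may start while earlier ones are still running."""
    return not (f.is_ddl or f.non_transactional or f.waited_on_master
                or f.large or f.user_not_parallel)


def must_block_later(f: GtidFlags) -> bool:
    """True if later event groups must wait until this one has finished."""
    return f.is_ddl or f.non_transactional


def plan_batches(events):
    """Group (name, flags) pairs, given in binlog order, into batches.

    Event groups inside one batch are applied speculatively in parallel;
    batches run one after another.
    """
    batches, current = [], []
    for name, flags in events:
        if current and not may_run_with_earlier(flags):
            batches.append(current)   # wait for everything earlier first
            current = []
        current.append(name)
        if must_block_later(flags):
            batches.append(current)   # nothing later may overlap with this one
            current = []
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    events = [
        ("T1", GtidFlags()),
        ("T2", GtidFlags()),
        ("T3", GtidFlags(waited_on_master=True)),
        ("T4", GtidFlags()),
        ("D5", GtidFlags(is_ddl=True)),
        ("T6", GtidFlags()),
    ]
    print(plan_batches(events))
```

With these inputs, T3 starts a new batch because it waited on the master, and the DDL event D5 runs alone, separated from both its predecessors and its successors.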
> This should be simple to do. Later we could also think about adding checks on
> the slave to further control what to do in parallel; however, I have not
> thought much about this.
>
> This patch seems to have a lot of potential to finally get a good solution to
> the single-threaded slave problem. But testing against real-life workloads
> will be needed to understand how to balance the speculative parallelisation
> against avoiding excessive rollbacks.
>
> - Kristian.
>