maria-developers team mailing list archive

Re: [Commits] Rev 4376: MDEV-6676: Speculative parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0

To: Robert Hodges <robert.hodges@xxxxxxxxxxxxxx>
From: Jonas Oreland <jonaso@xxxxxxxxxx>
Date: Wed, 10 Sep 2014 18:52:06 +0200
Cc: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>, MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAFfcncXFMXcCnxuije9T2zyhwo9hyAyz6XATFPbc6eB4Z+2+=g@mail.gmail.com>

Why?
Transactions are committed *in order*.

/Jonas

On Wed, Sep 10, 2014 at 4:32 PM, Robert Hodges <robert.hodges@xxxxxxxxxxxxxx> wrote:

> Hi Kristian,
>
> There is one thing I have never understood about your parallel apply
> algorithm.  How do you handle the case where the server crashes when some
> threads have committed but others have not?  It seems as if you could have
> a problem with recovery.
>
> Cheers, Robert Hodges
>
>
> On Wed, Sep 10, 2014 at 2:06 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:
>
>> Jonas Oreland <jonaso@xxxxxxxxxx> writes:
>>
>> Hi Jonas, I actually was planning to discuss this with you, as it is based
>> on some of the ideas you mentioned earlier on parallel replication...
>>
>> >>   Intermediate commit. Patch is far from complete, but this small patch
>> >>   was nevertheless sufficient to be able to run sysbench-0.4 OLTP with
>> >>   full parallelisation.
>> >>
>> >
>> > "full parallelisation": does that mean that X threads on the master make
>> > the slave achieve k*X higher throughput?
>>
>> Hm, actually, it's not related to threads on the _master_ at all. Rather, it
>> is potentially a throughput of k*Y where Y is the number of worker threads
>> on the _slave_, up to some limit of scalability, of course.
>>
>> Suppose in the binlog we have transactions T1, T2, T3, T4. With this patch,
>> we are going to try to replicate _all_ of them in parallel (up to a maximum
>> of Y).
>>
>> If the transactions are non-conflicting, then great, everything will work
>> fine and we will still commit them in the correct order, so applications
>> will not see any difference.
>>
>> But suppose e.g. T3 modifies the same row as T1, and T3 manages to touch the
>> row first. In this case, T1 will need to wait for T3. This is detected as a
>> deadlock (because T3 in turn has to wait for T1 to commit first, since
>> commits happen in binlog order). So we roll back T3, allowing T1 to
>> continue, and later re-try T3.
>>
>> So it is safe to try to run everything in parallel, at least for
>> transactional events that can be safely rolled back.
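
The scheme described above can be sketched as a small single-threaded simulation. This is an illustration under stated assumptions, not the code from the patch: `Txn`, `speculative_apply`, and `start_order` are hypothetical names, row locks are taken all-or-nothing, and work proceeds in rounds instead of real worker threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    seq: int          # position in the binlog, i.e. the required commit order
    rows: frozenset   # rows the transaction modifies


def speculative_apply(txns, workers, start_order=None):
    """Simulate speculative parallel apply with in-order commit.

    start_order (hypothetical knob) is a permutation of the seq numbers that
    says which transaction reaches its rows first. Returns the list of
    committed seq numbers and the number of rollbacks.
    """
    by_seq = {t.seq: t for t in txns}
    pending = sorted(by_seq)                      # uncommitted, binlog order
    rank = {s: i for i, s in enumerate(start_order or pending)}
    locks = {}         # row -> seq of the transaction holding it
    executed = set()   # hold all their rows, waiting for their commit turn
    committed, rollbacks = [], 0

    while pending:
        window = pending[:workers]                # at most `workers` in flight
        to_run = sorted((s for s in window if s not in executed),
                        key=lambda s: rank[s])
        for s in to_run:
            blockers = {locks[r] for r in by_seq[s].rows if r in locks}
            if any(b < s for b in blockers):
                continue   # an earlier transaction holds the row: plain wait
            for b in blockers:
                # A later transaction holds a row we need, yet it cannot
                # commit before us: deadlock. Roll it back; it retries in a
                # later round.
                for r in by_seq[b].rows:
                    del locks[r]
                executed.discard(b)
                rollbacks += 1
            for r in by_seq[s].rows:
                locks[r] = s
            executed.add(s)
        # Commit strictly in binlog order, releasing locks as we go.
        while pending and pending[0] in executed:
            s = pending.pop(0)
            executed.discard(s)
            for r in by_seq[s].rows:
                del locks[r]
            committed.append(s)
    return committed, rollbacks


txns = [Txn(1, frozenset({"a"})), Txn(2, frozenset({"b"})),
        Txn(3, frozenset({"a"})), Txn(4, frozenset({"c"}))]
# T3 reaches row "a" before T1 does.
print(speculative_apply(txns, workers=4, start_order=[3, 1, 2, 4]))
# -> ([1, 2, 3, 4], 1)
```

In this model the committed list is always a prefix of the binlog order, whatever the number of rollbacks. If every transaction touches the same row and the later ones always get there first, the rollback count grows quickly, which is the contention risk.
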
>>
>> The only catch seems to be if there are a lot of potential conflicts in the
>> application load. Then we could end up with too many rollbacks, causing
>> throughput to decrease rather than increase.
>>
>> The next step is to add some flags to the GTID event on the master, and use
>> those flags to control what to run in parallel on the slave:
>>
>>  - If DDL or non-transactional tables are involved, set a flag to not run
>>    this event group in parallel with those that come before or after.
>>
>>  - Remember on the master if a transaction had to do a lock wait on another
>>    transaction; in this case it seems likely that a similar wait could be
>>    needed on the slave, so do not start this transaction in parallel with
>>    any earlier ones.
>>
>>  - Maybe we can have a flag for "large" transactions that modify many rows;
>>    we could choose not to run those in parallel with earlier transactions,
>>    to avoid the need for expensive rollback of lots of rows.
>>
>>  - Allow the user to set some @@rpl_not_parallel variable, to explicitly
>>    annotate transactions that are known to be likely to conflict, and hence
>>    not worth trying to run in parallel.
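
The four proposed flags could drive the slave's scheduling roughly as follows. This is only a sketch: the flag names in `GtidFlags` and the helper `plan_batches` are hypothetical, the real GTID event layout is not shown in this thread, and the model is simplified so that a whole batch finishes before the next one starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GtidFlags:
    # Hypothetical flag names, one per item in the list above.
    ddl_or_non_transactional: bool = False   # no parallelism before or after
    waited_on_master: bool = False           # had a lock wait on the master
    large: bool = False                      # modifies many rows
    user_not_parallel: bool = False          # user set @@rpl_not_parallel


def wait_for_earlier(flags):
    """True if the event group must not start alongside earlier ones."""
    return (flags.ddl_or_non_transactional or flags.waited_on_master
            or flags.large or flags.user_not_parallel)


def plan_batches(stream):
    """Split (name, flags) pairs into batches.

    Event groups inside one batch may be applied speculatively in parallel;
    batches are applied one after another.
    """
    batches = []
    barrier = True   # forces the first event group to open a batch
    for name, flags in stream:
        if barrier or wait_for_earlier(flags):
            batches.append([name])
        else:
            batches[-1].append(name)
        # Only DDL / non-transactional groups also block what comes after.
        barrier = flags.ddl_or_non_transactional
    return batches


stream = [
    ("T1", GtidFlags()),
    ("T2", GtidFlags()),
    ("T3", GtidFlags(waited_on_master=True)),
    ("T4", GtidFlags()),
    ("DDL", GtidFlags(ddl_or_non_transactional=True)),
    ("T5", GtidFlags()),
    ("T6", GtidFlags()),
]
print(plan_batches(stream))
# -> [['T1', 'T2'], ['T3', 'T4'], ['DDL'], ['T5', 'T6']]
```

Only the DDL / non-transactional flag acts as a barrier in both directions; the other three merely stop a transaction from starting together with earlier ones, so later transactions may still join its batch.
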
>>
>> This should be simple to do. Later we could also think about adding checks
>> on the slave to further control what to do in parallel; however, I have not
>> thought much about this.
>>
>> This patch seems to have a lot of potential to finally get a good solution
>> to the single-threaded slave problem. But testing against real-life
>> workloads will be needed to understand how to balance the speculative
>> parallelisation against avoiding excessive rollbacks.
>>
>>  - Kristian.
