maria-developers team mailing list archive
Re: [Commits] Rev 4376: MDEV-6676: Speculative parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0
Transactions are committed *in order*.
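Here is a toy Python model of why in-order commit answers the crash-recovery question quoted below. It is not MariaDB code, and `InOrderCommitter` and its methods are hypothetical names. Workers may finish executing out of order, but commits happen strictly in binlog order, so the set of committed transactions is always a prefix of the binlog. The model assumes the slave's replication position is made durable atomically with each commit. Under that assumption, recovery only needs to discard uncommitted work and resume from the first uncommitted transaction.

```python
from dataclasses import dataclass, field


@dataclass
class InOrderCommitter:
    """Toy model: transactions may finish executing out of order, but they
    commit strictly in binlog order, so the committed set is always a prefix."""

    next_to_commit: int = 1                         # sequence number allowed to commit next
    finished: set = field(default_factory=set)      # executed, waiting for their turn
    committed: list = field(default_factory=list)   # committed, in commit order

    def finish(self, seq: int) -> None:
        """A worker has executed transaction `seq` and is ready to commit."""
        self.finished.add(seq)
        # Commit every transaction whose predecessors have all committed.
        while self.next_to_commit in self.finished:
            self.finished.remove(self.next_to_commit)
            self.committed.append(self.next_to_commit)
            self.next_to_commit += 1

    def restart_position(self) -> int:
        """Where replication resumes after a crash: transactions still in
        `finished` were never committed, so they are rolled back and re-applied."""
        return self.next_to_commit


c = InOrderCommitter()
c.finish(3)                    # T3 executes first, but must wait for T1 and T2
c.finish(1)                    # T1 commits; T3 is still waiting for T2
print(c.committed)             # [1]
print(c.restart_position())    # 2: a crash now re-applies T2 and T3
```

Suppose T3 finishes before T1 and the server crashes right after T1 commits. Then T3 was never committed, so the slave restarts at T2 and simply applies T2 and T3 again. No partially committed gap can exist.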
On Wed, Sep 10, 2014 at 4:32 PM, Robert Hodges <robert.hodges@xxxxxxxxxxxxxx> wrote:
> Hi Kristian,
> There is one thing I have never understood about your parallel apply
> algorithm. How do you handle the case where the server crashes when some
> threads have committed but others have not? It seems as if you could have
> a problem with recovery.
> Cheers, Robert Hodges
> On Wed, Sep 10, 2014 at 2:06 AM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> wrote:
>> Jonas Oreland <jonaso@xxxxxxxxxx> writes:
>> Hi Jonas, I was actually planning to discuss this with you, as it is
>> based on some of the ideas you mentioned earlier on parallel
>> replication...
>> >> Intermediate commit. Patch is far from complete, but this small patch
>> >> was nevertheless sufficient to be able to run sysbench-0.4 OLTP with
>> >> full parallelisation.
>> > "full parallelisation": does that mean that X threads on the master can
>> > achieve k*X higher throughput?
>> Hm, actually, it's not related to threads on the _master_ at all. Rather,
>> it is potentially a throughput of k*Y, where Y is the number of worker
>> threads on the _slave_, up to some limit of scalability, of course.
>> Suppose in the binlog we have transactions T1, T2, T3, T4. With this
>> patch, we are going to try to replicate _all_ of them in parallel (up to
>> a maximum of Y).
>> If the transactions are non-conflicting, then great: everything will work,
>> and we will still commit them in the correct order, so applications will
>> not see any difference.
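Below is a minimal Python sketch of executing in parallel while committing in order. It makes simplifying assumptions: each transaction is a callable that does not conflict with the others, and a semaphore caps the number in flight at Y (`num_workers`). A condition variable then ensures transaction i commits only after transaction i-1. The function name and structure are hypothetical and are not the patch's API. Error handling and retries are omitted.

```python
import threading
import time


def apply_in_parallel(txns, num_workers):
    """Execute transactions concurrently but commit them in binlog order.

    txns[i] is a callable standing in for the i-th event group; returns the
    order in which transactions committed. Hypothetical toy, no error handling.
    """
    commit_log = []                            # commit order actually observed
    next_to_commit = 0                         # binlog index allowed to commit next
    cond = threading.Condition()
    slots = threading.Semaphore(num_workers)   # at most Y transactions in flight

    def worker(i, txn):
        nonlocal next_to_commit
        try:
            txn()                              # execution may finish out of order
            with cond:
                # Block until every earlier transaction has committed.
                cond.wait_for(lambda: next_to_commit == i)
                commit_log.append(i)
                next_to_commit += 1
                cond.notify_all()
        finally:
            slots.release()                    # free the slot once committed

    threads = []
    for i, txn in enumerate(txns):
        slots.acquire()                        # dispatch in binlog order
        t = threading.Thread(target=worker, args=(i, txn))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return commit_log


# Transactions with different "execution times" still commit in order.
txns = [lambda d=d: time.sleep(d) for d in (0.03, 0.0, 0.02, 0.01, 0.0)]
print(apply_in_parallel(txns, num_workers=3))  # [0, 1, 2, 3, 4]
```

Commits happen in order and dispatch is in order. So the transactions in flight always form a contiguous range that starts at the next one to commit. The oldest one can therefore always proceed, and the scheme cannot stall on its own ordering.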
>> But suppose e.g. T3 modifies the same row as T1, and T3 manages to touch
>> the row first. In this case, T1 will need to wait for T3. This is detected
>> as a deadlock (because T3 eventually needs to wait for T1 to commit
>> first). So we roll back T3, allowing T1 to continue, and later re-try T3.
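The deadlock rule can be sketched as a toy row-lock table in Python. It is hypothetical and not how InnoDB's lock manager is implemented. Transactions are identified by their binlog sequence number. If a transaction needs a lock held by a *later* transaction, waiting would deadlock, because the later one must in turn wait for the earlier one to commit. So the later holder is rolled back and queued for retry. A lock held by an *earlier* transaction is just an ordinary wait.

```python
class SpeculativeLockTable:
    """Toy row-lock table for speculative parallel apply (hypothetical).

    Transactions are identified by binlog sequence number; a lower number
    must commit first."""

    def __init__(self):
        self.owner = {}         # row -> seq of the transaction holding its lock
        self.rolled_back = []   # transactions killed to break a deadlock, to be retried

    def lock(self, seq, row):
        holder = self.owner.get(row)
        if holder is None or holder == seq:
            self.owner[row] = seq
            return "granted"
        if holder > seq:
            # The holder comes later in binlog order, so it will eventually
            # wait for `seq` to commit: waiting here would be a deadlock.
            # Roll back the holder, release its locks, and retry it later.
            self._rollback(holder)
            self.owner[row] = seq
            return "granted_after_rollback"
        # The holder comes earlier and will commit first: an ordinary lock wait.
        return "wait"

    def _rollback(self, seq):
        self.rolled_back.append(seq)
        for row in [r for r, s in self.owner.items() if s == seq]:
            del self.owner[row]


locks = SpeculativeLockTable()
print(locks.lock(3, "row_a"))   # granted: T3 touches the row first
print(locks.lock(1, "row_a"))   # granted_after_rollback: T3 is rolled back
print(locks.lock(3, "row_a"))   # wait: the retried T3 now waits for T1
```

Replaying the T1/T3 example with this table gives the same outcome as described in the email: T3 is killed once, and on retry it queues behind T1 instead of deadlocking.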
>> So it is safe to try to run everything in parallel, at least for
>> events that can be safely rolled back.
>> The only catch seems to be if there are a lot of potential conflicts in
>> the application load. Then we could end up with too many rollbacks,
>> causing throughput to decrease rather than increase.
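The following crude Python illustration of that risk uses a hypothetical helper that is not part of the patch. Each transaction is represented by the set of rows it modifies. The helper counts how many transactions share a row with an earlier transaction inside the same window of Y transactions. Each such transaction *may* need a rollback, depending on which one locks the row first. A hot row touched by every transaction puts nearly all of them at risk.

```python
def count_rollback_candidates(row_sets, window):
    """Count transactions that share a row with one of the `window - 1`
    transactions immediately before them in binlog order.

    row_sets[i] is the set of rows modified by the i-th transaction. This is
    a rough indicator, not a prediction: a conflict only forces a rollback if
    the later transaction happens to lock the shared row first.
    """
    at_risk = 0
    for j, rows in enumerate(row_sets):
        earlier = row_sets[max(0, j - window + 1):j]
        if any(rows & prev for prev in earlier):
            at_risk += 1
    return at_risk


print(count_rollback_candidates([{"a"}, {"b"}, {"c"}, {"d"}], window=4))  # 0
print(count_rollback_candidates([{"hot"}] * 4, window=4))                 # 3
```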
>> The next step is to add some flags to the GTID event on the master, and
>> use those flags to control what to run in parallel on the slave:
>> - If DDL or non-transactional tables are involved, set a flag to not run
>> the event group in parallel with those that come before or after.
>> - Remember on the master if a transaction had to do a lock wait on another
>> transaction; in this case it seems likely that a similar wait could be
>> needed on the slave, so do not start this transaction in parallel with
>> earlier ones.
>> - Maybe we can have a flag for "large" transactions that modify many
>> rows; we could choose not to run those in parallel with earlier
>> transactions, to avoid the need for expensive rollback of lots of rows.
>> - Allow the user to set some @@rpl_not_parallel variable, to explicitly
>> annotate transactions that are known to be likely to conflict, and so
>> are not worth trying to run in parallel.
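The proposed flags could be modelled roughly as a bitmask carried in the GTID event, which the slave checks before starting an event group speculatively. In this Python sketch, the flag names, their values and `may_start_speculatively` are all hypothetical. Only the conditions come from the list above. The sketch assumes event groups are dispatched in binlog order and that nothing later is dispatched while a group is held back. Under that assumption, checking only the immediately preceding group also enforces the "or after" part of the DDL rule.

```python
from enum import IntFlag


class GtidFlags(IntFlag):
    """Hypothetical per-event-group flags set on the master."""
    NONE = 0
    NOT_PARALLEL = 1        # DDL or non-transactional tables involved
    WAITED = 2              # did a lock wait on another transaction on the master
    LARGE = 4               # "large" transaction modifying many rows
    USER_NOT_PARALLEL = 8   # user set @@rpl_not_parallel


def may_start_speculatively(flags, prev_flags):
    """Return True if an event group may start before all earlier ones commit.

    Assumes in-order dispatch with no later group dispatched while one is
    held back, so looking only at the immediate predecessor is sufficient."""
    if (flags | prev_flags) & GtidFlags.NOT_PARALLEL:
        return False    # never overlap with a DDL/non-transactional group
    blocking = GtidFlags.WAITED | GtidFlags.LARGE | GtidFlags.USER_NOT_PARALLEL
    return not (flags & blocking)


print(may_start_speculatively(GtidFlags.NONE, GtidFlags.LARGE))   # True
print(may_start_speculatively(GtidFlags.WAITED, GtidFlags.NONE))  # False
```

Note that a "large" predecessor does not hold back the next group. The list above only asks that large transactions not run in parallel with *earlier* ones.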
>> This should be simple to do. Later we could also think about adding checks
>> on the slave to further control what to do in parallel; however, I have
>> not thought much about this.
>> This patch seems to have a lot of potential to finally get a good solution
>> to the single-threaded slave problem. But testing against real-life
>> workloads will be needed to understand how to balance the speculative
>> parallelisation against avoiding excessive rollbacks.
>> - Kristian.