← Back to team overview

maria-developers team mailing list archive

Re: [Commits] Rev 4376: MDEV-6676: Speculative parallel replication in http://bazaar.launchpad.net/~maria-captains/maria/10.0

 

On Tue, Sep 16, 2014 at 12:38 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx
> wrote:

> Robert Hodges <robert.hodges@xxxxxxxxxxxxxx> writes:
>
> > In that case it gets complex for outsiders to figure out how to restart
> if
> > there's a failure. Let's say my transaction fails after T1 commits but
> > before T1 commits. Then on restart I have to regenerate T2 and rerun it.
> >  That could be hard if T2 contains statements that came before T1 in the
> > original, serialized log.
>
> Agree, there are different approaches possible.
>
> The facility in MariaDB discussed here is precisely to apply in parallel a
> serialized log. We apply the transaction in parallel, but the commits
> happen
> in the original serial order. This makes the parallel apply transparent to
> applications, assuming MVCC.
>
> Each transaction inserts in a table its own monotonic transaction number.
> So
> after a restart, it is easy to find the point at which to resume - just
> find
> the highest number in the table.
>
> One limitation of this approach is that with N worker threads, we can never
> execute more than N transactions ahead of the slowest transaction in the
> log,
> because every thread needs to wait for the prior commit.
>

Right.  I thought about that problem a lot in the Tungsten parallel apply
design and ended up with an approach that allows workers to diverge by
several minutes or longer.  This enables Tungsten to maintain good
throughput even in the face of lumpy workloads that contain transactions
ranging from single inserts to updates involving hundreds of thousands or
millions of rows.  We did some early performance work in production
environments that showed the need for wide divergence to avoid
serialization around the "lumps" in the load.

>
> > Without a "group" transaction another approach is to keep separate
> restart
> > points for each worker thread.  It works well as long as you can
> > deterministically assign transactions to threads and keep those same
> > threads from getting too far apart. The logic to do this is non-trivial,
> so
> > it's not an approach for everybody.
>
> MariaDB also supports this approach. Here, the user explicitly assigns each
> transaction to a replication domain, and restart point is kept per-domain
> (not
> per-transaction). Different domains commit independently and out-of-order
> with
> respect to one another. Within one domain, commit order is strictly
> serialised.
>
> I like the approach with replication domains. Not only does it avoid the
> need
> for non-trivial logic to assign transactions to different threads and
> restart
> points. It also ensures that those restart points make sense to the
> user/DBA.
>
> So are replication domains "shards"?  My definition of a shard in this
context is a causally independent stream of transactions, which is
effectively a partial order within the fully serialized log.  That's an
excellent feature.  Assuming that's what you have done, how do you handle
operations like CREATE USER that are global in effect?  (Just point me to
docs or your blog if you wrote it up.  I would love to learn more.)

Cheers, Robert

Follow ups

References