drizzle-discuss team mailing list archive
Parallel replication slaves?
I've been thinking a lot about replication since the conference.
Patrick Galbraith and I sat in the back of the room at the BOF on
Wednesday night and worked on DBD::drizzle, but I was sort of half
listening to the ideas being presented. If this one already was
presented, then great, we'll have confirmation that I, indeed, have a
subconscious that was listening, and it is working properly to bubble
ideas up to my conscious mind ;). If not, then I'd like to get
people's opinions on this before wandering into the code and working on
it.
The way MySQL's binlog works, it is basically a FIFO, in completion
order. Because we don't know what operations were dependent on others
that were completed before them, we have to perform them one by one in
a single thread. This can get really ugly if you have a highly
concurrent, write-heavy master: slaves might never catch up.
After looking at Google's work on a global transaction ID, I had a
thought. What if every statement took note of the last committed
transaction ID when it started? When it completes and needs to be
replicated, it records the end transaction ID as well. This gives us a
window for each operation in the binlog.
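
To make the window idea concrete, here is a minimal Python sketch of how a master could stamp each statement. The names `Master`, `LogEntry`, `begin`, and `commit` are hypothetical and are not Drizzle or MySQL APIs. The sketch also assumes a single global counter of committed transactions, which is only one way a global transaction ID could be handed out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    start_txid: int  # last committed txid visible when the statement started
    end_txid: int    # txid assigned when the statement committed
    statement: str


@dataclass
class Master:
    last_committed: int = 0                      # assumed global counter
    binlog: list = field(default_factory=list)   # entries in completion order

    def begin(self) -> int:
        # A statement notes the last committed txid at the moment it starts.
        return self.last_committed

    def commit(self, start_txid: int, statement: str) -> LogEntry:
        # On completion the statement gets the next txid and is logged
        # together with the start txid it noted earlier.
        self.last_committed += 1
        entry = LogEntry(start_txid, self.last_committed, statement)
        self.binlog.append(entry)
        return entry


master = Master()
a = master.begin()           # two statements start concurrently
b = master.begin()
master.commit(a, "stmt A")   # window (0, 1)
master.commit(b, "stmt B")   # window (0, 2)
c = master.begin()           # starts after both commits
master.commit(c, "stmt C")   # window (2, 3)
```

Statements A and B have overlapping windows, so neither could have seen the other's result on the master. Statement C started after both had committed, so it may depend on either of them.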
So, on a slave we read these items, which basically look like this
(I'm writing statement based, but row based would be no different as
long as single transactions' row operations are wrapped in transaction
IDs the same way):
INSERT INTO t1 (id,x,desc) VALUES (1,3,'foo')
INSERT INTO t2 (id, z) VALUES('a','bc')
UPDATE t1 SET x = 9 WHERE ID=1
UPDATE t2 SET desc = 'the quick brown fox jumped over the lazy brown dog' WHERE id = 'a'
UPDATE t2 SET archiveme=1 WHERE date <= '2008-01-01'
The slave can read ahead, performing each statement without committing
it, until it encounters a StartTXID >= the lowest current EndTXID. In
the example above, it would run the two inserts, then see that the next
StartTXID of 2 is >= 2, so it must not start those statements until the
two inserts have been committed. However, it can continue on in the log
to the archival UPDATE and run it right now. As long as we have a
transactional storage engine, we just have to commit them in the same
order as they were committed on the master so that we don't present a
view that never existed on the master. But much of the work can be
done in parallel just like it was done on the master.
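
A rough Python sketch of the read-ahead logic follows. It treats an earlier log entry as a prerequisite whenever its EndTXID is <= the later entry's StartTXID, which is my reading of the rule above rather than a settled design. The function `plan_waves`, the `LogEntry` type, and the transaction IDs in the sample log are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    start_txid: int  # last committed txid the master saw at statement start
    end_txid: int    # txid assigned when the statement committed
    statement: str


def plan_waves(log):
    """Group log entries into waves whose statements may run in parallel."""
    wave_of = []
    for i, entry in enumerate(log):
        # Earlier entries that had already committed on the master before
        # this entry started are its prerequisites.
        dep_waves = [wave_of[j] for j in range(i)
                     if log[j].end_txid <= entry.start_txid]
        wave_of.append(max(dep_waves) + 1 if dep_waves else 0)

    waves = [[] for _ in range(max(wave_of, default=-1) + 1)]
    for entry, wave in zip(log, wave_of):
        waves[wave].append(entry.statement)
    return waves


# Hypothetical windows for five statements, in master commit order.
log = [
    LogEntry(0, 1, "INSERT t1"),
    LogEntry(0, 2, "INSERT t2"),
    LogEntry(2, 3, "UPDATE t1"),
    LogEntry(2, 4, "UPDATE t2"),
    LogEntry(0, 5, "archive UPDATE t2"),
]
print(plan_waves(log))
# [['INSERT t1', 'INSERT t2', 'archive UPDATE t2'], ['UPDATE t1', 'UPDATE t2']]
```

The waves only say when a statement may start. Commits would still be issued in log order, so the long-running archival update starts in the first wave but commits last, just as it did on the master.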
It might even make sense to offer the user a choice between
concurrency and absolute consistency by saying that statements can be
committed in any order as long as they're not dependent on one another.
That would allow us to also apply this same mode of operation to
non-transactional storage engines (I think..).
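
The difference between the two modes can be sketched as a choice of which finished statements are allowed to commit next. This is only an illustration under the same assumption as before: an entry depends on every earlier entry whose EndTXID is <= its own StartTXID. The `committable` function, its `strict` flag, and the sample windows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    start_txid: int
    end_txid: int
    statement: str


def committable(log, committed, strict):
    """Return indexes of log entries that may commit next.

    committed is the set of indexes already committed on the slave.
    """
    pending = [i for i in range(len(log)) if i not in committed]
    if strict:
        # Absolute consistency: commit strictly in master order.
        return pending[:1]
    ready = []
    for i in pending:
        # Relaxed mode: only prerequisites have to be committed first.
        prereqs = [j for j in range(i)
                   if log[j].end_txid <= log[i].start_txid]
        if all(j in committed for j in prereqs):
            ready.append(i)
    return ready


log = [
    LogEntry(0, 1, "INSERT t1"),
    LogEntry(0, 2, "INSERT t2"),
    LogEntry(2, 3, "UPDATE t1"),
    LogEntry(2, 4, "UPDATE t2"),
    LogEntry(0, 5, "archive UPDATE t2"),
]
print(committable(log, set(), strict=True))    # [0]
print(committable(log, set(), strict=False))   # [0, 1, 4]
```

In relaxed mode a reader on the slave could see the archival update before the inserts, a state that never existed on the master. That is the consistency being traded away for concurrency.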
This should allow us to take advantage of almost the same level of
concurrency that the master has available to it.