
drizzle-discuss team mailing list archive

Parallel replication slaves?

 

I've been thinking a lot about replication since the conference. Patrick Galbraith and I sat in the back of the room at the BOF on Wednesday night and worked on DBD::drizzle, but I was sort of half listening to the ideas being presented. If this one was already presented, then great, we'll have confirmation that I do indeed have a subconscious that was listening, and that it is working properly to bubble ideas up to my conscious mind ;). If not, then I'd like to get people's opinions on this before wandering into the code and working on it.


The way MySQL's binlog works, it is basically a FIFO, in completion order. Because we don't know which operations depended on others that completed before them, we have to apply them one by one in a single thread. This can get really ugly if you have a really high-concurrency, write-laden master: slaves might never catch up.

After looking at Google's work on a global transaction ID, I had a thought. What if every statement took note of the last committed transaction ID when it started? When it completes and needs to be replicated, it records the end transaction ID as well. This gives us a window for each operation in the binlog.
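To make the window idea concrete, here is a rough Python sketch of how a master might stamp each statement. Everything in it is made up for illustration: the `TxidClock` class, the statement names, and the assumption that the counter starts at 1 so the numbers line up with the example log further down. It is not how MySQL or Drizzle actually assign transaction IDs.

```python
import threading


class TxidClock:
    """Hypothetical counter tracking the last committed transaction ID."""

    def __init__(self, last_committed=1):
        self._lock = threading.Lock()
        self._last_committed = last_committed

    def begin(self):
        # StartTXID: the newest commit visible when the statement starts.
        with self._lock:
            return self._last_committed

    def commit(self):
        # EndTXID: the ID assigned to this statement's own commit.
        with self._lock:
            self._last_committed += 1
            return self._last_committed


# One possible interleaving on the master (an assumption, chosen so the
# resulting windows match the example log).
events = [
    ("begin", "insert_t1"), ("begin", "insert_t2"), ("begin", "archive_t2"),
    ("commit", "insert_t1"),
    ("begin", "update_t1"), ("begin", "update_t2"),
    ("commit", "insert_t2"), ("commit", "update_t1"),
    ("commit", "update_t2"), ("commit", "archive_t2"),
]

clock = TxidClock()
starts = {}
windows = []  # (statement, StartTXID, EndTXID), in completion order
for action, name in events:
    if action == "begin":
        starts[name] = clock.begin()
    else:
        windows.append((name, starts[name], clock.commit()))

for name, start_txid, end_txid in windows:
    print(f"{name}: StartTXID {start_txid}, EndTXID {end_txid}")
```

The long-running archival statement ends up with the widest window (1 to 6) because it started before anything else committed and finished last.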

So, on a slave we read these items, which basically look like this (I'm writing statement based, but row based would be no different as long as single transactions' row operations are wrapped in transaction IDs the same way):


StartTXID: 1
INSERT INTO t1 (id,x,desc) VALUES (1,3,'foo')
EndTXID: 2

StartTXID: 1
INSERT INTO t2 (id, z) VALUES('a','bc')
EndTXID: 3

StartTXID: 2
UPDATE t1 SET x = 9 WHERE ID=1
EndTXID: 4

StartTXID: 2
UPDATE t2 SET desc = 'the quick brown fox jumped over the lazy brown dog' WHERE id = 'a'
EndTXID: 5

StartTXID: 1
UPDATE t2 SET archiveme=1 WHERE date <= '2008-01-01'
EndTXID: 6


The slave can read ahead, performing each statement without committing it, until it encounters a StartTXID >= the lowest current EndTXID. In the example above, that means running the two inserts, then seeing that the next StartTXID: 2 is >= 2, so it must not start those statements until the other two have been committed. However, it can continue on in the log to the archival UPDATE and run it right away. As long as we have a transactional storage engine, we just have to commit them in the same order as they were committed on the master so that we don't present a view that never existed on the master. But much of the work can be done in parallel, just like it was done on the master.
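The read-ahead rule can be simulated in a few lines of Python. This is only a sketch under some assumptions: the `Entry` tuple and `schedule` function are hypothetical names, "start" stands in for handing a statement to a worker thread, and the rule is applied literally (a statement may start once its StartTXID is below the lowest uncommitted EndTXID), which lets the two UPDATEs with StartTXID 2 begin as soon as transaction 2 has committed.

```python
from collections import namedtuple

# Hypothetical in-memory form of one binlog entry.
Entry = namedtuple("Entry", "start_txid end_txid sql")

LOG = [
    Entry(1, 2, "INSERT INTO t1 ..."),
    Entry(1, 3, "INSERT INTO t2 ..."),
    Entry(2, 4, "UPDATE t1 SET x = 9 ..."),
    Entry(2, 5, "UPDATE t2 SET desc = ..."),
    Entry(1, 6, "UPDATE t2 SET archiveme=1 ..."),
]


def schedule(log):
    """Return a trace of ("start" | "commit", EndTXID) events.

    The log is assumed to be in completion order, so commits simply
    follow log order. A statement is started as soon as its StartTXID
    is lower than the lowest EndTXID that has not been committed yet.
    """
    trace = []
    started = set()
    for next_commit in log:
        lowest_uncommitted = next_commit.end_txid
        # Read ahead through the log, starting whatever is allowed to run.
        for entry in log:
            if entry.end_txid in started:
                continue
            if entry.start_txid < lowest_uncommitted:
                started.add(entry.end_txid)
                trace.append(("start", entry.end_txid))
        # Commit strictly in master order.
        trace.append(("commit", next_commit.end_txid))
    return trace


for event, txid in schedule(LOG):
    print(event, txid)
```

On the example log this starts the two inserts and the archival UPDATE together, commits transaction 2, then starts the two remaining UPDATEs before committing 3, 4, 5 and 6 in order.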

It might even make sense to offer the user a choice between concurrency and absolute consistency by saying that statements can be committed in any order as long as they're not dependent on one another. That would allow us to also apply this same mode of operation to non-transactional storage engines (I think..).
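For the relaxed mode, one way to read "dependent" (this is my assumption, not something settled above) is that a statement depends on every transaction whose EndTXID is <= its own StartTXID, since those are the commits it could have seen on the master. A small Python sketch of that check, with a hypothetical `dependencies` helper and the windows from the example log:

```python
# (StartTXID, EndTXID) windows from the example log.
WINDOWS = [(1, 2), (1, 3), (2, 4), (2, 5), (1, 6)]


def dependencies(windows):
    """Map each EndTXID to the set of EndTXIDs that must commit first.

    Assumption: B depends on A when A's EndTXID <= B's StartTXID,
    i.e. A had already committed on the master when B started.
    """
    return {
        end_b: {end_a for _, end_a in windows if end_a <= start_b}
        for start_b, end_b in windows
    }


deps = dependencies(WINDOWS)
for end_txid in sorted(deps):
    print(end_txid, "waits for", sorted(deps[end_txid]))
```

Under that reading, transactions 4 and 5 only have to wait for transaction 2, and the archival UPDATE (EndTXID 6) has no dependencies at all, so it could be committed whenever it finishes.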

This should allow us to take advantage of almost the same level of concurrency that the master has available to it.

Thoughts?


