maria-developers team mailing list archive

Re: a bug that affects group commit implementations


[Cc:ing maria-developers@ as I want us to get into the habit of discussing
more openly, hope that is ok]

MARK CALLAGHAN <mdcallag@xxxxxxxxx> writes:

> I think that Mats identified the problem and then Vamsi did a
> reproduction of it.
> http://bugs.mysql.com/58787

Thanks for pointing me to this issue Mark!

I'm wondering what the real bug is here? Can one of you explain?

I suppose it is that the binary log can contain e.g. INSERT, ALTER while the
InnoDB transaction log contains ALTER, INSERT?

I believe the reason to avoid such inconsistency normally is to allow
something like XtraBackup to get a consistent binlog position that can be used
to provision a slave. However, for DDL this is not enough, as .frm files are
not handled. So ensuring consistent order for DDL between binlog and innodb
transaction log does not really solve the problem.

The bug also suggests that DDL should use 2-phase commit. But 2-phase commit
implies the ability to rollback. I do not know if InnoDB is able to roll back
DDL, but MySQL .frm handling certainly is not. So great care would be needed
to not just introduce different bugs, if this approach was taken. It is also a
non-trivial change of the storage engine API.

There has been talk of making DDL in MySQL (/MariaDB) transactional, or at
least crash-safe. This is something that I would really like to see. I was
told that partitioning already has code for this, by logging .frm changes and
recovering / rolling back after crash. Something similar could work for
general DDL. This would allow a proper solution to the binlog order problem
for DDL.

This does not mean that a partial solution now cannot be an improvement.
However, I do not understand from the bug discussion what such an improvement
would be. Can you elaborate?

BTW, the purpose of 2-phase commit is to ensure consistency between different
engines/binlog in case of crash, not to ensure consistent ordering of
commits. The fact that it currently _does_ ensure ordering for InnoDB is just
a gross hack with the prepare_commit_mutex. This is so expensive (3 x fsync()
per commit) that I believe most users don't use it anyway (eg. setting
sync_binlog != 1, which defeats the whole purpose of prepare_commit_mutex). I
would really recommend looking at MWL#116
(http://askmonty.org/worklog/Server-Sprint/?tid=116), which solves the
ordering issue in a proper way.
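
To illustrate the point about prepare_commit_mutex, here is a minimal Python
sketch of the idea. It is a toy model, not the server code: ToyServer, its two
lists and the _fsync() counter are hypothetical names invented for this
example, and exactly where the real server takes and releases the mutex is
simplified. It only shows that serialising "write binlog, then order the
engine commit" under one mutex makes both logs agree on order, at a cost of
three simulated fsync() calls per commit.

```python
import threading


class ToyServer:
    """Toy model (assumption): two append-only logs guarded by one mutex."""

    def __init__(self):
        self.binlog = []          # stands in for the binary log
        self.engine_log = []      # stands in for the engine's commit order
        self.fsyncs = 0           # number of simulated fsync() calls
        self._count_lock = threading.Lock()
        self.prepare_commit_mutex = threading.Lock()

    def _fsync(self):
        # Only counts calls; no real I/O is performed in this sketch.
        with self._count_lock:
            self.fsyncs += 1

    def commit(self, name):
        self._fsync()                      # 1: engine prepare made durable
        with self.prepare_commit_mutex:    # serialises binlog + commit order
            self.binlog.append(name)
            self._fsync()                  # 2: binlog write made durable
            self.engine_log.append(name)
        self._fsync()                      # 3: engine commit made durable


def run(n):
    """Commit n transactions from n concurrent threads."""
    server = ToyServer()
    threads = [
        threading.Thread(target=server.commit, args=(f"trx{i}",))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server


if __name__ == "__main__":
    s = run(8)
    print("same order in both logs:", s.binlog == s.engine_log)
    print("fsyncs per commit:", s.fsyncs // 8)
```

Because the binlog fsync happens while the mutex is held, commits cannot be
grouped in this model, which is the cost the paragraph above complains about.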

 - Kristian.