
maria-discuss team mailing list archive

Re: fsync alternative

 

Marko Mäkelä <marko.makela@xxxxxxxxxxx> writes:

> There is some ongoing development work around this area. If the binlog
> is enabled, it should actually be unnecessary to persist the storage
> engine log, because it should be possible to replay any
> not-committed-in-engine transactions from the binlog. We must merely

Nice to hear that this is being worked on. There is an old worklog MWL#164
with some analysis of potential issues to be solved.

  http://worklog.askmonty.org/worklog/Server-RawIdeaBin/?tid=164

It becomes tricky in some corner cases, for example cross-engine
transactions where one engine has the changes persisted after a crash and
the other does not.

But the impact of a robust implementation of this could be huge:
double-fsync-per-commit is _really_ expensive. Hopefully the corner cases
can be solved or handled with some kind of fall-back.
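
To make the cost concrete, here is a toy model of the two commit paths. It
is only a sketch and not MariaDB code: the file names, record formats and
function names are all made up for illustration, and a real engine batches
syncs with group commit. It only counts how many fsync() calls each path
issues per transaction.

```python
import os
import tempfile


def append_durably(fd, record):
    """Append one record and force it to stable storage; return fsyncs issued."""
    os.write(fd, record)
    os.fsync(fd)
    return 1


def commit_with_both_logs(engine_fd, binlog_fd, txn):
    """Hypothetical commit path: sync the engine log, then sync the binlog."""
    syncs = append_durably(engine_fd, b"PREPARE " + txn + b"\n")
    syncs += append_durably(binlog_fd, b"COMMIT " + txn + b"\n")
    return syncs


def commit_with_binlog_only(engine_fd, binlog_fd, txn):
    """Hypothetical commit path: only the binlog is synced."""
    # The engine record is written but not synced, so it may be lost in a
    # crash and would have to be replayed from the binlog.
    os.write(engine_fd, b"PREPARE " + txn + b"\n")
    return append_durably(binlog_fd, b"COMMIT " + txn + b"\n")


def demo(n_txns=10):
    """Run both paths on scratch files and return their fsync counts."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    with tempfile.TemporaryDirectory() as d:
        engine_fd = os.open(os.path.join(d, "engine.log"), flags, 0o600)
        binlog_fd = os.open(os.path.join(d, "binlog.log"), flags, 0o600)
        try:
            both = sum(commit_with_both_logs(engine_fd, binlog_fd, b"t%d" % i)
                       for i in range(n_txns))
            only = sum(commit_with_binlog_only(engine_fd, binlog_fd, b"t%d" % i)
                       for i in range(n_txns))
        finally:
            os.close(engine_fd)
            os.close(binlog_fd)
    return both, only


if __name__ == "__main__":
    print(demo())
```

The model says nothing about recovery, which is where the corner cases
mentioned above come in.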

> But, InnoDB’s use of fsync() on data files feels like an overkill. I
> believe that we only need some 'write barriers', that is, some

This is also quite interesting. My (admittedly limited) understanding is
that disks do have write-barrier functionality, and that journalling file
systems make use of it. The problem seems to be how to expose that to
userspace. I wonder if there are any existing or proposed interfaces that
allow userspace to specify write barriers between writes.
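
For comparison, the only portable way I am aware of to order two writes
from userspace is to put a full sync between them. The sketch below does
that. The helper name and the file contents are invented for illustration,
and fsync() here is a stand-in that is stronger than a true barrier, since
it waits for durability instead of only constraining order.

```python
import os
import tempfile


def write_with_barrier(fd, before, after):
    """Write `before`, then `after`, with fsync() standing in for a barrier."""
    os.write(fd, before)
    # Stand-in barrier: `before` is forced to stable storage before `after`
    # is even submitted. A real barrier would only have to enforce ordering.
    os.fsync(fd)
    # `after` is not synced here; it may reach the disk at any later time.
    os.write(fd, after)


def demo_barrier():
    """Perform two ordered writes to a scratch file and return its contents."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            write_with_barrier(fd, b"page-1\n", b"page-2\n")
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo_barrier())
```

An ordering-only interface would let the first write return without
waiting for the device, which is the saving being discussed.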

 - Kristian.

