
maria-discuss team mailing list archive

Re: Is disabling doublewrite safe on ZFS?


Just a small correction: ZoL does not support O_DIRECT, but FreeBSD ZFS does. Other platforms probably do as well.
Regards,
Federico Razzoli

    On Tuesday, 14 August 2018, 20:13:59 GMT+1, Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
 On 14-08-2018 19:58 Vladislav Vaintroub wrote:
> There is at least one case I know where you do not need the doublewrite
> buffer. And you do not even need a CoW filesystem.
> It is a combination of an OS guarantee that sector-sized writes are
> atomic, and a matching InnoDB page size. If you have
> disks with 4K sectors (quite common), and you choose
> innodb-page-size=4K, and use innodb-flush-neighbors=0, and use
> Windows as your OS (because this one provides guarantees that
> single-sector sized/aligned writes are atomic as per
> https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-writefile
> [1]), then you can safely disable innodb-doublewrite. You do not need
> "supported hardware" for that.

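To keep the conditions of the quoted case in one place, here is a minimal Python sketch. It encodes only the single combination described above (Windows, page size equal to sector size, flush-neighbors disabled) and is not a general rule. The `Setup` class and the function name are hypothetical names made up for illustration; they are not part of MariaDB or any tool.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    # Hypothetical description of a server, for illustration only.
    os_name: str                 # e.g. "Windows" or "Linux"
    sector_size: int             # physical sector size of the disk, in bytes
    innodb_page_size: int        # value of innodb-page-size, in bytes
    innodb_flush_neighbors: int  # value of innodb-flush-neighbors


def matches_quoted_case(s: Setup) -> bool:
    """Return True only for the one case described in the quoted message.

    A False result does not mean doublewrite is required; it only means
    this particular argument does not apply to the setup.
    """
    return (
        s.os_name == "Windows"
        and s.innodb_page_size == s.sector_size
        and s.innodb_flush_neighbors == 0
    )


windows_4k = Setup("Windows", 4096, 4096, 0)
linux_4k = Setup("Linux", 4096, 4096, 0)
windows_16k = Setup("Windows", 4096, 16384, 0)
```

Here `matches_quoted_case(windows_4k)` is true, while the Linux setup and the 16K-page setup fall outside the quoted argument.
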
Let's suppose mysqld crashes during the copy from its internal buffer to 
the OS write cache, ending with only partial data being transferred (i.e., 
2K of data on a 4Kn disk). If using direct writes (or 
FILE_FLAG_WRITE_THROUGH), the partial data will be rejected by the 
underlying disk, which throws an I/O error. But what about 

> As for Linux, I think Marko tested what happens when process is
> getting killed, and sure enough, it can be killed in the middle of a
> larger write, and have partially written data. I suspect that O_DIRECT
> and sector-sized writes might be atomic ( as in Windows example), but
> I did not find any written confirmation for that. Someone with better
> understanding of kernel and filesystems could prove or disprove this
> suspicion.

Yes, an O_DIRECT + single-sector aligned write *should* be atomic, 
supposing the disk rejects the partial write. However, this really is a 
hardware-specific condition. Back to ZFS: the entire record *will* be 
written atomically. As a first approximation, when recordsize == InnoDB 
page size, doublewrite should not be needed. However, as stated above, 
what will happen if the mysqld process is killed at the wrong moment?

I fear something like this:
- InnoDB page size and ZFS recordsize are both 16K;
- InnoDB calls write() to copy 16K of internal data to the OS pagecache (ZFS 
does not support O_DIRECT, by the way);
- mysqld crashes at the worst possible moment, so only half of the InnoDB 
internal data (8K) was written by write();
- ZFS received the partial 8K of data, but it does *not* know this is 
partial data only (i.e., it "sees" a normal 8K write);
- some seconds later, the partial data is committed to stable storage;
- when mysqld restarts, InnoDB complains about a partial page write.
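
The scenario above can be sketched with a toy model in Python. This is a simplified illustration, not InnoDB's actual page format or checksum algorithm: it assumes a page whose first 4 bytes hold a CRC32 of the rest, and all function names are hypothetical. It shows why a page made of 8K of new data followed by 8K of old data fails validation even though the filesystem stored exactly what it was given.

```python
import zlib

PAGE_SIZE = 16 * 1024  # 16K, matching the page size / recordsize in the scenario


def make_page(payload_byte: int) -> bytes:
    """Build a toy page: 4-byte CRC32 header followed by a constant body."""
    body = bytes([payload_byte]) * (PAGE_SIZE - 4)
    checksum = zlib.crc32(body).to_bytes(4, "big")
    return checksum + body


def page_is_intact(page: bytes) -> bool:
    """Check that the stored checksum matches the page body."""
    if len(page) != PAGE_SIZE:
        return False
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def torn_write(old_page: bytes, new_page: bytes, bytes_written: int) -> bytes:
    """Simulate a write interrupted after `bytes_written` bytes:
    the start of the page is new, the remainder is still the old version."""
    return new_page[:bytes_written] + old_page[bytes_written:]


old = make_page(0xAA)   # page version already on disk
new = make_page(0xBB)   # page version being written
torn = torn_write(old, new, PAGE_SIZE // 2)  # only the first 8K arrived

print(page_is_intact(old), page_is_intact(new), page_is_intact(torn))
```

In this model the old and new pages both validate, while the torn page does not, and neither complete version can be recovered from the torn page alone.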

This brings up another question: how will InnoDB behave after detecting a 
partial page write? Will it shut itself down?

Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8

Mailing list: https://launchpad.net/~maria-discuss
Post to    : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help  : https://help.launchpad.net/ListHelp
