maria-discuss team mailing list archive
Re: Is disabling doublewrite safe on ZFS?
There is at least one case I know of where you do not need the doublewrite buffer, and you do not even need a CoW filesystem.
That case is a combination of an OS guarantee that sector-sized writes are atomic and an InnoDB page size that matches the sector size. If you have disks with 4K sectors (quite common), choose innodb-page-size=4K, use innodb-flush-neighbors=0, and use Windows as your OS (because it guarantees that single-sector sized/aligned writes are atomic, as per https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-writefile), then you can safely disable innodb-doublewrite. You do not need "supported hardware" for that.
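A minimal sketch of that checklist, for illustration only. The function name and its parameters are hypothetical and not part of MariaDB; the sketch only encodes the conditions listed above and does not probe the real OS, disk, or server configuration.

```python
def doublewrite_can_be_disabled(os_name, sector_size, page_size, flush_neighbors):
    """Return True only if every condition from the paragraph above holds.

    Hypothetical helper: the caller supplies the values, nothing is detected.
    """
    return (
        os_name == "Windows"          # OS documents atomic single-sector writes
        and page_size == sector_size  # innodb-page-size matches the disk sector size
        and flush_neighbors == 0      # innodb-flush-neighbors=0, as listed above
    )


# Example values (assumed, not measured on any real system)
print(doublewrite_can_be_disabled("Windows", 4096, 4096, 0))   # 4K page on 4K sectors
print(doublewrite_can_be_disabled("Windows", 4096, 16384, 0))  # 16K page spans 4 sectors
print(doublewrite_can_be_disabled("Linux", 4096, 4096, 0))     # no documented guarantee
```

Only the first call returns True; the other two fail one of the conditions, so the doublewrite buffer should stay enabled in those setups.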
As for Linux, I think Marko tested what happens when the process gets killed, and sure enough, it can be killed in the middle of a larger write and leave partially written data. I suspect that O_DIRECT sector-sized writes might be atomic (as in the Windows example), but I did not find any written confirmation of that. Someone with a better understanding of the kernel and filesystems could prove or disprove this suspicion.
From: Maria-discuss <maria-discuss-bounces+vvaintroub=gmail.com@xxxxxxxxxxxxxxxxxxx> on behalf of Gionatan Danti <g.danti@xxxxxxxxxx>
Sent: Tuesday, August 14, 2018 11:37:49 AM
Subject: [Maria-discuss] Is disabling doublewrite safe on ZFS?
As per subject: is disabling doublewrite safe on ZFS (and/or other CoW
filesystems such as BTRFS)?
Background information: ZFS is a CoW/transactional filesystem, meaning
that writes are atomic: they either fully commit or are rolled back to
the latest "stable" version. This leads many people to claim not only
that disabling doublewrite is safe when InnoDB runs on top of ZFS
storage, but even that it is the *right* thing to do to increase InnoDB
write performance. The reason is that when ZFS recordsize is set the same as
InnoDB page/record size, no partial page write can happen. Some
However, I am not fully committed (pun intended!) to this idea. While I
surely appreciate ZFS write atomicity, and how it *does* protect from a
system-wide crash (i.e. power loss), I fear that an InnoDB/MariaDB crash
*can* lead to partial page writes. If, for example, the mysqld process
crashes (or is killed) while copying an internal buffer during a
write() call, I can imagine the filesystem will receive wrong/partial
data, which it will happily write to the main storage pool (as it knows
nothing of internal data consistency from InnoDB's point of view).
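The failure mode described above can be sketched with a toy model. This is an illustration under stated assumptions, not InnoDB's real page format or recovery code: the helper names are hypothetical, CRC32 stands in for the page checksum, and the "files" are plain byte strings in memory.

```python
import zlib

PAGE_SIZE = 16384  # 16K page, used here only as a model value


def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 of the body, then the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def is_valid(page: bytes) -> bool:
    """A page is valid if its stored checksum matches its body."""
    return len(page) == PAGE_SIZE and zlib.crc32(page[4:]).to_bytes(4, "big") == page[:4]


def torn_write(old_page: bytes, new_page: bytes, bytes_written: int) -> bytes:
    """Simulate a write interrupted after only bytes_written bytes landed."""
    return new_page[:bytes_written] + old_page[bytes_written:]


old = make_page(b"A" * 10000)
new = make_page(b"B" * 10000)

# With doublewrite enabled, a full copy of the new page is stored first.
doublewrite_copy = new

# The in-place write is then interrupted after 4096 bytes.
on_disk = torn_write(old, new, 4096)

# The torn page is neither the old nor the new version; its checksum fails.
print(is_valid(on_disk))

# Recovery in this model: fall back to the doublewrite copy if the page is torn.
recovered = on_disk if is_valid(on_disk) else doublewrite_copy
print(recovered == new)
```

In this model the filesystem could store the torn page perfectly atomically and it would still be invalid from the database's point of view; without the doublewrite copy there would be no intact version of the page to fall back on.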
I understand that this failure scenario should be *really* rare, as the
critical operation (buffer copy from mysqld to system pagecache/ARC via
write()) is extremely fast compared to the real data flush to stable
storage (meaning that the "vulnerable time window" is very small).
However, it remains short of 100% safety. Moreover, it really
backfired in the past:
From my understanding, disabling doublewrite is really 100% safe only
when enabling atomic writes on *supported hardware*.
Am I missing something? Am I over-thinking it, maybe?
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8
Mailing list: https://launchpad.net/~maria-discuss