maria-discuss team mailing list archive
Message #05205
Is disabling doublewrite safe on ZFS?
Hi all,
As per the subject: is disabling the doublewrite buffer safe on ZFS
(and/or on other CoW filesystems such as BTRFS)?
Background information: ZFS is a CoW/transactional filesystem, meaning
that writes are atomic: they are either fully committed or rolled back
to the latest "stable" version. This leads many people to claim not
only that disabling doublewrite is safe when InnoDB runs on top of ZFS
storage, but even that it is the *right* thing to do to increase InnoDB
write performance. The reasoning is that when the ZFS recordsize is set
equal to the InnoDB page size, no partial page write can happen. Some
evidence:
http://assets.en.oreilly.com/1/event/21/Optimizing%20MySQL%20Performance%20with%20ZFS%20Presentation.pdf
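The "matching recordsize" argument comes down to simple offset arithmetic. Below is a minimal Python sketch under stated assumptions: 16 KiB pages (the InnoDB default), page N starting at byte N * page_size in the data file, and the file split into fixed recordsize-byte records. `records_touched` is a hypothetical helper, not part of any ZFS or MariaDB tool, and real ZFS block layout is not modelled. It shows that with recordsize equal to the page size each page maps to exactly one record, while a smaller recordsize splits one page across several records.

```python
# Sketch: which filesystem records does one InnoDB page write overlap?
# ASSUMPTIONS: 16 KiB pages, page N starts at byte N * page_size, and the
# file is divided into fixed recordsize-byte records (ZFS details ignored).

KiB = 1024

def records_touched(page_no: int, page_size: int, recordsize: int) -> range:
    """Indices of the records that the byte range of page `page_no` overlaps."""
    start = page_no * page_size
    end = start + page_size            # exclusive end offset
    first = start // recordsize
    last = (end - 1) // recordsize
    return range(first, last + 1)

if __name__ == "__main__":
    for rs in (8 * KiB, 16 * KiB, 128 * KiB):
        recs = records_touched(page_no=3, page_size=16 * KiB, recordsize=rs)
        print(f"recordsize={rs // KiB:>3} KiB -> records {list(recs)}")
    # prints:
    # recordsize=  8 KiB -> records [6, 7]
    # recordsize= 16 KiB -> records [3]
    # recordsize=128 KiB -> records [0]
```

A recordsize larger than the page also keeps each page inside a single record, but then every page write rewrites a larger record.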
However, I am not fully committed (pun intended!) to this idea. While I
surely appreciate ZFS write atomicity, and how it *does* protect against
a system-wide crash (i.e. power loss), I fear that an InnoDB/MariaDB
crash *can* lead to partial page writes. If, for example, the mysqld
process crashes (or is killed) while an internal buffer is being copied
during a write() call, I can imagine the filesystem receiving
wrong/partial data, which it will happily write to the main storage pool
(as it knows nothing about internal data consistency from InnoDB's point
of view).
I understand that this failure scenario should be *really* rare, as the
critical operation (the buffer copy from mysqld to the system page
cache/ARC via write()) is extremely fast compared to the actual data
flush to stable storage (meaning that the "vulnerable time window" is
very small). However, that is still not the same as 100% safety.
Moreover, it really backfired in the past:
https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/
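The "vulnerable time window" argument can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic only: the copy bandwidth (5 GB/s) and flush latency (5 ms) are assumed, illustrative figures, not measurements of any real system, and `copy_to_flush_ratio` is a hypothetical helper. Plug in figures from your own hardware.

```python
# Back-of-the-envelope size of the "vulnerable time window".
# ASSUMPTION: every number below is illustrative only, not a measurement.

def copy_to_flush_ratio(page_size: int, copy_bandwidth: float,
                        flush_latency: float) -> tuple:
    """Return (copy_time_seconds, copy_time / flush_latency)."""
    copy_time = page_size / copy_bandwidth
    return copy_time, copy_time / flush_latency

if __name__ == "__main__":
    copy_time, ratio = copy_to_flush_ratio(
        page_size=16 * 1024,   # assumed 16 KiB InnoDB page
        copy_bandwidth=5e9,    # assumed 5 GB/s memory copy into page cache/ARC
        flush_latency=5e-3,    # assumed 5 ms for data to reach stable storage
    )
    print(f"copy: {copy_time * 1e6:.2f} us, ratio: {ratio:.2e}")
    # prints "copy: 3.28 us, ratio: 6.55e-04"
```

Under these assumed figures the copy takes less than a thousandth of the flush time: small, but not zero.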
From my understanding, disabling the doublewrite buffer is truly 100%
safe only when atomic writes are enabled on *supported hardware*
(https://mariadb.com/kb/en/library/atomic-write-support/).
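Why atomic writes make doublewrite unnecessary can be illustrated with a simplified model. This is a toy sketch, not InnoDB's actual recovery algorithm. It assumes a page with a CRC32 trailer (not InnoDB's format), that non-atomic writes may stop at any 512-byte sector boundary, and that the doublewrite copy was fully written and flushed before the data-file write began. All helper names are hypothetical. With all-or-nothing writes every crash leaves a valid page; without them, torn pages appear and need the doublewrite copy to be repaired.

```python
import zlib
from typing import List, Optional

# ASSUMPTIONS: simplified page with a CRC32 trailer (not InnoDB's format);
# non-atomic writes may stop at any 512-byte sector boundary; the doublewrite
# copy was fully flushed before the data-file write started.
PAGE_SIZE = 16 * 1024
SECTOR = 512
CHECKSUM_LEN = 4

def make_page(fill: int) -> bytes:
    payload = bytes([fill]) * (PAGE_SIZE - CHECKSUM_LEN)
    return payload + zlib.crc32(payload).to_bytes(CHECKSUM_LEN, "little")

def page_is_valid(page: bytes) -> bool:
    payload, stored = page[:-CHECKSUM_LEN], page[-CHECKSUM_LEN:]
    return zlib.crc32(payload) == int.from_bytes(stored, "little")

def crash_states(old: bytes, new: bytes, atomic: bool) -> List[bytes]:
    """Every page image a crash during the data-file write could leave."""
    if atomic:
        # All-or-nothing: the page is either entirely old or entirely new.
        return [old, new]
    # Non-atomic: any whole-sector prefix may already hold new data.
    return [new[:n] + old[n:] for n in range(0, PAGE_SIZE + 1, SECTOR)]

def recover(on_disk: bytes, doublewrite_copy: Optional[bytes]) -> bytes:
    """Keep the data-file page if it verifies; otherwise fall back to the
    copy previously written to the doublewrite area."""
    if page_is_valid(on_disk):
        return on_disk
    if doublewrite_copy is not None and page_is_valid(doublewrite_copy):
        return doublewrite_copy
    raise RuntimeError("torn page and no usable doublewrite copy")

if __name__ == "__main__":
    old, new = make_page(0xAA), make_page(0xBB)
    atomic = crash_states(old, new, atomic=True)
    torn = crash_states(old, new, atomic=False)
    print("atomic, recoverable without doublewrite:",
          all(page_is_valid(recover(p, None)) for p in atomic))
    print("non-atomic, torn images:",
          sum(not page_is_valid(p) for p in torn))
    print("non-atomic, valid after doublewrite recovery:",
          all(page_is_valid(recover(p, new)) for p in torn))
```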
Am I missing something? Or am I maybe overthinking it?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8