
maria-discuss team mailing list archive

Is disabling doublewrite safe on ZFS?


Hi all,
as per the subject: is disabling doublewrite safe on ZFS (and/or other CoW filesystems such as Btrfs)?

Background information: ZFS is a CoW/transactional filesystem, meaning that writes are atomic: they either fully commit or are rolled back to the latest "stable" version. This leads many people to claim not only that disabling doublewrite is safe when InnoDB runs on top of ZFS storage, but even that it is the *right* thing to do to increase InnoDB write performance. The reasoning is that when the ZFS recordsize is set to the same value as the InnoDB page size, no partial page write can happen. Some evidence: http://assets.en.oreilly.com/1/event/21/Optimizing%20MySQL%20Performance%20with%20ZFS%20Presentation.pdf

However, I am not fully committed (pun intended!) to this idea. While I surely appreciate ZFS write atomicity, and how it *does* protect against a system-wide crash (i.e. power loss), I fear that an InnoDB/MariaDB crash *can* lead to partial page writes. If, for example, the mysqld process crashes (or is killed) while copying an internal buffer during a write() call, I can imagine the filesystem receiving wrong/partial data, which it will happily write to the main storage pool (as it knows nothing about data consistency from InnoDB's point of view).
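
To make the failure mode concrete, here is a minimal Python sketch of what I mean by a partial page write. It is only an illustration under my own assumptions: the page layout (a 4-byte CRC32 in front of the payload), the helper names and the 4096-byte cut-off are all hypothetical, and this is not InnoDB's real on-disk format. The point is that the filesystem can atomically store a 16 KiB record which is nevertheless inconsistent from the database's point of view.

```python
import zlib

PAGE_SIZE = 16 * 1024   # InnoDB default page size
CHECKSUM_BYTES = 4      # hypothetical layout: CRC32 stored before the payload


def make_page(payload_byte: int) -> bytes:
    """Build a toy page: checksum header followed by a constant payload."""
    body = bytes([payload_byte]) * (PAGE_SIZE - CHECKSUM_BYTES)
    checksum = zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "big")
    return checksum + body


def is_consistent(page: bytes) -> bool:
    """Return True if the stored checksum matches the payload."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "big")
    return len(page) == PAGE_SIZE and stored == zlib.crc32(page[CHECKSUM_BYTES:])


def torn_write(old_page: bytes, new_page: bytes, bytes_copied: int) -> bytes:
    """Simulate a copy that stopped early: new data up to the cut, old data after it."""
    return new_page[:bytes_copied] + old_page[bytes_copied:]


old_page = make_page(0xAA)
new_page = make_page(0xBB)

# Assumption: only the first 4 KiB of the new page reached the filesystem.
torn_page = torn_write(old_page, new_page, 4096)

print("old page consistent: ", is_consistent(old_page))
print("new page consistent: ", is_consistent(new_page))
print("torn page consistent:", is_consistent(torn_page))
```

If such a mixed page were what the filesystem received, ZFS would commit it atomically as a full record, yet the checksum no longer matches. Without a doublewrite copy there is no intact version of the page to restore from.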

I understand that this failure scenario should be *really* rare, as the critical operation (the buffer copy from mysqld to the system pagecache/ARC via write()) is extremely fast compared to the actual data flush to stable storage, so the "vulnerable time window" is very small. However, that still falls short of 100% safety. Moreover, disabling doublewrite on the strength of filesystem guarantees has really backfired in the past: https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/

From my understanding, disabling doublewrite is really 100% safe only when atomic writes are enabled on *supported hardware* (https://mariadb.com/kb/en/library/atomic-write-support/).

Am I missing something? Am I over-thinking it, maybe?

Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8
