maria-discuss team mailing list archive
Message #05211
Re: Is disabling doublewrite safe on ZFS?
I think it is up to the OS kernel how to handle an interrupt request while a system call is in progress. If the kernel reacts to signals/exceptions by interrupting the write() call in the middle of copying data from your buffer to the page cache, nothing would help. What "in the middle" means is also unclear. There would be some kind of granularity (maybe the page size in the page cache). I do not know what different kernels do in such cases, but this is the level where ZFS is not involved at all.
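For illustration, here is a minimal Python sketch of the usual loop around short writes. The names write_all and demo, and the dummy 16K page, are assumptions made for this example and are not taken from InnoDB. The loop only helps while the process stays alive: if the process is killed between iterations, or while the kernel is copying the buffer, the file can still end up with a partial page.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> int:
    """Keep calling os.write() until every byte of data has been accepted."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write() may accept fewer bytes than requested (a "short write"),
        # so continue from the first byte that was not yet accepted.
        written += os.write(fd, view[written:])
    return written


def demo() -> bytes:
    """Write one hypothetical 16K page to a temporary file and read it back."""
    page = bytes(range(256)) * 64  # 16384 bytes of dummy page data
    with tempfile.TemporaryFile() as f:
        write_all(f.fileno(), page)
        f.seek(0)
        return f.read()
```

Calling demo() returns the 16384 bytes that were written, but that only shows the happy path; the loop offers no atomicity.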
From: Gionatan Danti<mailto:g.danti@xxxxxxxxxx>
Sent: Tuesday, 14 August 2018 21:13
Subject: Re: [Maria-discuss] Is disabling doublewrite safe on ZFS?
On 14-08-2018 19:58 Vladislav Vaintroub wrote:
> There is at least one case I know where you do not need doublewrite
> buffer. And you even do not need CoW filesystem.
>
> A combination of an OS guarantee that sector-sized writes are atomic,
> and a matching InnoDB page size. If you have
> disks with 4K sectors (quite common), and you chose
> innodb-page-size=4K, and use innodb-flush-neighbors=0 , and use
> Windows as your OS (because this one provides guarantees that
> single-sector sized/aligned writes are atomic as per
> https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-writefile
> [1]), then you can safely disable innodb-doublewrite. You do not need
> "supported hardware" for that.
Hi,
let's suppose mysqld crashes during the copy from its internal buffer to
the OS write cache, ending with only partial data being transferred (i.e.,
2K of data on a 4Kn disk). If using direct writes (or
FILE_FLAG_WRITE_THROUGH), the partial data will be rejected by the
underlying disk, which throws an I/O error. But what about
non-O_DIRECT/FILE_FLAG_WRITE_THROUGH writes?
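To pin down what "single sector-sized, aligned write" means in this discussion, here is a small Python sketch. The helper name is_single_aligned_sector and the 4096-byte default are assumptions for illustration only; the check says nothing about whether a given OS or disk actually guarantees atomicity for such a write.

```python
def is_single_aligned_sector(offset: int, length: int, sector_size: int = 4096) -> bool:
    """Return True when the write covers exactly one sector and starts on a sector boundary."""
    return length == sector_size and offset % sector_size == 0


# A 4K page at an offset that is a multiple of 4K fits the description.
aligned_4k = is_single_aligned_sector(offset=8192, length=4096)
# The same 4K of data starting at 2K straddles two sectors.
misaligned_4k = is_single_aligned_sector(offset=2048, length=4096)
# A 16K page spans four 4K sectors, so it is not a single-sector write.
page_16k = is_single_aligned_sector(offset=0, length=16384)
```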
> As for Linux, I think Marko tested what happens when process is
> getting killed, and sure enough, it can be killed in the middle of a
> larger write, and have partially written data. I suspect that O_DIRECT
> and sector-sized writes might be atomic ( as in Windows example), but
> I did not find any written confirmation for that. Someone with better
> understanding of kernel and filesystems could prove or disprove this
> suspicion.
Yes, O_DIRECT + single sector aligned write *should* be atomic,
supposing the disk rejects the partial write. However, this really is a
hardware-specific condition. Back to ZFS: the entire record *will* be
written atomically. As a first approximation, when recordsize == innodb
page size, doublewrite should not be needed. However, as stated above,
what will happen if the mysqld process is killed at the wrong moment?
I fear something like this:
- InnoDB pagesize and ZFS recordsize are both at 16K;
- InnoDB calls write() to copy 16K of internal data to the OS pagecache (ZFS
does not support O_DIRECT, by the way);
- mysqld crashes at the worst possible moment, so only 1/2 of InnoDB
internal data (8K) was written by write();
- ZFS receives the partial 8K of data, but it does *not* know the data is
partial (i.e., it "sees" a normal 8K write);
- some seconds later, the partial data is committed to stable storage;
- when mysqld restarts, InnoDB complains about partial page write.
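The sequence above can be sketched as a toy Python simulation. The page layout used here (payload followed by a 4-byte CRC32 trailer) and all names are assumptions chosen for illustration; this is not InnoDB's real on-disk format. It only shows why a half-overwritten page is detectable by checksum, even though the filesystem stored every byte it was given.

```python
import zlib

PAGE_SIZE = 16 * 1024  # matches the 16K page in the scenario above
CRC_SIZE = 4           # toy trailer, not InnoDB's real page layout


def make_page(fill: int) -> bytes:
    """Build a toy page: payload followed by a CRC32 of that payload."""
    payload = bytes([fill]) * (PAGE_SIZE - CRC_SIZE)
    return payload + zlib.crc32(payload).to_bytes(CRC_SIZE, "big")


def page_is_intact(page: bytes) -> bool:
    """Return True when the stored CRC32 matches the payload."""
    payload, stored = page[:-CRC_SIZE], page[-CRC_SIZE:]
    expected = zlib.crc32(payload).to_bytes(CRC_SIZE, "big")
    return len(page) == PAGE_SIZE and expected == stored


def torn_write(old_page: bytes, new_page: bytes, bytes_copied: int) -> bytes:
    """Simulate a write() that stopped after bytes_copied bytes of new_page."""
    return new_page[:bytes_copied] + old_page[bytes_copied:]


old = make_page(0xAA)                        # page version already on disk
new = make_page(0xBB)                        # page version being written
torn = torn_write(old, new, PAGE_SIZE // 2)  # only the first 8K arrived
```

Here page_is_intact(old) and page_is_intact(new) are both true, while page_is_intact(torn) is false: the first 8K belongs to the new version and the rest, including the trailer, to the old one.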
This brings up another question: how will InnoDB behave after detecting a
partial page write? Will it shut itself down?
Thanks.
--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it<http://www.assyoma.it>
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8