maria-discuss team mailing list archive

Re: Is disabling doublewrite safe on ZFS?

To: Marko Mäkelä <marko.makela@xxxxxxxxxxx>
From: Gionatan Danti <g.danti@xxxxxxxxxx>
Date: Fri, 24 Aug 2018 17:25:43 +0200
Cc: Maria Discuss <maria-discuss@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CAJuX1hwk0g8kpdXRgzQUqJdJH3rECgejyf+VXk=zUq5kUZuJzg@mail.gmail.com>
Organization: Assyoma s.r.l.
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 21/08/2018 11:52, Marko Mäkelä wrote:

I believe that the Linux kernel can interrupt any write at 4096-byte
boundaries when a signal is delivered to the process.
I am curious: Where was it claimed that data=journal guarantees atomic
writes (other than [1])?
I would expect it to only guarantee that anything that was written to
the journal will be durable.
Whether the actual write request was honored in full is a separate matter.

Sure, ext4 + data=journal only has "atomic writes" in the sense that what was written in the journal transaction/commit will be completely committed into the main filesystem.

But from the application's point of view, this could very well be a partial write. This is exactly the point I am stressing: durable writes do *not* imply atomicity in the true sense (i.e., from the application's standpoint).

In this regard, I would expect ZFS to behave similarly: at TXG commit, anything buffered in RAM (and replicated by the ZIL) would be committed to the main filesystem, but if the application write itself was incomplete (due to an application crash) *and* the application-side doublewrite buffer was disabled, bad things could happen...

Please report back any findings, whether or not you consider them to
be interesting.

I believe that it is technically possible for a copy-on-write
filesystem like ZFS to support atomic writes, but for that to be
possible in practice, the interfaces inside the kernel must be
implemented in an appropriate way.
Disclaimer: I have no knowledge of the implementation details of any kernel.

I would expect (and I can be wrong!) that "atomic writes" in the MySQL/MariaDB context means more than durable writes; rather, I expect them to be a means of communicating the application's consistency model to the lower layer (i.e., the storage device). Something similar to "buffer all writes and atomically write them into the main filesystem only when I (MariaDB) *explicitly* tell you to do that". In this case, a crashed MariaDB will *never* commit partial data to the main database files.

I wrote a test program[1] which spawns a child that appends data to a backing file, while the parent process kills it (-9) at a random time. It seems *very* difficult to cause any sort of partial write, both on ext4 (even with no data journal!) and on ZFS. You basically have to interrupt the write() call at a very precise moment, and good luck doing that, especially when writing small data chunks.
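For illustration, the approach described above could be sketched roughly as follows. This is not the test program referenced as [1]; it is a minimal Python sketch under stated assumptions: the function names (writer, run_once, is_torn, count_torn), the 32 MiB cap on file growth and the delay range are all hypothetical choices. It treats a file whose size is not a multiple of the chunk size as evidence of a partial write, and it only covers a process kill, not a kernel crash or power loss.

```python
import os
import random
import signal
import time

MAX_BYTES = 32 * 1024 * 1024  # assumed cap so the backing file cannot grow unbounded


def writer(path, chunk_size, max_bytes=MAX_BYTES):
    """Child: append fixed-size chunks, issuing one write() call per chunk."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    for seq in range(max_bytes // chunk_size):
        # Fill each chunk with a single byte value derived from its sequence number.
        os.write(fd, bytes([seq % 256]) * chunk_size)
    signal.pause()  # cap reached: wait here until the parent kills us


def run_once(path, chunk_size, max_delay=0.02):
    """Fork a writer, SIGKILL it after a random delay, return the file size."""
    open(path, "wb").close()  # create or truncate the backing file in the parent
    pid = os.fork()
    if pid == 0:
        try:
            writer(path, chunk_size)
        finally:
            os._exit(1)  # never fall back into the parent's code path
    time.sleep(random.uniform(0.001, max_delay))
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return os.path.getsize(path)


def is_torn(size, chunk_size):
    """A size that is not a whole number of chunks means a partial write."""
    return size % chunk_size != 0


def count_torn(path, chunk_size, trials=100, max_delay=0.02):
    """Repeat the kill test and count how many runs left a partial chunk."""
    torn = 0
    for _ in range(trials):
        if is_torn(run_once(path, chunk_size, max_delay), chunk_size):
            torn += 1
    return torn
```

With such a sketch, something like count_torn("/mnt/test.bin", 16 * 1024, trials=1000) would report how many of 1000 kills left a partial 16k chunk behind; the file name here is only an example.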

So it really seems that a doublewrite-less MariaDB would be safe from corruption unless extraordinarily bad luck (i.e., a mysqld crash at a *really small* wrong moment) hits.

I plan to do some more tests with a "real" MariaDB installation being crashed in the middle of intense writes. I'll update you when done.


Test setup:
- CentOS 7 x86-64 VM on KVM host
- 1 GB RAM
- 8 GB disk

- ext4 (data=ordered) and ZFS (compression=off, xattr=sa, recordsize=16k) filesystems, each created on top of a ~400 MB file under /dev/shm (basically a RAM disk) and mounted on /mnt/

- varying buffer size (16k, 128k and 4m)

Results...

# ext4 16k