yahoo-eng-team team mailing list archive
Message #77570
[Bug 1818847] Re: Fix QEMU cache mode used for image conversion and Nova instances
Reviewed: https://review.openstack.org/640781
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e7b64eaad82db38dd46f586b650da4ddde42533b
Submitter: Zuul
Branch: master
commit e7b64eaad82db38dd46f586b650da4ddde42533b
Author: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
Date: Thu Feb 28 12:33:12 2019 +0100
qemu: Make disk image conversion dramatically faster
tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
the target image for `qemu-img convert`. Two reasons: (a) if the image
conversion completes successfully, then 'writeback' calls fsync() to
safely write data to the physical disk; and (b) 'writeback' makes the
image conversion a _lot_ faster.
Back-of-the-envelope "benchmark" (on an SSD)
--------------------------------------------
(Ran both the tests thrice each; version: qemu-img-2.11.0)
With 'writethrough':
$> time (qemu-img convert -t writethrough -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
real 1m43.470s
user 0m8.310s
sys 0m3.661s
With 'writeback':
$> time (qemu-img convert -t writeback -f qcow2 -O raw \
Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)
real 0m7.390s
user 0m5.179s
sys 0m1.780s
I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
seconds for 'writeback' -- IOW, 'writeback' is about _14_ times faster!
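The ratio can be re-derived from the unrounded `real` times reported above. A minimal sketch; the helper name `to_seconds` is made up for illustration:

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert the 'XmY.YYYs' figures printed by `time` into seconds."""
    return minutes * 60 + seconds


# 'real' (wall-clock) times from the two runs quoted above.
writethrough = to_seconds(1, 43.470)
writeback = to_seconds(0, 7.390)

speedup = writethrough / writeback
print(f"writethrough: {writethrough:.2f}s, "
      f"writeback: {writeback:.2f}s, speedup: {speedup:.1f}x")
```

With the exact figures the ratio comes out at roughly 14x; the rounded "~103 vs. ~7" figures give a slightly higher number.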
Details
-------
Nova commit e6ce9557f84cdcdf4ffdd12ce73a008c96c7b94a ("qemu-img do not
use cache=none if no O_DIRECT support") was introduced to make instances
boot on filesystems that don't support 'O_DIRECT' (which bypasses the
host page cache and flushes data directly to the disk), such as 'tmpfs'.
In doing so it introduced the 'writethrough' cache for the target image
for `qemu-img convert`.
This patch proposes to change that to 'writeback'.
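As a rough illustration of the proposed behaviour (not Nova's actual code), the cache-mode choice could be sketched as below. The function names `pick_cache_mode` and `build_convert_cmd` and the `direct_io_supported` flag are hypothetical; only the `-t`, `-f` and `-O` options are taken from the commands shown earlier in this message.

```python
from typing import List


def pick_cache_mode(direct_io_supported: bool) -> str:
    """Pick the target cache mode for `qemu-img convert`.

    Assumption: 'none' is used where O_DIRECT works; the fallback is
    'writeback', where it used to be 'writethrough'.
    """
    return "none" if direct_io_supported else "writeback"


def build_convert_cmd(source: str, dest: str, in_format: str,
                      out_format: str,
                      direct_io_supported: bool) -> List[str]:
    """Build (but do not run) the qemu-img command line."""
    cache_mode = pick_cache_mode(direct_io_supported)
    return ["qemu-img", "convert", "-t", cache_mode,
            "-f", in_format, "-O", out_format, source, dest]


# Example: a destination (e.g. on tmpfs) without O_DIRECT support.
cmd = build_convert_cmd("Fedora-Cloud-Base-29.qcow2",
                        "Fedora-Cloud-Base-29.raw",
                        "qcow2", "raw", direct_io_supported=False)
print(" ".join(cmd))
```

How O_DIRECT support is detected for the destination directory is left out of the sketch on purpose.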
Let's address the 'safety' concern:
"What about data integrity in the event of a host crash (especially
on shared file systems such as NFS)?"
Answer: If the host crashes mid-way during image conversion, then
neither "data integrity" nor the cache mode in use matters. But if the
image conversion completes _successfully_, then 'writeback' will safely
write the data to the physical disk, just as 'writethrough' does.
So we are as safe as we can be, but with the extra benefit of image
conversion being _much_ faster.
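The "buffer the writes, then flush once at the end" idea can be shown with an ordinary file copy. This is only an analogy under stated assumptions, not how `qemu-img` is implemented; `copy_with_final_fsync` and `CHUNK` are made-up names.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # copy in 1 MiB pieces


def copy_with_final_fsync(src_path: str, dst_path: str) -> int:
    """Copy src to dst through the page cache, syncing once at the end.

    Writes are buffered (the 'writeback' idea); durability is requested
    with a single fsync() after the last write, not after every write.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()             # push Python's own buffer to the kernel
        os.fsync(dst.fileno())  # ask the kernel to write it to the disk
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.img")
    dst = os.path.join(tmp, "target.img")
    with open(src, "wb") as f:
        f.write(os.urandom(3 * CHUNK + 17))
    copied = copy_with_final_fsync(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        identical = a.read() == b.read()
    print(copied, identical)
```

If the copy is interrupted half-way, the target is incomplete whichever way it was written; once the function returns, the data has been handed to fsync().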
* * *
The `qemu-img convert` command defaults to 'cache=writeback' for the
source image, and to 'cache=unsafe' for the target, because if `qemu-img`
"crashes during the conversion, the user will throw away the broken
output file anyway and start over"[1]. In addition, `qemu-img convert`
has supported[2] fsync() for the target image since QEMU 1.1 (2012).
[1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
-- "qemu-img convert: Use cache=unsafe for output image"
[2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=80ccf93b
-- "qemu-img: let 'qemu-img convert' flush data"
Closes-Bug: #1818847
Change-Id: I574be2b629aaff23556e25f8db0d740105be6f07
Signed-off-by: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
Looks-good-to-me'd-by: Kevin Wolf <kwolf@xxxxxxxxxx>
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1818847
Title:
Fix QEMU cache mode used for image conversion and Nova instances
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Nova uses QEMU's disk image cache modes in two main areas:
(1) When deciding what cache mode to use for the target disk image when
converting (using `qemu-img convert`) images from one format to
another (qcow2 <-> raw).
See unprivileged_convert_image() in nova/privsep/qemu.py.
(2) When configuring cache modes for running guests (Nova instances).
Nova tells libvirt what cache mode to use, and libvirt will in turn
configure block devices via QEMU (using its '-drive' command-line
option).
See disk_cachemode() in nova/virt/libvirt/driver.py. (Also, "volume
drivers" like SMBFS and Virtuozzo Storage use 'writethrough' -- refer
to smbfs.py and vzstorage.py.)
In both cases Nova uses a combination of QEMU's cache modes 'none' and
'writethrough'. But that is incorrect, and stems from our misunderstanding
of how cache modes work. E.g. Nova's libvirt driver currently assumes
(refer disk_cachemode()) that 'writethrough' and 'none' cache modes have
the same behaviour with respect to host crash safety, which is not at
all true.
Fix these wrong assumptions.
(Also consult the QEMU Block Layer developers to double-check the
behaviour of cache modes and where they are applicable.)
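For orientation, QEMU's documentation describes each cache mode as shorthand for three flags (cache.writeback, cache.direct, cache.no-flush). The table below is transcribed from that documentation as best recalled and should be checked against the QEMU version in use; `CacheFlags`, `CACHE_MODES` and `differing_flags` are names invented for this sketch.

```python
from typing import List, NamedTuple


class CacheFlags(NamedTuple):
    writeback: bool  # writes are reported complete before reaching disk
    direct: bool     # bypass the host page cache (O_DIRECT)
    no_flush: bool   # ignore flush requests


# Assumed mapping, per the QEMU documentation for the 'cache' option.
CACHE_MODES = {
    "writeback":    CacheFlags(True,  False, False),
    "none":         CacheFlags(True,  True,  False),
    "writethrough": CacheFlags(False, False, False),
    "directsync":   CacheFlags(False, True,  False),
    "unsafe":       CacheFlags(True,  False, True),
}


def differing_flags(mode_a: str, mode_b: str) -> List[str]:
    """Return the names of the flags on which two cache modes differ."""
    a, b = CACHE_MODES[mode_a], CACHE_MODES[mode_b]
    return [name for name in CacheFlags._fields
            if getattr(a, name) != getattr(b, name)]


print(differing_flags("none", "writethrough"))
```

Under this mapping 'none' and 'writethrough' differ in both the writeback and the direct flag, so treating them as equivalent is not justified.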
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1818847/+subscriptions