kernel-packages team mailing list archive
Message #102031
[Bug 1292234] Re: qcow2 image corruption on non-extent filesystems (ext3)
** Description changed:
+ [Impact]
+ Users of non-extent ext4 filesystems (ext4 ^extents, or ext3 w/ CONFIG_EXT4_USE_FOR_EXT23=y) can encounter data corruption when using fallocate with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE flags. This seems to be a regression in ext4_ind_remove_space introduced in 4f579ae7, whereas commit 77ea2a4b passes the following test case.
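
The fallocate call named above can be exercised on a scratch file without a VM. What follows is a minimal sketch, not the reporter's test. It assumes a util-linux fallocate that supports --punch-hole (which implies keeping the file size) and GNU cmp. The function name check_punch_hole, the 64 KiB block size, and the hole position are illustrative choices. It punches a hole in the middle of a file of random data, then checks that the bytes outside the hole and the file size are unchanged.

```shell
#!/bin/sh
# Sketch: punch a hole in a scratch file and verify the data around it.
# Usage: check_punch_hole /mount/point/under/test/scratch.bin
# Returns 0 if data outside the hole is intact, 1 if it changed,
# 2 if the file could not be prepared or the hole could not be punched.
check_punch_hole() {
    file=$1
    ref=$file.ref
    bs=65536                        # illustrative block size (64 KiB)

    # 16 blocks of random data, plus a reference copy to compare against.
    dd if=/dev/urandom of="$file" bs="$bs" count=16 2>/dev/null || return 2
    cp "$file" "$ref" || return 2
    size_before=$(wc -c < "$file")

    # Punch out blocks 4..7; --punch-hole keeps the file size.
    hole_off=$((4 * bs))
    hole_len=$((4 * bs))
    if ! fallocate --punch-hole --offset "$hole_off" --length "$hole_len" "$file"; then
        rm -f "$file" "$ref"
        return 2
    fi
    sync

    rc=0
    # Bytes before the hole must match the reference copy.
    cmp -s -n "$hole_off" "$file" "$ref" || rc=1
    # Bytes after the hole must match too (-i skips the same prefix in both files).
    cmp -s -i "$((hole_off + hole_len))" "$file" "$ref" || rc=1
    # The file size must not have changed.
    [ "$(wc -c < "$file")" -eq "$size_before" ] || rc=1

    rm -f "$file" "$ref"
    return "$rc"
}
```

A pass does not rule out the bug: the reported corruption appeared only after repeated snapshot cycles, and reads made right after the punch may be served from the page cache rather than from disk.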
+
+ [Test Case]
+ 1) Set up an ext4 ^extents filesystem, or an ext3 filesystem with CONFIG_EXT4_USE_FOR_EXT23=y
+ 2) Create and install a VM using a qcow2 image, and store the image file on that filesystem
+ 3) Snapshot the image with qemu-img
+ 4) Boot the image and do some disk operations (fio, etc.)
+ 5) Shut down the VM and delete the snapshot
+ 6) Repeat steps 3-5 until the VM no longer boots due to image corruption. This generally takes a few iterations, depending on the disk operations.
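
Steps 3-6 could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure actually used for the bug: snapshot_stress, QEMU_IMG, and RUN_GUEST are hypothetical names. RUN_GUEST stands for whatever command boots the VM, generates disk I/O, and shuts it down cleanly. The extra qemu-img check after each round is an addition; it inspects qcow2 metadata only, so it may not catch every form of corruption that a failed boot would reveal.

```shell
#!/bin/sh
# Sketch of test case steps 3-6.
# QEMU_IMG  (hypothetical override) defaults to qemu-img.
# RUN_GUEST (hypothetical hook) is called with the image path and must
#           boot the VM, do disk I/O, and shut it down; defaults to a no-op.
# Usage: snapshot_stress IMAGE [ROUNDS]
snapshot_stress() {
    image=$1
    rounds=${2:-10}
    qemu_img=${QEMU_IMG:-qemu-img}
    run_guest=${RUN_GUEST:-true}

    i=1
    while [ "$i" -le "$rounds" ]; do
        "$qemu_img" snapshot -c stress "$image" || return 1   # step 3: create snapshot
        $run_guest "$image" || return 1                        # step 4: boot, do I/O, shut down
        "$qemu_img" snapshot -d stress "$image" || return 1   # step 5: delete snapshot

        # Non-zero from 'qemu-img check' covers corruption as well as
        # leaked clusters, so treat it as "needs a closer look".
        if ! "$qemu_img" check "$image"; then
            echo "round $i: qemu-img check reported problems" >&2
            return 2
        fi
        i=$((i + 1))
    done
}
```

Setting QEMU_IMG=echo gives a dry run that prints the qemu-img arguments instead of touching the image.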
+
+
+ --
+
The security team uses a tool
(http://bazaar.launchpad.net/~ubuntu-bugcontrol/ubuntu-qa-tools/master/view/head:/vm-tools/uvt)
that uses libvirt snapshots quite a bit. I noticed after upgrading to trusty some
time ago that qemu 1.7 (and the qemu 2.0 in the candidate ppa) has had
stability problems such that the disk/partition table seems to be
corrupted after removing a libvirt snapshot and then creating another
with the same name. I don't have a very simple reproducer, but had
enough that hallyn suggested I file a bug. First off:
qemu-kvm 2.0~git-20140307.4c288ac-0ubuntu2
$ cat /proc/version_signature
Ubuntu 3.13.0-16.36-generic 3.13.5
$ qemu-img info ./forhallyn-trusty-amd64.img
image: ./forhallyn-trusty-amd64.img
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 4.0G
cluster_size: 65536
Format specific information:
compat: 0.10
Steps to reproduce:
1. create a virtual machine. For a simplified reproducer, I used virt-manager with:
OS type: Linux
Version: Ubuntu 14.04
Memory: 768
CPUs: 1
Select managed or existing (Browse, new volume)
Create a new storage volume:
qcow2
Max capacity: 8192
Allocation: 0
Advanced:
NAT
kvm
x86_64
firmware: default
2. install a VM. I used trusty-desktop-amd64.iso from Jan 23 since it
seems like I can hit the bug more reliably if I have lots of updates in
a dist-upgrade. I have seen this with lucid-trusty guests that are i386
and amd64. After the install, reboot and then cleanly shut down
3. Back up the image file somewhere since steps 1 and 2 take a while :)
4. Execute the following commands which are based on what our uvt tool
does:
$ virsh snapshot-create-as forhallyn-trusty-amd64 pristine "uvt snapshot"
$ virsh snapshot-current --name forhallyn-trusty-amd64
pristine
$ virsh start forhallyn-trusty-amd64
$ virsh snapshot-list forhallyn-trusty-amd64 # this is showing as shutoff after start, this might be different with qemu 1.5
in guest:
sudo apt-get update
sudo apt-get dist-upgrade
780 upgraded...
shutdown -h now
$ virsh snapshot-delete forhallyn-trusty-amd64 pristine --children
$ virsh snapshot-create-as forhallyn-trusty-amd64 pristine "uvt snapshot"
$ virsh start forhallyn-trusty-amd64 # this command works, but there is often disk corruption
The idea behind the above is to create a new VM with a pristine snapshot
that we can revert to later if we want. To refresh it, we boot the VM,
run apt-get dist-upgrade, cleanly shut down, then remove the old
'pristine' snapshot and create a new 'pristine' snapshot. The intention
is to update both the VM and the pristine snapshot so that the next boot
starts from the updated VM and we can revert back to that updated
state.
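
The snapshot refresh described above boils down to two virsh calls, both taken from the commands in step 4. A minimal sketch, assuming the domain has already been shut down cleanly; refresh_pristine and the VIRSH override are hypothetical names and are not part of uvt:

```shell
#!/bin/sh
# Sketch: replace the 'pristine' snapshot of a shut-down domain with a
# new one taken from the current (updated) disk state.
# VIRSH (hypothetical override) defaults to virsh; set it to echo for a dry run.
# Usage: refresh_pristine DOMAIN
refresh_pristine() {
    domain=$1
    virsh=${VIRSH:-virsh}

    # Remove the old snapshot together with any child snapshots.
    "$virsh" snapshot-delete "$domain" pristine --children || return 1
    # Recreate it under the same name, which is the step after which
    # the corruption was observed.
    "$virsh" snapshot-create-as "$domain" pristine "uvt snapshot" || return 1
}
```

Running `VIRSH=echo refresh_pristine forhallyn-trusty-amd64` prints the two virsh command lines without executing them.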
After snapshot-delete/snapshot-create-as, running 'virsh start' may
reveal a corrupted disk. Symptoms include grub failing to find .mod
files, the kernel not booting, init failing, etc.
This does not seem to be related to the machine type used. I.e.,
pc-i440fx-1.5, pc-i440fx-1.7 and pc-i440fx-2.0 all fail with qemu 2.0;
pc-i440fx-1.5 and pc-i440fx-1.7 fail with qemu 1.7; and pc-i440fx-1.5
works fine with qemu 1.5.
The only workaround I know of is to downgrade qemu to
1.5.0+dfsg-3ubuntu5.4 from Ubuntu 13.10.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1292234
Title:
qcow2 image corruption on non-extent filesystems (ext3)
Status in linux package in Ubuntu:
In Progress
Status in qemu package in Ubuntu:
Invalid