yahoo-eng-team team mailing list archive
Message #30659
[Bug 1237683] Re: inconsistent virtual size in qcow base image after block-migration
Addressed in Havana
** Changed in: nova
Status: Triaged => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1237683
Title:
inconsistent virtual size in qcow base image after block-migration
Status in OpenStack Compute (Nova):
Fix Released
Bug description:
We're running a Grizzly node using KVM (1.0 from cloud-archive) with
local ephemeral instance storage.
Since approximately the time we upgraded to Grizzly we've been
receiving complaints from particular users about secondary disk
corruption issues. These users in particular are noticing the issue
because they are relying on the secondary drive and also because they
are using CentOS, which drops to an interactive prompt before
completing boot if it cannot mount all filesystems (Ubuntu does not).
We've since discovered that this is specifically linked to block-
migration of such disks which were created and formatted automatically
by Nova. I.e., if we launch a new instance, log in and then reformat
the drive internally (even as ext3), we don't encounter corruption
issues after live-migration. If we change the virt_mkfs config option
to use mkfs.ext4 then we also don't have the problem. Unfortunately
that's not a simple fix for an active production cloud because all
existing backing files must be removed in order to force their
recreation.
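For anyone wanting to try the same workaround, the change amounts to a nova.conf edit along these lines. This is a sketch based on the Grizzly-era virt_mkfs multi-valued option syntax; the os_type key and the exact mkfs flags are assumptions and should be checked against your deployment:

```ini
# nova.conf sketch (assumed Grizzly-era syntax): format newly created
# ephemeral disks with ext4 instead of the default ext3.
# virt_mkfs can be given once per os_type; %(fs_label)s and %(target)s
# are substituted by Nova at mkfs time.
virt_mkfs = default=mkfs.ext4 -L %(fs_label)s -F %(target)s
```

As noted above, this only affects newly created backing files; existing cached files under instances/_base must be removed before the change takes effect.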
In investigating the problem we noticed a behaviour that might be related - after block-migration the instance's secondary disk has a "generic" backing file instances/_base/ephemeral, as opposed to the backing file it was created with on the origin host, e.g., instances/_base/ephemeral_30_default.
These backing files have different virtual sizes(!):
$ qemu-img info _base/ephemeral
image: _base/ephemeral
file format: raw
virtual size: 2.0G (2147483648 bytes)
disk size: 778M
$ qemu-img info _base/ephemeral_30_default
image: _base/ephemeral_30_default
file format: raw
virtual size: 30G (32212254720 bytes)
disk size: 614M
We're no experts on qcow, but this looks like it could be problematic
and may explain the corruption issues we're seeing - I can imagine
there would be problems for a migrated guest that attempts to read a
previously untouched sector beyond the size of the new backing file.
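To illustrate the concern concretely: a qcow2 overlay records the virtual size it expects as a big-endian 64-bit field at byte offset 24 of its header (per the qcow2 format spec), so a size mismatch against the backing file can be detected by comparing that field with the backing file's size. The sketch below parses the field from synthetic in-memory headers rather than real images; make_header is a hypothetical helper for the demo only. (Note that the _base files in this report are raw, so their virtual size is simply the file length.)

```python
import struct

def qcow2_virtual_size(header: bytes) -> int:
    """Return the virtual size (in bytes) recorded in a qcow2 header.

    The qcow2 header starts with the magic 'QFI\\xfb' and a 4-byte
    version; the 8-byte big-endian virtual size sits at offset 24.
    """
    magic, _version = struct.unpack(">4sI", header[:8])
    if magic != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    (size,) = struct.unpack(">Q", header[24:32])
    return size

def make_header(virtual_size: int) -> bytes:
    """Hypothetical helper: build a minimal 32-byte qcow2 v2 header."""
    return (b"QFI\xfb" + struct.pack(">I", 2)   # magic, version 2
            + struct.pack(">QI", 0, 0)          # no backing file name
            + struct.pack(">I", 16)             # cluster_bits (64 KiB)
            + struct.pack(">Q", virtual_size))  # virtual size field

# Stand-ins for the two backing files shown above:
generic = make_header(2 * 1024**3)     # _base/ephemeral: 2 GiB
specific = make_header(30 * 1024**3)   # ephemeral_30_default: 30 GiB

# An overlay created against the 30G file but rebased onto the 2G one
# would reference sectors past the end of its new backing file.
assert qcow2_virtual_size(specific) > qcow2_virtual_size(generic)
```

The same comparison could be run against real images by reading the first 32 bytes of the overlay and calling os.path.getsize() on a raw backing file.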
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1237683/+subscriptions