yahoo-eng-team team mailing list archive

[Bug 1350766] [NEW] Race condition: compute intermittently corrupts base images on download from glance

 

Public bug reported:

Under certain conditions, which I happen to meet often on my Icehouse
single-node setup, uploaded images or snapshots fail to boot. See also
https://ask.openstack.org/en/question/42804/icehouse-how-to-boot-a-snapshot-from-a-running-instance/

Reason: when a QCOW2 image is first instantiated, it is

(1)  downloaded as QCOW2 to /var/lib/nova/instances/_base/IMAGEID.part
(2)  converted with qemu-img to a RAW-format base file, /var/lib/nova/instances/_base/IMAGEID.converted
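
As a rough illustration of the two steps above (not nova's actual code), the
paths and the conversion command could be laid out as follows. The function
name plan_base_image_steps and the exact qemu-img arguments are assumptions
made for this sketch; only the directory and file suffixes come from the
report.

```python
import posixpath


def plan_base_image_steps(image_id, base_dir="/var/lib/nova/instances/_base"):
    """Return the download path, the converted path, and a qemu-img command.

    Hypothetical helper: it only describes the two steps, it does not run them.
    """
    # Step (1): the QCOW2 image is downloaded to IMAGEID.part
    part_path = posixpath.join(base_dir, image_id + ".part")
    # Step (2): qemu-img converts it to a RAW base file, IMAGEID.converted
    converted_path = posixpath.join(base_dir, image_id + ".converted")
    # Assumed invocation; the flags nova really passes may differ.
    convert_cmd = [
        "qemu-img", "convert",
        "-f", "qcow2",   # input format
        "-O", "raw",     # output format
        part_path, converted_path,
    ]
    return part_path, converted_path, convert_cmd
```

Step (2) reads the file produced by step (1), so step (1) has to be fully
finished before the command is started.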

Step (1) is performed in nova/image/glance.py, in
GlanceImageService.download, using buffered IO, which does not guarantee
that the data has been written to disk when the file is closed.
Consequently, the source image file may not be completely written when
qemu-img starts reading it. Whether the result is good or bad depends on
the download speed, the file size, and how fast qemu-img digests its input.

Proposed fix: enforce fsync on output File object before returning from
download. Patch attached.
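
For readers without access to the attachment, here is a minimal sketch of the
idea, assuming the download boils down to writing an iterable of byte chunks
to a file. The function name write_image_chunks is hypothetical and this is
not the attached patch; it only shows the flush-then-fsync pattern.

```python
import os


def write_image_chunks(chunks, path):
    """Write byte chunks to path and force them to stable storage.

    Hypothetical stand-in for the write loop of a download routine.
    """
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
        # Push Python's user-space buffer down to the operating system ...
        out.flush()
        # ... then ask the OS to commit its own buffers to the device.
        os.fsync(out.fileno())
    return path
```

The call to flush() has to come before os.fsync(), because fsync only covers
data that has already been handed to the operating system.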

** Affects: nova
     Importance: Undecided
         Status: New

** Patch added: "Enforce fsync on output File object before returning from download"
   https://bugs.launchpad.net/bugs/1350766/+attachment/4166489/+files/nova-glance.patch

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1350766

Title:
  Race condition: compute intermittently corrupts base images on download
  from glance

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1350766/+subscriptions

