
yahoo-eng-team team mailing list archive

[Bug 1731986] [NEW] nova snapshot_volume_backed failure does not thaw filesystems

 

Public bug reported:

Noticed in OpenStack Mitaka (commit 9825c80), but the function
(snapshot_volume_backed) is unchanged as of commit a4fc1bcd. Backends:
Libvirt + Ceph.

When Nova creates an image / snapshot of a volume-backed instance, it
first quiesces the instance in `snapshot_volume_backed()`. It then loops
over all of the block devices associated with that instance. However,
there is no exception handling in the for loop, so any failure on the
part of Cinder propagates up and out of `snapshot_volume_backed()`. As a
result, the needed `unquiesce()` is never called on the instance, leaving
it in an inconsistent (read-only) state. This can cause operational
errors in the instance, leaving it unusable.
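
To illustrate the control-flow problem described above, here is a
minimal, self-contained sketch. It is not Nova's code and not the
pending patch: `FakeInstance`, `create_volume_snapshot()`,
`snapshot_unguarded()`, `snapshot_guarded()` and the local `OverLimit`
class are hypothetical stand-ins invented for this example. It only
shows how an exception raised inside the loop skips the thaw, and how a
try/finally (one possible approach) would guarantee it.

```python
class OverLimit(Exception):
    """Stand-in for the quota error raised by Cinder in this report."""


class FakeInstance:
    """Hypothetical instance that only tracks whether it is frozen."""

    def __init__(self):
        self.frozen = False

    def quiesce(self):
        self.frozen = True

    def unquiesce(self):
        self.frozen = False


def create_volume_snapshot(volume_id):
    """Hypothetical Cinder call; fails for one volume to mimic OverLimit."""
    if volume_id == "vol-over-quota":
        raise OverLimit(volume_id)
    return "snap-of-" + volume_id


def snapshot_unguarded(instance, volume_ids):
    # Shape of the reported problem: an exception raised while looping
    # over the volumes means unquiesce() is never reached.
    instance.quiesce()
    snapshots = [create_volume_snapshot(v) for v in volume_ids]
    instance.unquiesce()
    return snapshots


def snapshot_guarded(instance, volume_ids):
    # One possible fix: thaw in a finally block, so it runs whether the
    # loop succeeds or raises. The exception still reaches the caller.
    instance.quiesce()
    try:
        return [create_volume_snapshot(v) for v in volume_ids]
    finally:
        instance.unquiesce()
```

With a failing volume in the list, `snapshot_unguarded()` leaves the
fake instance frozen, while `snapshot_guarded()` still raises
`OverLimit` but thaws the instance first.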

In my case, the steps for reproduction are:

1) nova create image / ( "create snapshot" via horizon )
2) nova/compute/api snapshot_volume_backed() calls quiesce
3) "qemu-ga: info: guest-fsfreeze called" is seen in instance
4) cinder fails snapshot of volume due to OverLimit
5) cinder raises OverLimit
6) snapshot_volume_backed() never finishes due to OverLimit
7) filesystem is never thawed
8) instance unusable

I am in the process of writing and testing a patch and will have a
review for it soon.

** Affects: nova
     Importance: High
     Assignee: Eric M Gonzalez (egrh3)
         Status: Triaged


** Tags: api volumes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1731986

Title:
  nova snapshot_volume_backed failure does not thaw filesystems

Status in OpenStack Compute (nova):
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1731986/+subscriptions

