
yahoo-eng-team team mailing list archive

[Bug 1731986] Re: nova snapshot_volume_backed failure does not thaw filesystems

 

Reviewed:  https://review.openstack.org/519464
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bca425a33f52584051348a3ace832be8151299a7
Submitter: Zuul
Branch:    master

commit bca425a33f52584051348a3ace832be8151299a7
Author: Eric M Gonzalez <eric@xxxxxxxxx>
Date:   Mon Nov 13 14:02:27 2017 -0600

    unquiesce instance on volume snapshot failure
    
    This patch adds an exception handler to "snapshot_volume_backed()" in
    compute/api.py that catches (at the moment) _all_ exceptions from the
    underlying cinderclient. Previously, if the instance was quiesced (frozen
    filesystem), an exception would break execution of the function,
    skip the needed unquiesce, and leave the instance in a frozen state.
    
    Now, the exception handler unquiesces the instance if it was quiesced
    prior to the failure.
    
    Added a unit test with the help of Matt Riedemann:
        test_snapshot_volume_backed_with_quiesce_create_snap_fails
    
    Change-Id: I60de179c72eede6746696f29462ee9d805dace47
    Closes-bug: #1731986
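
The control flow described in the commit message can be sketched roughly as
follows. This is a minimal illustration and not the actual nova code:
`OverLimit`, `FakeVolumeAPI`, `FakeComputeAPI`, and the `create_snapshot`
method are hypothetical stand-ins, and the check on `vm_state` is an
assumption about when an instance gets quiesced. Only the shape of the fix
(catch the exception, unquiesce if the instance was quiesced, re-raise) is
taken from the description above.

```python
class OverLimit(Exception):
    """Hypothetical stand-in for a quota error from the volume service."""


class FakeVolumeAPI:
    """Hypothetical volume client; fails every snapshot when fail=True."""

    def __init__(self, fail):
        self.fail = fail

    def create_snapshot(self, volume_id):
        if self.fail:
            raise OverLimit("snapshot quota exceeded for %s" % volume_id)
        return "snap-of-%s" % volume_id


class FakeComputeAPI:
    """Hypothetical compute API that records quiesce/unquiesce calls."""

    def __init__(self, volume_api):
        self.volume_api = volume_api
        self.events = []

    def quiesce(self, instance):
        self.events.append("quiesce")

    def unquiesce(self, instance):
        self.events.append("unquiesce")

    def snapshot_volume_backed(self, instance, volume_ids):
        quiesced = False
        # Assumption: only a running instance is quiesced.
        if instance["vm_state"] == "active":
            self.quiesce(instance)
            quiesced = True

        snapshots = []
        try:
            for volume_id in volume_ids:
                snapshots.append(self.volume_api.create_snapshot(volume_id))
        except Exception:
            # Thaw the guest before letting the failure propagate.
            if quiesced:
                self.unquiesce(instance)
            raise

        if quiesced:
            self.unquiesce(instance)
        return snapshots
```

With this structure, a failing snapshot call still surfaces to the caller,
but the recorded events end with "unquiesce" whether or not the loop
completed.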


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1731986

Title:
  nova snapshot_volume_backed failure does not thaw filesystems

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Noticed in OpenStack Mitaka (commit 9825c80), but the function
  (snapshot_volume_backed) is unchanged as of commit a4fc1bcd. Backends:
  Libvirt + Ceph.

  When Nova attempts to create an image / snapshot of a volume-backed
  instance, it first quiesces the instance in `snapshot_volume_backed()`.
  It then loops over all of the block devices associated with that
  instance. However, there is no exception handling in the for loop, so
  any failure on the part of Cinder propagates up through the
  `snapshot_volume_backed()` function. As a result, the needed
  `unquiesce()` is never called on the instance, leaving it in an
  inconsistent (read-only) state. This can cause operational errors in
  the instance, leaving it unusable.

  In my case, the steps for reproduction are:

  1) nova create image / ( "create snapshot" via horizon )
  2) nova/compute/api snapshot_volume_backed() calls quiesce
  3) "qemu-ga: info: guest-fsfreeze called" is seen in instance
  4) cinder fails snapshot of volume due to OverLimit
  5) cinder raises OverLimit
  6) snapshot_volume_backed() never finishes due to OverLimit
  7) filesystem is never thawed
  8) instance unusable
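
The sequence above can be modelled with a small, self-contained sketch.
Everything here is hypothetical and only illustrates the control flow: the
`guest` dict stands in for the instance's filesystem state, `create_snapshot`
stands in for the Cinder call, and `OverLimit` is a locally defined exception
rather than the real one.

```python
class OverLimit(Exception):
    """Hypothetical stand-in for the quota error raised in steps 4 and 5."""


def create_snapshot(volume_id):
    # Stand-in for the volume service call; always fails in this sketch.
    raise OverLimit("snapshot quota exceeded for %s" % volume_id)


def snapshot_volume_backed_unpatched(guest, volume_ids):
    guest["frozen"] = True           # steps 2-3: quiesce freezes the guest
    for volume_id in volume_ids:
        create_snapshot(volume_id)   # steps 4-6: the exception escapes here
    guest["frozen"] = False          # step 7: the thaw is never reached


guest = {"frozen": False}
try:
    snapshot_volume_backed_unpatched(guest, ["vol-1"])
except OverLimit:
    pass

print(guest["frozen"])  # prints True: the guest is still frozen
```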

  I am in the process of writing and testing a patch and will have a
  review for it soon.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1731986/+subscriptions

