yahoo-eng-team team mailing list archive
Message #71235
[Bug 1731986] Re: nova snapshot_volume_backed failure does not thaw filesystems
Reviewed: https://review.openstack.org/519464
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bca425a33f52584051348a3ace832be8151299a7
Submitter: Zuul
Branch: master
commit bca425a33f52584051348a3ace832be8151299a7
Author: Eric M Gonzalez <eric@xxxxxxxxx>
Date: Mon Nov 13 14:02:27 2017 -0600
unquiesce instance on volume snapshot failure
This patch adds an exception handler to "snapshot_volume_backed()" in
compute/api.py that catches (at the moment) _all_ exceptions from the
underlying cinderclient. Previously, if the instance was quiesced (frozen
filesystem), an exception would break execution of the function, skip
the needed unquiesce, and leave the instance in a frozen state. Now the
handler unquiesces the instance if it was quiesced prior to the failure.
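The pattern the commit message describes can be sketched roughly as follows. This is a minimal illustration, not the code from the nova change: `FakeInstance`, `SnapshotError`, the `create_snapshot` callable and the simplified signature of `snapshot_volume_backed()` are all hypothetical stand-ins, and the real function works with nova's own instance, block device and volume API objects.

```python
class SnapshotError(Exception):
    """Hypothetical stand-in for any error raised by the volume API."""


class FakeInstance:
    """Hypothetical guest that only tracks whether its filesystem is frozen."""

    def __init__(self):
        self.frozen = False


def quiesce(instance):
    # Stand-in for freezing the guest filesystem.
    instance.frozen = True


def unquiesce(instance):
    # Stand-in for thawing the guest filesystem.
    instance.frozen = False


def snapshot_volume_backed(instance, volumes, create_snapshot):
    """Snapshot every volume, thawing the guest even if a snapshot fails."""
    quiesce(instance)
    quiesced = True  # assumption: the real code only sets this if quiesce worked
    snapshots = []
    try:
        for volume in volumes:
            snapshots.append(create_snapshot(volume))
    except Exception:
        # Catch everything, thaw the guest if it was frozen, then re-raise
        # so the caller still sees the original failure.
        if quiesced:
            unquiesce(instance)
        raise
    if quiesced:
        unquiesce(instance)
    return snapshots
```

Re-raising after the thaw keeps the caller's view of the error unchanged; only the frozen state is cleaned up.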
Added a unit test with the help of Matt Riedemann:
test_snapshot_volume_backed_with_quiesce_create_snap_fails
Change-Id: I60de179c72eede6746696f29462ee9d805dace47
Closes-bug: #1731986
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1731986
Title:
nova snapshot_volume_backed failure does not thaw filesystems
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
Confirmed
Status in OpenStack Compute (nova) pike series:
In Progress
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
Noticed in OpenStack Mitaka (commit 9825c80), but the function
(snapshot_volume_backed) is unchanged as of commit a4fc1bcd. Backends:
Libvirt + Ceph.
When Nova attempts to create an image / snapshot of a volume-backed
instance, it first quiesces the instance in `snapshot_volume_backed()`.
It then loops over all of the block devices associated with that
instance. However, there is no exception handling in the for loop, so
any failure on the part of Cinder propagates up and out of
`snapshot_volume_backed()`. As a result, the needed `unquiesce()` is
never called on the instance, leaving it in an inconsistent (read-only)
state. This can cause operational errors in the instance, leaving it
unusable.
In my case, the steps for reproduction are:
1) nova create image / ( "create snapshot" via horizon )
2) nova/compute/api snapshot_volume_backed() calls quiesce
3) "qemu-ga: info: guest-fsfreeze called" is seen in instance
4) cinder fails snapshot of volume due to OverLimit
5) cinder raises OverLimit
6) snapshot_volume_backed() never finishes due to OverLimit
7) filesystem is never thawed
8) instance unusable
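The sequence above can be reproduced in miniature with a toy model. This is a sketch under stated assumptions: `Guest`, `create_volume_snapshot()` and `snapshot_without_handler()` are hypothetical names invented for illustration, and `OverLimit` here is a local stand-in class rather than the exception Cinder actually raises.

```python
class OverLimit(Exception):
    """Local stand-in for the quota error raised in step 5."""


class Guest:
    """Hypothetical guest that records freeze and thaw events."""

    def __init__(self):
        self.frozen = False
        self.log = []

    def quiesce(self):
        self.frozen = True
        self.log.append("freeze")

    def unquiesce(self):
        self.frozen = False
        self.log.append("thaw")


def create_volume_snapshot(volume_id):
    # Step 4/5: the volume service refuses the snapshot.
    raise OverLimit("snapshot quota exceeded for " + volume_id)


def snapshot_without_handler(guest, volume_ids):
    guest.quiesce()  # step 2/3: filesystem frozen
    snapshots = [create_volume_snapshot(v) for v in volume_ids]
    guest.unquiesce()  # step 7: never reached when a snapshot fails
    return snapshots


guest = Guest()
try:
    snapshot_without_handler(guest, ["vol-1"])
except OverLimit as exc:
    print("snapshot failed:", exc)

# The exception skipped the thaw, so the guest is left frozen.
print("guest still frozen:", guest.frozen)
```

Because nothing between the freeze and the thaw handles the exception, the only event in the guest's log is the freeze.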
I am in the process of writing and testing a patch and will have a
review for it soon.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1731986/+subscriptions