[Bug 1527623] Re: Nova might orphan volumes when it's racing to delete a volume-backed instance
See https://review.openstack.org/#/c/565601/5 for more context - that
change was revised because it failed the ceph job: apparently with rbd
volumes you can't delete the volume snapshots until the original volume
is deleted, while the cinder API normally refuses to delete a volume
that still has snapshots, so it's a weird catch-22.
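For illustration only, here is a rough sketch of that catch-22 using
python-cinderclient; the auth values, volume ID, and overall flow are
assumptions for the example, not taken from the review:

from keystoneauth1 import loading, session
from cinderclient import client
from cinderclient import exceptions as cinder_exc

# Build a cinder client; credentials and endpoint below are placeholders.
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3', username='admin',
    password='secret', project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
cinder = client.Client('3', session=session.Session(auth=auth))

vol_id = 'VOLUME_ID'  # a volume that still has snapshots

try:
    # Cinder normally rejects deleting a volume that still has snapshots
    # (HTTP 400 / BadRequest), so you would delete the snapshots first...
    cinder.volumes.delete(vol_id)
except cinder_exc.BadRequest:
    for snap in cinder.volume_snapshots.list(search_opts={'volume_id': vol_id}):
        # ...but per the comment above, with the rbd backend these snapshot
        # deletes don't succeed until the original volume is gone.
        cinder.volume_snapshots.delete(snap)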
** Changed in: nova
Status: In Progress => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1527623
Title:
Nova might orphan volumes when it's racing to delete a volume-backed
instance
Status in OpenStack Compute (nova):
Invalid
Bug description:
Discussed in the -dev mailing list here:
http://lists.openstack.org/pipermail/openstack-dev/2015-December/082596.html
When nova deletes a volume-backed instance, it detaches the volume
first here:
https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2293
And then deletes the volume here (if the delete_on_termination flag
was set to True):
https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2320
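Roughly, that sequence looks like the sketch below - hypothetical names
only, not nova's actual code; volume_api stands in for nova's cinder
wrapper and the bdm fields for the instance's block device mappings:

def _cleanup_volumes_for_delete(context, volume_api, bdms):
    for bdm in bdms:
        # Step 1: detach. Cinder acknowledges with 202 Accepted and
        # completes the detach asynchronously.
        volume_api.detach(context, bdm.volume_id)
    for bdm in bdms:
        if bdm.delete_on_termination:
            # Step 2: delete, issued right after the 202 without waiting
            # for the volume to actually reach 'available'.
            volume_api.delete(context, bdm.volume_id)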
The problem is that this code races: the detach is async, so nova gets
back a 202 and then goes on to delete the volume, which can fail if
the volume status is not 'available' yet, as seen here:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Failed%20to%20delete%20volume%5C%22%20AND%20message:%5C%22due%20to%5C%22%20AND%20tags:%5C%22screen-n-cpu.txt%5C%22
http://logs.openstack.org/36/231936/9/check/gate-tempest-dsvm-full-lio/31de861/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-12-18_13_59_16_071
2015-12-18 13:59:16.071 WARNING nova.compute.manager [req-22431c70-78da-4fea-b132-170d27177a6f tempest-TestVolumeBootPattern-196984582 tempest-TestVolumeBootPattern-290257504] Failed to delete volume: 16f9252c-4036-463b-a053-60d4f46796c1 due to Invalid input received: Invalid volume: Volume status must be available or error or error_restoring or error_extending and must not be migrating, attached, belong to a consistency group or have snapshots. (HTTP 400) (Request-ID: req-260c7d2a-d0aa-4ee1-b5a0-9b0c45f1d695)
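One way to sidestep that 400 - shown purely as an illustration, not as
how nova or the review above addressed it - is to poll cinder until the
volume leaves the transitional state before issuing the delete, assuming
a python-cinderclient handle like the one in the earlier sketch:

import time

def delete_when_available(cinder, volume_id, timeout=60, interval=2):
    # Wait for the async detach to finish, then delete the volume.
    deadline = time.time() + timeout
    while time.time() < deadline:
        volume = cinder.volumes.get(volume_id)
        if volume.status in ('available', 'error'):
            cinder.volumes.delete(volume)
            return True
        time.sleep(interval)
    return False  # still attached/detaching after the timeout; leave it alone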
This isn't an error in nova because the compute manager's
_delete_instance method calls _cleanup_volumes with raise_exc=False,
but this will orphan volumes in cinder, which then requires manual
cleanup on the cinder side.
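The manual cleanup itself is straightforward once the detach has
settled; a hedged sketch, again assuming a python-cinderclient handle,
with the volume ID taken from the trace above:

def cleanup_orphan(cinder, volume_id='16f9252c-4036-463b-a053-60d4f46796c1'):
    # Delete a volume that nova failed to clean up, once it is detached.
    volume = cinder.volumes.get(volume_id)
    if volume.status == 'available' and not volume.attachments:
        cinder.volumes.delete(volume)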
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1527623/+subscriptions