[Bug 1527623] [NEW] Nova might orphan volumes when it's racing to delete a volume-backed instance
Public bug reported:
Discussed in the -dev mailing list here:
http://lists.openstack.org/pipermail/openstack-dev/2015-December/082596.html
When nova deletes a volume-backed instance, it detaches the volume first
here:
https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2293
And then deletes the volume here (if the delete_on_termination flag was
set to True):
https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2320
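To make the ordering concrete, here is a simplified sketch of that delete path (illustrative only; the real code lives in nova/compute/manager.py at the links above, and the cinder client and BDM attribute names here are assumptions):

    # Simplified sketch of the delete path, not nova's actual code.
    def _shutdown_instance(cinder, bdms):
        for bdm in bdms:
            # Cinder accepts the detach and returns HTTP 202 immediately;
            # the volume may still be 'detaching' when this call returns.
            cinder.volumes.detach(bdm.volume_id)

    def _cleanup_volumes(cinder, bdms):
        for bdm in bdms:
            if bdm.delete_on_termination:
                # Runs right after the 202 above, so it races the detach
                # and fails with HTTP 400 if the volume is not yet
                # 'available'.
                cinder.volumes.delete(bdm.volume_id)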
The problem is that this code races: the detach is asynchronous, so nova
gets back a 202 and immediately goes on to delete the volume, which can
fail if the volume status is not 'available' yet, as seen here:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Failed%20to%20delete%20volume%5C%22%20AND%20message:%5C%22due%20to%5C%22%20AND%20tags:%5C%22screen-n-cpu.txt%5C%22
http://logs.openstack.org/36/231936/9/check/gate-tempest-dsvm-full-lio/31de861/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-12-18_13_59_16_071
    2015-12-18 13:59:16.071 WARNING nova.compute.manager [req-22431c70-78da-4fea-b132-170d27177a6f tempest-TestVolumeBootPattern-196984582 tempest-TestVolumeBootPattern-290257504] Failed to delete volume: 16f9252c-4036-463b-a053-60d4f46796c1 due to Invalid input received: Invalid volume: Volume status must be available or error or error_restoring or error_extending and must not be migrating, attached, belong to a consistency group or have snapshots. (HTTP 400) (Request-ID: req-260c7d2a-d0aa-4ee1-b5a0-9b0c45f1d695)
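One way to close the race (a sketch only, not a proposed patch; the helper name and retry parameters are made up) would be to poll cinder until the detach completes before issuing the delete:

    import time

    def _await_volume_detach(cinder, volume_id, retries=30, interval=1.0):
        # Poll until cinder reports the volume detached; 'available' and
        # 'error' are standard cinder volume statuses. The retry count
        # and interval are made-up values for illustration.
        for _ in range(retries):
            volume = cinder.volumes.get(volume_id)
            if volume.status in ('available', 'error'):
                return volume.status
            time.sleep(interval)
        raise RuntimeError('volume %s did not detach in time' % volume_id)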
This isn't surfaced as an error in nova because the compute manager's
_delete_instance method calls _cleanup_volumes with raise_exc=False, but
it orphans volumes in cinder, which then require manual cleanup on the
cinder side.
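For context, this is roughly what the raise_exc=False behaviour looks like (a minimal sketch of the swallow-and-log pattern, not the exact nova code):

    import logging

    LOG = logging.getLogger(__name__)

    def _cleanup_volumes(cinder, bdms, raise_exc=True):
        # Sketch of the swallow-and-log pattern described above.
        exc_info = None
        for bdm in bdms:
            try:
                cinder.volumes.delete(bdm.volume_id)
            except Exception as exc:
                LOG.warning('Failed to delete volume: %s due to %s',
                            bdm.volume_id, exc)
                exc_info = exc
        # With raise_exc=False the warning above is the only trace, and
        # the undeleted volume is left orphaned in cinder.
        if raise_exc and exc_info is not None:
            raise exc_info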
** Affects: nova
Importance: Medium
Status: Triaged
** Tags: compute kilo-backport-potential liberty-backport-potential volumes