yahoo-eng-team team mailing list archive

[Bug 1527623] Re: Nova might orphan volumes when it's racing to delete a volume-backed instance

 

See https://review.openstack.org/#/c/565601/5 for more context. That
was changed because it failed the ceph job: apparently with rbd volumes
you can't delete the volume snapshots until the original volume is
deleted, but the cinder API normally won't let you delete a volume that
has snapshots, so it's a weird catch-22.

** Changed in: nova
       Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1527623

Title:
  Nova might orphan volumes when it's racing to delete a volume-backed
  instance

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Discussed in the -dev mailing list here:

  http://lists.openstack.org/pipermail/openstack-dev/2015-December/082596.html

  When nova deletes a volume-backed instance, it detaches the volume
  first here:

  https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2293

  And then deletes the volume here (if the delete_on_termination flag
  was set to True):

  https://github.com/openstack/nova/blob/5508e11cf873384a28dc7416168d34e85f2c06cf/nova/compute/manager.py#L2320

  The problem is that this code races: the detach is async, so nova
  gets back a 202 and then goes on to delete the volume, which can fail
  if the volume status is not 'available' yet, as seen here:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Failed%20to%20delete%20volume%5C%22%20AND%20message:%5C%22due%20to%5C%22%20AND%20tags:%5C%22screen-n-cpu.txt%5C%22

  http://logs.openstack.org/36/231936/9/check/gate-tempest-dsvm-full-lio/31de861/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-12-18_13_59_16_071

  2015-12-18 13:59:16.071 WARNING nova.compute.manager
  [req-22431c70-78da-4fea-b132-170d27177a6f
  tempest-TestVolumeBootPattern-196984582
  tempest-TestVolumeBootPattern-290257504] Failed to delete volume:
  16f9252c-4036-463b-a053-60d4f46796c1 due to Invalid input received:
  Invalid volume: Volume status must be available or error or
  error_restoring or error_extending and  must not be migrating,
  attached, belong to a consistency group or have snapshots. (HTTP 400)
  (Request-ID: req-260c7d2a-d0aa-4ee1-b5a0-9b0c45f1d695)

  This isn't an error in nova because the compute manager's
  _delete_instance method calls _cleanup_volumes with raise_exc=False,
  but this will orphan volumes in cinder, which then requires manual
  cleanup on the cinder side.
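
To make the race concrete, here is a minimal sketch in Python. It is an
illustration only: FakeVolumeAPI, delete_without_waiting and
delete_after_waiting are hypothetical names invented for this example, not
nova or cinder code, and polling for 'available' is just one possible way
to avoid the race, not necessarily what nova does.

```python
import time


class VolumeNotAvailable(Exception):
    """Raised when a delete is attempted on a volume that is not 'available'."""


class FakeVolumeAPI:
    """Hypothetical in-memory stand-in for a volume service (not the cinder API)."""

    def __init__(self, detach_polls=3):
        self.status = "in-use"
        self._polls_left = detach_polls
        self.deleted = False

    def begin_detach(self):
        # Asynchronous: returns immediately (like a 202) while the
        # volume is still 'detaching'.
        self.status = "detaching"

    def get_status(self):
        # Each poll moves the simulated detach one step closer to done.
        if self.status == "detaching":
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.status = "available"
        return self.status

    def delete(self):
        if self.status != "available":
            raise VolumeNotAvailable(self.status)
        self.deleted = True


def delete_without_waiting(api):
    """Racy order: start the detach, then delete straight away."""
    api.begin_detach()
    try:
        api.delete()
    except VolumeNotAvailable:
        # Swallowed, analogous to raise_exc=False: the volume is left behind.
        return False
    return True


def delete_after_waiting(api, timeout=5.0, interval=0.01):
    """Poll until the volume is 'available' (or time out), then delete."""
    api.begin_detach()
    deadline = time.monotonic() + timeout
    while api.get_status() != "available":
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    api.delete()
    return True
```

With the fake service above, delete_without_waiting leaves the volume
undeleted because the delete arrives while the status is still
'detaching', whereas delete_after_waiting deletes it once the status has
become 'available'.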


