
yahoo-eng-team team mailing list archive

[Bug 1767363] [NEW] Deleting 2 instances with a common multi-attached volume can leave the volume attached

 

Public bug reported:

CAVEAT: The following is based only on code inspection; I have not
reproduced the issue.

During instance delete, we call:

  driver.cleanup():
    foreach volume:
      _disconnect_volume():
        if _should_disconnect_target():
          disconnect_volume()

As far as I can see, there is no volume-specific or global locking
around _disconnect_volume() anywhere in this call graph.

_should_disconnect_target() is intended to check for multi-attached
volumes on a single host, to prevent a volume from being disconnected
while it is still in use by another instance. It does:

  volume = cinder->get_volume()
  connection_count = count of volume.attachments where instance is on this host
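
As a rough Python sketch of that check (illustrative only, not the
actual nova code: the cinder call is stubbed out and the per-host
lookup is reduced to a field on the attachment record):

  def get_volume_from_cinder(volume_id):
      # Stand-in for the remote cinder API call; in nova this is a slow
      # round trip, followed by a db lookup to map each attachment's
      # instance to a host.
      return {'attachments': {
          'attachment-1': {'host': 'compute-1'},
          'attachment-2': {'host': 'compute-1'},
      }}

  def _should_disconnect_target(volume_id, host):
      volume = get_volume_from_cinder(volume_id)
      connection_count = sum(
          1 for a in volume['attachments'].values() if a['host'] == host)
      # True (i.e. safe to disconnect) only when this is the last
      # attachment of this volume on this host.
      return connection_count <= 1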

As there is no locking between that check and the subsequent
disconnect_volume(), two simultaneous calls to _disconnect_volume() can
each see the other's attachment still present: both get False back from
_should_disconnect_target(), and neither performs the disconnect. Worse,
because the check involves both a slow call out to cinder and a db
lookup, the window is wide, and the race is likely to be hit in
practice, for example by an orchestration tool mass-deleting instances.
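
The interleaving is easy to simulate in isolation. The toy script below
(a simulation of the assumed behaviour, not a test against real nova or
cinder; the barrier just forces the problematic ordering) makes both
deletes take their cinder snapshot before either removes its own
attachment: both count two attachments, both skip the host disconnect,
and the volume ends up with no attachments but still connected to the
host:

  import threading

  HOST = 'compute-1'
  attachments = {'instance-a': HOST, 'instance-b': HOST}  # cinder's view
  state_lock = threading.Lock()
  both_checked = threading.Barrier(2)  # force the problematic interleaving
  host_connected = True                # is the volume connected to the host?

  def should_disconnect_target():
      # "cinder.get_volume()": snapshot the attachments as they are now,
      # before either delete has removed its own attachment.
      with state_lock:
          snapshot = dict(attachments)
      both_checked.wait()              # both deletes have done their check
      count = sum(1 for h in snapshot.values() if h == HOST)
      return count <= 1                # disconnect only if last attachment

  def delete_instance(name):
      global host_connected
      if should_disconnect_target():
          host_connected = False       # disconnect_volume(): host detach
      with state_lock:
          del attachments[name]        # this instance's attachment is gone

  threads = [threading.Thread(target=delete_instance, args=(n,))
             for n in ('instance-a', 'instance-b')]
  for t in threads:
      t.start()
  for t in threads:
      t.join()

  print('attachments left:', attachments)           # {} -- all deleted
  print('volume still connected:', host_connected)  # True -- the leak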

Note also that many call paths other than cleanup() call
_disconnect_volume(), so there are likely numerous other potential
interactions of this kind.

The result would be that all attachments are deleted, but the volume
remains attached to the host.
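
Not part of this report, but for illustration: the natural shape of a
fix would be to serialize the check and the disconnect with a
volume-specific lock, e.g. via oslo.concurrency's lockutils (which nova
already depends on). The sketch below reuses the function names from
the call graph above; the signature and lock name are hypothetical:

  from oslo_concurrency import lockutils

  def _disconnect_volume(context, connection_info, instance, volume_id):
      # Hold a per-volume lock across the check-then-act so two
      # concurrent deletes on this host cannot interleave between the
      # attachment count and the disconnect.
      with lockutils.lock('disconnect_volume-%s' % volume_id):
          if _should_disconnect_target(context, volume_id, instance):
              disconnect_volume(connection_info)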

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1767363

Title:
  Deleting 2 instances with a common multi-attached volume can leave the
  volume attached

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767363/+subscriptions