[Bug 1767363] [NEW] Deleting 2 instances with a common multi-attached volume can leave the volume attached
Public bug reported:
CAVEAT: The following is based only on code inspection; I have not
reproduced the issue.
During instance delete, we call:
  driver.cleanup():
    foreach volume:
      _disconnect_volume():
        if _should_disconnect_target():
          disconnect_volume()
There is no volume-specific or global locking around _disconnect_volume
that I can see in this call graph.
_should_disconnect_target() is intended to check for multi-attached
volumes on a single host, to prevent a volume being disconnected while
it is still in use by another instance. It does:
  volume = cinder->get_volume()
  connection_count = count of volume.attachments where instance is on this host
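A self-contained Python sketch of that check-then-act pattern follows.
The names echo the Nova call graph, but the bodies, signatures, and
data shapes are illustrative assumptions, not the actual
implementation:

    def get_volume(volume_id):
        # Stand-in for the slow, unsynchronized Cinder API round trip.
        return {'attachments': [{'host': 'host1'}, {'host': 'host1'}]}

    def disconnect_host_target(volume_id):
        # Stand-in for the host-level disconnect (e.g. iSCSI logout).
        print('disconnecting target for %s' % volume_id)

    def _should_disconnect_target(volume_id, host):
        volume = get_volume(volume_id)  # slow, unlocked read
        connection_count = sum(
            1 for a in volume['attachments'] if a['host'] == host)
        # Only the last attachment on the host should disconnect.
        return connection_count <= 1

    def _disconnect_volume(volume_id, host):
        # No lock spans the check and the disconnect, so two concurrent
        # callers can both see the other's attachment and both skip.
        if _should_disconnect_target(volume_id, host):
            disconnect_host_target(volume_id)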
As there is no locking between this check and the subsequent
disconnect_volume(), 2 simultaneous calls to _disconnect_volume() can
both return False from _should_disconnect_target(): each call still
sees the other instance's attachment, so each concludes the volume is
still in use and skips the disconnect. Worse, because the check
involves both a slow call out to Cinder and a DB lookup, the race
window is wide, so this is likely to be hit in practice, for example
by an orchestration tool mass-deleting instances.
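The interleaving can be forced deterministically in a toy model (plain
Python, no Nova code; the barrier stands in for the wide race window):

    import threading

    attachments = {'instance-a': 'host1', 'instance-b': 'host1'}
    lock = threading.Lock()          # protects the toy dict only
    barrier = threading.Barrier(2)   # both checks finish before either delete
    disconnected = []

    def delete_instance(instance):
        # _should_disconnect_target-style check: count attachments on host1.
        with lock:
            count = sum(1 for h in attachments.values() if h == 'host1')
        barrier.wait()
        # Remove this instance's attachment record.
        with lock:
            del attachments[instance]
        # Disconnect only if we were the last attachment at check time.
        if count <= 1:
            disconnected.append(instance)

    threads = [threading.Thread(target=delete_instance, args=(name,))
               for name in ('instance-a', 'instance-b')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(attachments)   # {} -- every attachment record is gone
    print(disconnected)  # [] -- but no one disconnected the host target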
Also note that many call paths apart from cleanup() call
_disconnect_volume(), so there are likely numerous other potential
racy interactions here.
The result would be that all attachments are deleted, but the volume
remains attached to the host.
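For what it's worth, one possible shape of a fix (an assumption on my
part, not a committed change) is a per-volume lock that spans the
whole check/disconnect/attachment-removal sequence, e.g. via
oslo.concurrency. This reuses the helpers from the sketch above;
delete_attachment() is a hypothetical helper and the lock name is
illustrative:

    from oslo_concurrency import lockutils

    def _disconnect_volume(volume_id, instance, host):
        # Serialize per volume so a concurrent deletion cannot run its
        # check inside another deletion's check-to-disconnect window.
        with lockutils.lock('disconnect_volume-%s' % volume_id):
            if _should_disconnect_target(volume_id, host):
                disconnect_host_target(volume_id)
            # The attachment record must go away inside the same lock,
            # or a later deletion can still observe it and skip the
            # disconnect it actually needs to perform.
            delete_attachment(volume_id, instance)

Note that the lock has to cover the attachment removal as well as the
check; locking only the check and the disconnect still lets both
deletions observe each other's attachment.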
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1767363
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767363/+subscriptions