yahoo-eng-team team mailing list archive
Message #80813
[Bug 1838392] Re: BDMNotFound raised and stale block devices left over when simultaneously reboot and deleting an instance
Reviewed: https://review.opendev.org/673463
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ad54f3dacbd372271f441baea5380f913072dde
Submitter: Zuul
Branch: master
commit 9ad54f3dacbd372271f441baea5380f913072dde
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date: Mon Jul 29 16:25:45 2019 +0100
compute: Take an instance.uuid lock when rebooting
Previously, simultaneous requests to reboot and delete an instance could
race, as only the latter took a lock against the uuid of the instance.
With the Libvirt driver this race could potentially result in attempts
being made to reconnect previously disconnected volumes on the host.
Depending on the volume backend being used this could then result in
stale block devices pointing to unmapped volumes being left on the host
that in turn could cause failures later on when connecting newly mapped
volumes.
This change avoids this race by ensuring any request to reboot an
instance takes an instance.uuid lock within the compute manager,
serialising requests to reboot and then delete the instance.
Closes-Bug: #1838392
Change-Id: Ieb59de10c63bb067f92ec054535766cdd722dae2
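The serialisation described in the commit message can be sketched as follows. This is a minimal illustration, not the nova code: the lock registry, `lock_for`, `reboot_instance`, `delete_instance` and `run_concurrently` are all hypothetical names, and standard-library threading stands in for whatever locking helper the compute manager actually uses.

```python
import threading
from collections import defaultdict

# Hypothetical registry of locks keyed by instance UUID (an assumption for
# this sketch; it is not the helper used by the compute manager).
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()

# Ordered record of what each operation did, used to show serialisation.
events = []


def lock_for(uuid):
    """Return the lock for this UUID, creating it on first use."""
    with _locks_guard:
        return _locks[uuid]


def reboot_instance(uuid):
    # Both operations take the same per-UUID lock, so their bodies
    # can never interleave for a given instance.
    with lock_for(uuid):
        events.append(("reboot-start", uuid))
        events.append(("reboot-end", uuid))


def delete_instance(uuid):
    with lock_for(uuid):
        events.append(("delete-start", uuid))
        events.append(("delete-end", uuid))


def run_concurrently(uuid):
    """Issue a reboot and a delete at the same time and wait for both."""
    threads = [
        threading.Thread(target=op, args=(uuid,))
        for op in (reboot_instance, delete_instance)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)
```

Whichever request wins the lock runs to completion before the other starts, so the recorded events always come in start/end pairs for one operation followed by the pair for the other.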
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838392
Title:
BDMNotFound raised and stale block devices left over when
simultaneously reboot and deleting an instance
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
Simultaneous requests to reboot and delete an instance _will_ race as only the call to delete takes a lock against the instance.uuid.
One possible outcome of this seen in the wild with the Libvirt driver
is that the request to soft reboot will eventually turn into a hard
reboot, reconnecting volumes that the delete request has already
disconnected. These volumes will eventually be unmapped on the Cinder
side by the delete request, leaving stale devices on the host.
Additionally BDMNotFound is raised by the reboot operation as the
delete operation has already deleted the BDMs.
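The interleaving described above can be modelled with a small toy example. Everything here is an assumption made for illustration: `FakeHost`, its method names, the single volume `vol-1` and the exact step order are hypothetical, and the local `BDMNotFound` class only stands in for the exception named in this report.

```python
class BDMNotFound(Exception):
    """Hypothetical stand-in for the error named in this report."""


class FakeHost:
    """Toy model of one instance with a single attached volume."""

    def __init__(self):
        self.bdms = {"vol-1"}       # block device mappings recorded for the instance
        self.connected = {"vol-1"}  # volumes connected on the host
        self.mapped = {"vol-1"}     # volumes still mapped on the storage backend

    def delete_disconnect(self):
        # Delete request: disconnect the volumes from the host.
        self.connected.clear()

    def hard_reboot_reconnect(self):
        # Reboot request (soft turned hard): reconnect every recorded volume.
        self.connected |= self.bdms

    def delete_finish(self):
        # Delete request: unmap the volumes on the backend and drop the BDMs.
        self.mapped.clear()
        self.bdms.clear()

    def reboot_lookup_bdms(self):
        # Reboot request: a later lookup finds that the BDMs are gone.
        if not self.bdms:
            raise BDMNotFound("no block device mappings left")
        return set(self.bdms)

    def stale_devices(self):
        # Devices present on the host that point at unmapped volumes.
        return self.connected - self.mapped


def racy_interleaving():
    """Replay one unserialised ordering of the two requests."""
    host = FakeHost()
    host.delete_disconnect()
    host.hard_reboot_reconnect()
    host.delete_finish()
    try:
        host.reboot_lookup_bdms()
        raised = False
    except BDMNotFound:
        raised = True
    return host.stale_devices(), raised
```

In this model, `racy_interleaving()` ends with `vol-1` still connected on the host although it is no longer mapped, and the reboot path raises `BDMNotFound`, which matches the two symptoms reported.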
Steps to reproduce
==================
$ nova reboot $instance && nova delete $instance
Expected result
===============
The instance reboots and is then deleted without any errors raised.
Actual result
=============
BDMNotFound raised and stale block devices left over.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
1599e3cf68779eafaaa2b13a273d3bebd1379c19 / 19.0.0.0rc1-992-g1599e3cf68
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + QEMU/kvm
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838392/+subscriptions