yahoo-eng-team team mailing list archive
Message #78856
[Bug 1830081] Re: Nova unplug interface race condition when deleting an instance
Reviewed: https://review.opendev.org/660761
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4ed0d8b7adc350e8962df033c2da892c95561fe
Submitter: Zuul
Branch: master
commit d4ed0d8b7adc350e8962df033c2da892c95561fe
Author: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
Date: Wed May 22 17:34:20 2019 +0200
Refresh instance network info on deletion
When deleting an instance, if the network info is empty, we should
refresh the info because we can't be sure the copy of the cache we
have when we fetched the instance to delete is up-to-date, i.e. if
we're racing to delete the server while it's building and the
network info cache was updated in the database after we started the
delete operation and got the instance from the DB, then we could
fail to unplug VIFs.
Closes-Bug: #1830081
Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
Signed-off-by: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
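The idea behind the fix can be modelled in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the code from the commit: Instance, network_info_for_shutdown and the refresh callback are hypothetical stand-ins for the nova instance object, the shutdown path and a fresh query of the network service. An empty cache is treated as possibly stale and fetched again, while a non-empty cache is used as-is.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Instance:
    # Hypothetical stand-in for a nova Instance object; only the cached
    # network info (modelled as a list of VIF identifiers) is kept.
    uuid: str
    cached_network_info: List[str] = field(default_factory=list)


def network_info_for_shutdown(
    instance: Instance, refresh: Callable[[Instance], List[str]]
) -> List[str]:
    """Return the VIFs to unplug, refreshing the cache if it is empty.

    An empty cache may simply be stale (the build path may have written
    the real network info after this copy was loaded), so the
    hypothetical refresh callback is asked again instead of trusting it.
    """
    network_info = instance.cached_network_info
    if not network_info:
        network_info = refresh(instance)
    return network_info


# Usage: a copy loaded before the build wrote its network info cache.
stale = Instance(uuid="vm-1")
print(network_info_for_shutdown(stale, lambda inst: ["vif-1"]))  # ['vif-1']
```

Refreshing only when the cache is empty keeps the normal delete path unchanged and adds an extra lookup only in the case that can hide VIFs.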
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1830081
Title:
Nova unplug interface race condition when deleting an instance
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
Confirmed
Status in OpenStack Compute (nova) rocky series:
Confirmed
Status in OpenStack Compute (nova) stein series:
Confirmed
Bug description:
Description
===========
When nova starts an instance, it asks neutron to create a port and then updates the instance info cache based on the information returned by neutron.
If the instance is deleted in the middle of spawning, the terminate_instance function is called with an instance object that DOES NOT contain any network info.
As a result, nova deletes the instance but never unplugs the interface.
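The stale-copy problem can be illustrated with a small, self-contained Python model. Everything in it is hypothetical (InstanceRecord, the dict standing in for the database, fetch_instance); it only assumes, as the commit message explains, that the delete path works on a copy of the instance loaded before the build wrote its network info cache.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InstanceRecord:
    # Hypothetical model of an instance row with its network info cache.
    uuid: str
    network_info: List[str] = field(default_factory=list)


# Hypothetical in-memory "database" keyed by instance UUID.
db: Dict[str, InstanceRecord] = {"vm-1": InstanceRecord("vm-1")}


def fetch_instance(uuid: str) -> InstanceRecord:
    # Like loading an object from the DB: the caller gets a private copy,
    # frozen at the moment of the read.
    return copy.deepcopy(db[uuid])


# 1. The delete request loads the instance while it is still building.
instance_for_delete = fetch_instance("vm-1")

# 2. The build path then creates the port and writes the cache to the DB.
db["vm-1"].network_info.append("vif-1")

# 3. The delete path still holds its old copy, so it sees no interfaces.
print(db["vm-1"].network_info)           # ['vif-1']
print(instance_for_delete.network_info)  # []
```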
Steps to reproduce
==================
I boot an instance and immediately delete it with a command like:
$ openstack server create --key-name fake --image ubuntu1810 --flavor c2-7 --net Ext-Net arnaudubuntu1810-3 ; nova delete arnaudubuntu1810-3
- [1] build_and_run_instance is executed while holding a semaphore, thus locking the instance. While booting, nova fills the network_info cache by calling [2] update_instance_cache_with_nw_info.
- [3] terminate_instance is executed a few seconds later but has to wait for the semaphore to be released. At that point, the network_info cache of the instance may not be filled yet, depending on whether [2] update_instance_cache_with_nw_info has already run.
- Following the code, we end up in _shutdown_instance [4], which calls [5] get_network_info; this returns a NetworkInfo object that contains no interfaces.
- Finally, nova calls _unplug_vifs [6], which does nothing because there are no VIFs.
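The sequence in the steps above can be reproduced deterministically with two Python threads. This is a toy model, not nova code: a threading.Lock stands in for the per-instance semaphore, a dict stands in for the database, and build_and_run_instance, terminate_instance and unplug_vifs are simplified hypothetical versions of the functions referenced in [1], [3] and [6]. Two threading.Event objects force the ordering described in the report: the build takes the lock first, and the delete loads the instance before the build writes the cache.

```python
import threading
from typing import Dict, List

# Hypothetical in-memory "database": instance UUID -> network info cache.
db: Dict[str, List[str]] = {"vm-1": []}
# Stand-in for the per-instance semaphore held by build_and_run_instance.
instance_lock = threading.Lock()
build_has_lock = threading.Event()
delete_loaded_instance = threading.Event()
unplugged: List[str] = []


def build_and_run_instance(uuid: str) -> None:
    with instance_lock:
        build_has_lock.set()
        # The delete request loads the instance while the port is being created.
        delete_loaded_instance.wait()
        # Analogue of update_instance_cache_with_nw_info writing to the DB.
        db[uuid] = ["vif-1"]


def unplug_vifs(network_info: List[str]) -> None:
    # Iterates over the VIFs; an empty list means nothing is unplugged.
    for vif in network_info:
        unplugged.append(vif)


def terminate_instance(network_info: List[str]) -> None:
    # Blocks until the build releases the lock, then uses the copy it was given.
    with instance_lock:
        unplug_vifs(network_info)


build = threading.Thread(target=build_and_run_instance, args=("vm-1",))
build.start()
build_has_lock.wait()
stale_copy = list(db["vm-1"])  # instance loaded mid-build: cache still empty
delete_loaded_instance.set()
terminate_instance(stale_copy)
build.join()

print(db["vm-1"])  # ['vif-1']: the cache in the "database" is filled
print(unplugged)   # []: yet no VIF was unplugged
```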
Note that I am running the OpenStack Newton release, but the code involved
seems identical on master.
[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1837
[2] https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L34
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2765
[4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2559
[5] https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1252
[6] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L919