yahoo-eng-team team mailing list archive

[Bug 1830081] Re: Nova unplug interface race condition when deleting an instance

 

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova/queens
       Status: New => Confirmed

** Changed in: nova/rocky
       Status: New => Confirmed

** Changed in: nova/stein
       Status: New => Confirmed

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/stein
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1830081

Title:
  Nova unplug interface race condition when deleting an instance

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Description
  ===========
  When nova starts an instance, it asks neutron to create a port and then updates the instance info cache based on the information returned by neutron.
  If the instance is deleted in the middle of spawning, the terminate_instance function is called with an instance object that DOES NOT contain any network info.
  As a result, nova deletes the instance but never unplugs the interface.

  Steps to reproduce
  ==================
  I boot an instance and immediately delete it with a command like:
  $ openstack server create --key-name fake --image ubuntu1810 --flavor c2-7 --net Ext-Net arnaudubuntu1810-3 ; nova delete arnaudubuntu1810-3

  
  - [1] build_and_run_instance is executed under a semaphore, which locks the instance. While booting, nova fills the network_info cache by calling [2] update_instance_cache_with_nw_info.
  - [3] terminate_instance is executed a few seconds later, but waits for the semaphore to be released. At this point, the instance network_info cache may not be filled yet, depending on whether [2] update_instance_cache_with_nw_info has already been executed.
  - Following the code, we end up in _shutdown_instance [4], which calls [5] get_network_info, which returns a NetworkInfo object that contains no interface.
  - Finally, nova calls _unplug_vifs [6], which does nothing (no VIF).
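
  The sequence above can be reproduced in isolation with a small Python sketch. This is not nova code: FakeInstance, simulate, the db dict, and the refresh_after_lock flag are all hypothetical names made up for illustration, and a plain threading.Lock stands in for the per-instance semaphore. It assumes, as the description suggests, that the delete path holds an instance object loaded before the cache was filled.

```python
import copy
import threading


class FakeInstance:
    """Hypothetical stand-in for a nova Instance; not the real class."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.network_info = []  # plays the role of the network info cache


def simulate(refresh_after_lock):
    """Run one build/delete race and return the VIFs left plugged."""
    db = {"inst-1": FakeInstance("inst-1")}  # stand-in for the database
    plugged = set()                          # VIFs plugged on the "host"
    lock = threading.Lock()                  # stand-in for the semaphore
    build_started = threading.Event()
    delete_requested = threading.Event()

    def build():
        with lock:
            build_started.set()
            # The delete request arrives while the build holds the lock.
            delete_requested.wait()
            plugged.add("vif-1")
            db["inst-1"].network_info = ["vif-1"]  # cache filled late

    def terminate():
        build_started.wait()
        # Snapshot taken before the cache is filled: network_info is empty.
        instance = copy.deepcopy(db["inst-1"])
        delete_requested.set()
        with lock:  # blocks until the build releases the lock
            if refresh_after_lock:
                # Assumed mitigation: reload the instance once the lock is held.
                instance = copy.deepcopy(db["inst-1"])
            for vif in instance.network_info:
                plugged.discard(vif)

    threads = [threading.Thread(target=build),
               threading.Thread(target=terminate)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return plugged


leaked = simulate(refresh_after_lock=False)
cleaned = simulate(refresh_after_lock=True)
```

  With the stale snapshot, the loop over network_info has nothing to iterate on and the VIF stays plugged. Reloading the instance after acquiring the lock is only one possible way to close the window in this toy model; it is not necessarily the fix adopted for this bug.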


  Note that I am running the OpenStack Newton release, but the code
  involved seems identical on master.


  [1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1837
  [2] https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L34
  [3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2765
  [4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2559
  [5] https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1252
  [6] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L919

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1830081/+subscriptions

