yahoo-eng-team team mailing list archive

[Bug 1830081] Re: Nova unplug interface race condition when deleting an instance

 

Reviewed:  https://review.opendev.org/660761
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d4ed0d8b7adc350e8962df033c2da892c95561fe
Submitter: Zuul
Branch:    master

commit d4ed0d8b7adc350e8962df033c2da892c95561fe
Author: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
Date:   Wed May 22 17:34:20 2019 +0200

    Refresh instance network info on deletion
    
    When deleting an instance, if the network info is empty, we should
    refresh the info because we can't be sure the copy of the cache we
    have when we fetched the instance to delete is up-to-date, i.e. if
    we're racing to delete the server while it's building and the
    network info cache was updated in the database after we started the
    delete operation and got the instance from the DB, then we could
    fail to unplug VIFs.
    
    Closes-Bug: #1830081
    
    Change-Id: I99601773406c61f93002e2f7cbb248cf73cef0ab
    Signed-off-by: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
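
The idea described in this commit message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the actual change: `network_info_for_delete` and `fetch_from_network_api` are hypothetical names, and VIFs are represented as plain strings.

```python
def network_info_for_delete(cached_network_info, fetch_from_network_api):
    """Return the VIFs to unplug, refreshing when the cached copy is empty.

    Both parameters are hypothetical: `cached_network_info` is the copy that
    came with the instance fetched at the start of the delete, and
    `fetch_from_network_api` is any callable returning the current VIF list.
    """
    if cached_network_info:
        # A non-empty cache is trusted as-is.
        return list(cached_network_info)
    # An empty cache may be stale (the build may have filled it since),
    # so ask for fresh information before unplugging.
    return list(fetch_from_network_api())
```

With this shape, the extra lookup only happens in the suspicious empty case, for example `network_info_for_delete([], lambda: ["vif-1"])` yields the refreshed list instead of an empty one.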


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1830081

Title:
  Nova unplug interface race condition when deleting an instance

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  Description
  ===========
  When nova starts an instance, it asks neutron to create a port and then updates the instance info cache with the information returned by neutron.
  If the instance is deleted in the middle of spawning, the terminate_instance function is called with an instance object that DOES NOT contain any network info.
  As a result, nova deletes the instance but never unplugs the interface.

  Steps to reproduce
  ==================
  I boot an instance and immediately delete it with a command like:
  $ openstack server create --key-name fake --image ubuntu1810 --flavor c2-7 --net Ext-Net arnaudubuntu1810-3 ; nova delete arnaudubuntu1810-3

  
  - [1] build_and_run_instance is executed under a semaphore, which locks the instance. While booting, nova fills the network_info cache by calling [2] update_instance_cache_with_nw_info.
  - [3] terminate_instance is executed a few seconds later, but waits for the semaphore to be released. At this point, the instance network_info cache may not be filled yet, depending on whether [2] update_instance_cache_with_nw_info has already run.
  - Following the code, we end up in _shutdown_instance [4], which calls [5] get_network_info, which returns a NetworkInfo object that contains no interface.
  - Finally, nova calls _unplug_vifs [6], which does nothing (no VIF).
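
  The sequence above can be replayed with a small, self-contained Python simulation. It is a sketch under stated assumptions, not nova code: the dictionary standing in for the database, the `threading.Lock` standing in for the semaphore, and the two events that force the unlucky ordering are all hypothetical. Only the function names are borrowed from the steps above.

```python
import threading


def simulate_race():
    """Replay the create/delete race with two threads and a shared lock."""
    db = {"network_info": []}          # hypothetical DB row for the info cache
    instance_lock = threading.Lock()   # stands in for the per-instance semaphore
    build_started = threading.Event()
    snapshot_taken = threading.Event()
    unplugged = []

    def build_and_run_instance():
        with instance_lock:
            build_started.set()
            # Force the unlucky ordering: the delete request reads the
            # instance before the cache is filled.
            snapshot_taken.wait()
            db["network_info"] = ["vif-1"]

    def terminate_instance():
        build_started.wait()
        stale_network_info = list(db["network_info"])  # still empty here
        snapshot_taken.set()
        with instance_lock:  # blocks until the build releases the lock
            for vif in stale_network_info:
                unplugged.append(vif)  # never reached: the copy is empty

    threads = [
        threading.Thread(target=build_and_run_instance),
        threading.Thread(target=terminate_instance),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db["network_info"], unplugged
```

  Calling `simulate_race()` returns a cache that contains the VIF together with an empty list of unplugged VIFs: the delete path worked from its stale copy and left the interface plugged.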


  Note that I am running the OpenStack Newton release, but the code
  involved seems identical on master.


  [1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1837
  [2] https://github.com/openstack/nova/blob/master/nova/network/base_api.py#L34
  [3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2765
  [4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2559
  [5] https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1252
  [6] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L919

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1830081/+subscriptions

