
yahoo-eng-team team mailing list archive

[Bug 1597596] Re: network not always cleaned up when spawning VMs

 

Reviewed:  https://review.openstack.org/520248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Submitter: Zuul
Branch:    master

commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Nov 15 19:15:44 2017 -0500

    Always deallocate networking before reschedule if using Neutron
    
    When a server build fails on a selected compute host, the compute
    service will cast to conductor which calls the scheduler to select
    another host to attempt the build if retries are not exhausted.
    
    With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
    are exhausted or the scheduler raises NoValidHost, conductor will
    deallocate networking for the instance. In the case of neutron, this
    means unbinding any ports that the user provided with the server
    create request and deleting any ports that nova-compute created during
    the allocate_for_instance() operation during server build.
    
    When an instance is deleted, its networking is deallocated in the same
    way - unbind pre-existing ports, delete ports that nova created.
    
    The problem is when rescheduling from a failed host, if we successfully
    reschedule and build on a secondary host, any ports created from the
    original host are not cleaned up until the instance is deleted. For
    Ironic or SR-IOV ports, those are always deallocated.
    
    The ComputeDriver.deallocate_networks_on_reschedule() method defaults
    to False just so that the Ironic driver could override it, but really
    we should always clean up neutron ports before rescheduling.
    
    Looking over bug report history, there are some mentions of different
    networking backends handling reschedules with multiple ports differently,
    in that sometimes it works and sometimes it fails. Regardless of the
    networking backend, however, we are at worst taking up port quota for
    the tenant for ports that will not be bound to whatever host the instance
    ends up on.
    
    There could also be legacy reasons for this behavior with nova-network,
    so that is side-stepped here by just restricting this check to whether
    or not neutron is being used. When we eventually remove nova-network we
    can then also remove the deallocate_networks_on_reschedule() method and
    SR-IOV check.
    
    Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
    Closes-Bug: #1597596
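
The decision described in the commit message can be sketched as a single predicate. This is a minimal illustration, not the nova code itself: the function name and its three boolean parameters are hypothetical stand-ins for "is neutron in use", the result of ComputeDriver.deallocate_networks_on_reschedule(), and the SR-IOV check mentioned above.

```python
def should_deallocate_before_reschedule(using_neutron,
                                        driver_requests_cleanup,
                                        has_sriov_ports):
    """Decide whether to tear down networking before a reschedule.

    Hypothetical helper for illustration only. Per the commit message,
    neutron ports are always cleaned up; the driver override and the
    SR-IOV check remain only for the nova-network case.
    """
    return using_neutron or driver_requests_cleanup or has_sriov_ports


# With neutron, cleanup happens even if the driver keeps the False default.
print(should_deallocate_before_reschedule(True, False, False))
# Without neutron, the old driver/SR-IOV conditions still decide.
print(should_deallocate_before_reschedule(False, False, False))
```

Under this sketch the driver override and SR-IOV inputs only matter when neutron is not in use, which is why the commit message expects them to become removable along with nova-network.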


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1597596

Title:
  network not always cleaned up when spawning VMs

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Here is the scenario:
  1). Nova scheduler/conductor selects nova-compute A to spin up a VM.
  2). Nova-compute A tries to spin up the VM, but the process fails and raises a reschedule exception.
  3). In the reschedule exception handler, network resources are cleaned up only when retry is None. When retry is not None, the network is not cleaned up and the port information stays with the VM.
  4). Nova conductor is notified about the failure and selects nova-compute B to spin up the VM.
  5). Nova-compute B spins up the VM successfully. However, the network_info in instance_info_cache shows two ports allocated for the VM: one from the original network A associated with nova-compute A and one from network B associated with nova-compute B.
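
The scenario above can be modeled with a small toy simulation. This is only a sketch under stated assumptions: FakeNetworkAPI and build_with_reschedule are hypothetical names, the port bookkeeping is a plain dict, and nothing here calls nova or neutron.

```python
import itertools


class FakeNetworkAPI:
    """Toy stand-in for a network API; tracks which host each port was made for."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.ports = {}  # port id -> host the port was allocated for

    def allocate_for_instance(self, host):
        port_id = "port-%d" % next(self._ids)
        self.ports[port_id] = host
        return port_id

    def deallocate_for_instance(self):
        self.ports.clear()


def build_with_reschedule(hosts, failing_hosts, cleanup_before_reschedule):
    """Try each host in order and return the ports left on the instance."""
    api = FakeNetworkAPI()
    for host in hosts:
        api.allocate_for_instance(host)
        if host not in failing_hosts:
            return dict(api.ports)  # build succeeded on this host
        if cleanup_before_reschedule:
            api.deallocate_for_instance()
    # Retries exhausted: networking is cleaned up in either mode.
    api.deallocate_for_instance()
    return dict(api.ports)


hosts = ["compute-A", "compute-B"]
print(build_with_reschedule(hosts, {"compute-A"}, cleanup_before_reschedule=False))
print(build_with_reschedule(hosts, {"compute-A"}, cleanup_before_reschedule=True))
```

Without cleanup the instance ends up with two ports, one of them still tied to the failed host; with cleanup before the reschedule only the port for the final host remains.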

  To simulate the case, raise a fake exception in
  _do_build_and_run_instance in nova-compute A:

  diff --git a/nova/compute/manager.py b/nova/compute/manager.py
  index ac6d92c..8ce8409 100644
  --- a/nova/compute/manager.py
  +++ b/nova/compute/manager.py
  @@ -1746,6 +1746,7 @@ class ComputeManager(manager.Manager):
                           filter_properties)
               LOG.info(_LI('Took %0.2f seconds to build instance.'),
                        timer.elapsed(), instance=instance)
  +            raise exception.RescheduledException( instance_uuid=instance.uuid, reason="simulated-fault")
               return build_results.ACTIVE
           except exception.RescheduledException as e:
               retry = filter_properties.get('retry')

  environments: 
  *) nova master branch
  *) ubuntu 12.04
  *) kvm
  *) bridged network.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1597596/+subscriptions

