yahoo-eng-team team mailing list archive

[Bug 1423845] [NEW] In certain cases compute does not clean up neutron ports after unsuccessful vm spawn

 

Public bug reported:

When allocating networks for an instance, compute first creates ports and then fetches them from neutron to build the network info.
Under high load, neutron/keystone may time out on the request to fetch the ports for the instance (traceback attached).
In this case the exception is caught and _shutdown_instance() is called with try_deallocate_networks=False, on the assumption that "Network deallocation is already handled in this code path so it should not happen in _shutdown_instance." [1]
The exception is then re-raised, caught in _build_and_run_instance(), and re-raised as RescheduledException [2].
RescheduledException is caught in _do_build_and_run_instance() [3].
Eventually only self.network_api.cleanup_instance_network_on_host() is called and instance rescheduling is initiated.
self.network_api.cleanup_instance_network_on_host() does nothing in the neutron case, so the ports are left orphaned.
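
The sequence above can be sketched as a small standalone simulation. This is not nova code: FakeNeutronAPI, its methods other than cleanup_instance_network_on_host(), and the collapsed two-level call chain are hypothetical stand-ins chosen only to show how the port survives the failed build.

```python
class RescheduledException(Exception):
    """Stand-in for nova's RescheduledException."""


class FakeNeutronAPI:
    """Hypothetical in-memory stand-in for the neutron network API."""

    def __init__(self):
        self.ports = {}  # port_id -> instance_id

    def create_port(self, instance_id):
        port_id = "port-%d" % (len(self.ports) + 1)
        self.ports[port_id] = instance_id
        return port_id

    def fetch_ports(self, instance_id):
        # Simulate neutron/keystone timing out under high load.
        raise TimeoutError("timed out while listing ports")

    def deallocate_for_instance(self, instance_id):
        self.ports = {p: i for p, i in self.ports.items() if i != instance_id}

    def cleanup_instance_network_on_host(self, instance_id):
        # Does nothing in the neutron case, as described in the report.
        pass


def shutdown_instance(api, instance_id, try_deallocate_networks=True):
    if try_deallocate_networks:
        api.deallocate_for_instance(instance_id)


def build_and_run_instance(api, instance_id):
    try:
        api.create_port(instance_id)   # ports are created first
        api.fetch_ports(instance_id)   # then fetched; this call times out
    except Exception as exc:
        # Deallocation is skipped on the assumption it happens elsewhere.
        shutdown_instance(api, instance_id, try_deallocate_networks=False)
        raise RescheduledException(str(exc))


def do_build_and_run_instance(api, instance_id):
    try:
        build_and_run_instance(api, instance_id)
    except RescheduledException:
        # The only network cleanup on this path, and it is a no-op.
        api.cleanup_instance_network_on_host(instance_id)
        return "rescheduled"
    return "active"


api = FakeNeutronAPI()
result = do_build_and_run_instance(api, "vm-1")
orphaned = sorted(api.ports)
print(result, orphaned)
```

After the run the instance has been handed back for rescheduling, yet its port is still recorded in the fake API, which is the orphaned-port condition reported here.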

I see two possible fixes: either do network deallocation in
_shutdown_instance() or implement cleanup_instance_network_on_host() to
clean up the ports.
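
A minimal sketch of the second option, assuming a hypothetical in-memory FakeNeutronAPI rather than the real neutron client. It only illustrates the intended effect, namely that the cleanup hook removes ports still bound to the failed instance. The return value is an illustrative addition, not something the report specifies.

```python
class FakeNeutronAPI:
    """Hypothetical in-memory stand-in for the neutron network API."""

    def __init__(self):
        # port_id -> instance_id; "port-1" was left behind by a failed spawn.
        self.ports = {"port-1": "vm-1", "port-2": "vm-2"}

    def cleanup_instance_network_on_host(self, instance_id):
        # Proposed behaviour (assumption): delete every port still bound
        # to the instance instead of doing nothing.
        leaked = [p for p, owner in self.ports.items() if owner == instance_id]
        for port_id in leaked:
            del self.ports[port_id]
        return leaked


api = FakeNeutronAPI()
removed = api.cleanup_instance_network_on_host("vm-1")
remaining = dict(api.ports)
print(removed, remaining)
```

A real implementation would presumably also need to tell ports created by compute apart from ports supplied by the user, which this sketch does not attempt.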

[1] bug 1332198 commit 5120c4f7c2670eaa71898fe6941029bbb0081949
[2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2233
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2089
[4] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2113

** Affects: nova
     Importance: Undecided
     Assignee: Oleg Bondarev (obondarev)
         Status: New


** Tags: network

** Attachment added: "traceback.txt"
   https://bugs.launchpad.net/bugs/1423845/+attachment/4323154/+files/traceback.txt

** Tags added: network

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1423845

Title:
  In certain cases compute does not clean up neutron ports after
  unsuccessful vm spawn

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1423845/+subscriptions