[Bug 1597596] Re: network not always cleaned up when spawning VMs
Reviewed: https://review.openstack.org/520248
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Submitter: Zuul
Branch: master
commit 3a503a8f2b934f19049531c5c92130ca7cdd6a7f
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Nov 15 19:15:44 2017 -0500
Always deallocate networking before reschedule if using Neutron
When a server build fails on a selected compute host, the compute
service will cast to conductor which calls the scheduler to select
another host to attempt the build if retries are not exhausted.
With commit 08d24b733ee9f4da44bfbb8d6d3914924a41ccdc, if retries
are exhausted or the scheduler raises NoValidHost, conductor will
deallocate networking for the instance. In the case of neutron, this
means unbinding any ports that the user provided with the server
create request and deleting any ports that nova-compute created during
the allocate_for_instance() operation during server build.
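As a rough illustration of that unbind-vs-delete distinction, here is a minimal sketch using python-neutronclient; the helper name and the exact port attributes cleared are assumptions for illustration, not the actual nova.network code.

    # Conceptual sketch only: unbind user-provided ports, delete nova-created ones.
    def cleanup_instance_ports(neutron, instance_uuid, preexisting_port_ids):
        ports = neutron.list_ports(device_id=instance_uuid)['ports']
        for port in ports:
            if port['id'] in preexisting_port_ids:
                # Port was passed in with the server create request:
                # just unbind it so the user can reuse it.
                neutron.update_port(port['id'],
                                    {'port': {'device_id': '',
                                              'binding:host_id': None}})
            else:
                # Port was created by allocate_for_instance(): delete it.
                neutron.delete_port(port['id'])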
When an instance is deleted, its networking is deallocated in the same
way: pre-existing ports are unbound and ports that nova created are deleted.
The problem is that when rescheduling from a failed host, if we successfully
reschedule and build on a secondary host, any ports created on the
original host are not cleaned up until the instance is deleted. Ironic
and SR-IOV ports are the exception: those are always deallocated.
The ComputeDriver.deallocate_networks_on_reschedule() method defaults
to False just so that the Ironic driver could override it, but really
we should always clean up neutron ports before rescheduling.
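For clarity, here is a minimal sketch of the check this change effectively introduces in the reschedule path; utils.is_neutron() and the method placement are assumptions based on the description above, not a quote of the merged patch.

    # Sketch: decide whether to deallocate networking before rescheduling.
    from nova import utils

    def deallocate_networks_on_reschedule(self, instance):
        if utils.is_neutron():
            # Neutron: always clean up, otherwise ports created on the
            # failed host linger and consume the tenant's port quota.
            return True
        # nova-network: keep the old behaviour where only the Ironic
        # driver (and SR-IOV handling) opts in via the virt driver hook.
        return self.driver.deallocate_networks_on_reschedule(instance)

When this returns True, the compute manager runs its existing _cleanup_allocated_networks() helper before casting back to conductor for the reschedule.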
Looking over the bug report history, there are mentions of different
networking backends handling reschedules with multiple ports differently:
sometimes it works and sometimes it fails. Regardless of the networking
backend, however, we are at worst consuming the tenant's port quota with
ports that will not be bound to whatever host the instance ends up on.
There could also be legacy reasons for this behavior with nova-network,
so that is side-stepped here by just restricting this check to whether
or not neutron is being used. When we eventually remove nova-network we
can then also remove the deallocate_networks_on_reschedule() method and
SR-IOV check.
Change-Id: Ib2abf73166598ff14fce4e935efe15eeea0d4f7d
Closes-Bug: #1597596
** Changed in: nova
Status: In Progress => Fix Released
https://bugs.launchpad.net/bugs/1597596
Title:
network not always cleaned up when spawning VMs
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
Confirmed
Status in OpenStack Compute (nova) pike series:
Confirmed
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
Here is the scenario:
1). The nova scheduler/conductor selects nova-compute A to spin up a VM.
2). Nova-compute A tries to spin up the VM, but the process fails and raises a reschedule exception (RescheduledException).
3). In the reschedule exception handler, the network resources are only cleaned up properly when retry is None; when retry is not None, the network is not cleaned up and the port information stays attached to the VM.
4). The nova conductor is notified about the failure and selects nova-compute B to spin up the VM.
5). Nova-compute B spins up the VM successfully. However, the network_info in the instance_info_cache shows two ports allocated for the VM: one from network A, associated with nova-compute A, and one from network B, associated with nova-compute B.
To simulate this case, raise a fake exception in
_do_build_and_run_instance on nova-compute A:
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index ac6d92c..8ce8409 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -1746,6 +1746,7 @@ class ComputeManager(manager.Manager):
                         filter_properties)
             LOG.info(_LI('Took %0.2f seconds to build instance.'),
                      timer.elapsed(), instance=instance)
+            raise exception.RescheduledException(instance_uuid=instance.uuid, reason="simulated-fault")
             return build_results.ACTIVE
         except exception.RescheduledException as e:
             retry = filter_properties.get('retry')
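To confirm the leak after the reschedule, one option is to list the neutron ports whose device_id is the instance; the snippet below is a hedged example using python-neutronclient and keystoneauth1, with made-up endpoint and credential values.

    # List every neutron port bound to the rescheduled instance; a leaked
    # port from the failed host shows up as an extra entry here.
    from keystoneauth1 import loading, session
    from neutronclient.v2_0 import client as neutron_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',   # assumed endpoint
        username='admin', password='secret',    # assumed credentials
        project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    neutron = neutron_client.Client(session=session.Session(auth=auth))

    instance_uuid = '<uuid of the rescheduled server>'
    for port in neutron.list_ports(device_id=instance_uuid)['ports']:
        print(port['id'], port.get('binding:host_id'), port['status'])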
Environment:
*) nova master branch
*) Ubuntu 12.04
*) KVM
*) bridged network