yahoo-eng-team team mailing list archive
Message #64077
[Bug 1691602] [NEW] live migration generates several network-changed events which lock up refreshing the nw info cache
Public bug reported:
Chris Friesen has reported that in Newton, when live migrating an
instance with ~16 ports, the "network-changed" events that neutron
generates as the vifs are unplugged from the source host can effectively
block the network info cache refresh that's called at the end of the
live migration operation. Details are in the IRC logs:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-05-17.log.html#t2017-05-17T22:50:31
But this stands out:
cfriesen mriedem: so it looks like _build_network_info_model()
costs about 200ms plus about 125ms per port since we query each port
separately from neutron. and the refresh_cache lock is held the whole
time
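To put those numbers in perspective, here is a rough back-of-the-envelope model in Python. The two constants come straight from the IRC quote above; the function names are made up for illustration, and the assumption that each queued event triggers one full, fully serialized refresh is a simplification rather than something measured.

```python
# Rough cost model based on the approximate figures quoted from IRC.
# These are reported estimates, not measurements of any given deployment.
BASE_MS = 200      # fixed cost of _build_network_info_model()
PER_PORT_MS = 125  # extra cost per port, since each port is queried separately


def refresh_cost_ms(num_ports):
    """Estimated time one full cache refresh holds the refresh_cache lock."""
    return BASE_MS + PER_PORT_MS * num_ports


def worst_case_wait_ms(num_ports, queued_events):
    """Estimated wait when `queued_events` full refreshes run ahead of ours.

    Assumption (hypothetical): every event triggers one full refresh and
    the refreshes are completely serialized by the lock.
    """
    return queued_events * refresh_cost_ms(num_ports)


print(refresh_cost_ms(16))         # 2200
print(worst_case_wait_ms(16, 16))  # 35200
```

Under these assumptions a single refresh for a 16-port instance holds the lock for roughly 2.2 seconds, and if one event per port were queued ahead of it, the final refresh could wait on the order of 35 seconds.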
In Nova the 'network-changed' event is handled generically because there
is no port id in the event, so nova refreshes the entire nw info cache
for the instance. That can be expensive and redundant: rebuilding the
cache takes many queries to Neutron for ports, fixed IPs, floating IPs,
subnets and networks, and Neutron doesn't have bulk query APIs or allow
OR filters in the API for bulk queries on things like floating IPs.
https://github.com/openstack/nova/blob/8d492c76d53f3fcfacdd945a277446bdfe6797b0/nova/compute/manager.py#L6854
Looking at the neutron code that sends the network-changed event, there
is a port in scope; it just isn't included in the event the way it is
for network-vif-deleted events.
We should be able to scope the network-changed event to a specific port
on the neutron side and check for that on the nova side, so that we
refresh just the vif that was updated rather than the entire network
info cache.
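A minimal sketch of what the nova-side check could look like, in plain Python. This is not Nova code: `fetch_vif_from_neutron`, `handle_network_changed`, the `port_id` event key and the list-of-dicts cache are all hypothetical stand-ins chosen to show the idea of falling back to a full refresh only when the event carries no usable port id.

```python
# Hypothetical sketch only; none of these names exist in Nova or Neutron.

def fetch_vif_from_neutron(port_id):
    """Stand-in for a single-port query to Neutron."""
    return {"id": port_id, "refreshed": True}


def handle_network_changed(cache, event):
    """Refresh one vif if the event names a known port, else all of them.

    `cache` is a list of vif dicts for one instance. `event` is a dict with
    an optional "port_id" key (assumed to be added on the neutron side).
    Returns the number of per-port queries that were made.
    """
    port_id = event.get("port_id")
    known_ids = [vif["id"] for vif in cache]
    if port_id in known_ids:
        to_refresh = [port_id]   # scoped event: touch only this vif
    else:
        to_refresh = known_ids   # no usable port id: refresh everything
    for index, vif in enumerate(cache):
        if vif["id"] in to_refresh:
            cache[index] = fetch_vif_from_neutron(vif["id"])
    return len(to_refresh)


cache = [{"id": "port-%d" % n, "refreshed": False} for n in range(16)]
print(handle_network_changed(cache, {"name": "network-changed",
                                     "port_id": "port-3"}))      # 1
print(handle_network_changed(cache, {"name": "network-changed"}))  # 16
```

With a scoped event the handler makes one per-port query instead of sixteen, which is the saving the proposal is after; an event without a port id keeps today's behaviour.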
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1691602