yahoo-eng-team mailing list archive: Message #77241
[Bug 1818015] [NEW] VLAN manager removed external port mapping when it was still in use
Public bug reported:
A production Queens DVR deployment (12.0.3-0ubuntu1~cloud0) erroneously
cleaned up the VLAN/binding for an external network (used by multiple
ports, generally for routers) that was still in use. This occurred on
all hypervisors at around the same time.
2019-02-07 03:56:58.273 14197 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-71ccf801-d722-4196-a1d7-4924953939d8 - - - - -] Reclaiming vlan = 10 from net-id = fa2c3b23-5f25-4ab1-b06b-6edc405ec323
This broke traffic flow for the remaining router using this port. After
restarting neutron-openvswitch-agent, the agent reported the port as
updated, re-added the mapping, and traffic flowed again.
Unfortunately I don't have good details on what caused this situation,
and I do not have a reproduction case. My hope is to analyse,
theoretically, what may have led to it.
This is a reasonably sized cloud: 10 compute hosts, hundreds of
instances, and 56 routers.
A few details that I do have:
- It seems that multiple neutron ports were being deleted across the cloud at the time. The one event I can see in the hypervisor's auth.log is that a floating IP on that same network was removed within the minute prior; I am not sure whether that was specifically related. Unfortunately I do not have the corresponding neutron-api logs from that time period.
My hope is to analyse, theoretically, how the VLAN manager could lose
track of the multiple users of this network, and do so in a way that
happened consistently across all hypervisors.
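To make the suspected failure mode concrete, here is a minimal sketch of the
reclaim-on-last-port pattern that the "Reclaiming vlan" log message suggests.
This is not the actual neutron LocalVlanManager code; the class and method
names below (HypotheticalVlanManager, add_port, remove_port) are hypothetical,
and the sketch only shows where a lost "user" of the network would trigger
premature reclamation.

# Hypothetical sketch only, not the neutron implementation.
class LocalVlanMapping:
    def __init__(self, vlan_id):
        self.vlan_id = vlan_id
        self.ports = set()  # port IDs on this host using the network

class HypotheticalVlanManager:
    def __init__(self):
        self.mappings = {}  # net_id -> LocalVlanMapping
        self._next_vlan = 1

    def add_port(self, net_id, port_id):
        mapping = self.mappings.get(net_id)
        if mapping is None:
            mapping = LocalVlanMapping(self._next_vlan)
            self._next_vlan += 1
            self.mappings[net_id] = mapping
        mapping.ports.add(port_id)

    def remove_port(self, net_id, port_id):
        mapping = self.mappings.get(net_id)
        if mapping is None:
            return
        mapping.ports.discard(port_id)
        if not mapping.ports:
            # Fires even if another port on the network was simply
            # never registered here, or was removed twice.
            print("Reclaiming vlan = %s from net-id = %s"
                  % (mapping.vlan_id, net_id))
            del self.mappings[net_id]

if __name__ == "__main__":
    mgr = HypotheticalVlanManager()
    # A floating-IP-related port and a router port share the external
    # network, but suppose only one of them got tracked.
    mgr.add_port("ext-net-id", "fip-port")
    mgr.remove_port("ext-net-id", "fip-port")  # reclaimed while still in use

If the real logic is roughly this shape, then any code path that fails to
register (or double-removes) one of the network's users on every hypervisor
would reclaim the VLAN everywhere at about the same time, and an agent
restart that rescans ports would rebuild the mapping, which matches the
recovery seen above.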
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1818015
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1818015/+subscriptions