yahoo-eng-team team mailing list archive
Message #14888
[Bug 1324703] [NEW] Default NORMAL flows on OVS bridges at boot has potential to cause network storm
Public bug reported:
There have been many times where we have restarted an environment only
to find that it becomes laggy or inaccessible within minutes of coming
back up. Unfortunately, the systems were too unresponsive through DRAC
to allow packet captures. The usual resolution is to shut the
physical switchports, reboot the system, and bring the ports back up when
the system is online. Switchport statistics, however, show a high number
of packets in/out of the interfaces.
From what I can tell, when the openvswitch service is started at boot,
every bridge is populated with a NORMAL learning flow. This includes the
provider, integration and tunnel bridges. If/when there is a delay in
the openvswitch plugin agent populating the flows, there is a potential
for traffic coming in one port to be forwarded out all other
bridges/ports. If there are multiple servers in this state, a traffic
storm is generated on the network due to a bridging loop.
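One way to check whether a bridge is still in this state is to look at its flow table and see if the only entry is the default NORMAL flow. The following is a minimal sketch, assuming the flow listing comes from `ovs-ofctl dump-flows <bridge>`; the sample text is illustrative (exact fields vary by Open vSwitch version), and the helper names are hypothetical.

```python
import re

# Illustrative dump-flows listing for a freshly started bridge.
# This is an assumed sample, not captured from the affected systems.
SAMPLE_DUMP = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=4.2s, table=0, n_packets=0, n_bytes=0, idle_age=4, priority=0 actions=NORMAL
"""


def flow_actions(dump_text):
    """Return the actions string of every flow line in a dump-flows listing."""
    actions = []
    for line in dump_text.splitlines():
        match = re.search(r"actions=(\S+)", line)
        if match:
            actions.append(match.group(1))
    return actions


def only_default_normal(dump_text):
    """True when the bridge holds exactly one flow and its action is NORMAL."""
    return [a.upper() for a in flow_actions(dump_text)] == ["NORMAL"]


if __name__ == "__main__":
    print(only_default_normal(SAMPLE_DUMP))
```

A bridge for which this returns true has not yet been programmed by the agent and will behave as a plain learning switch.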
This issue could also be seen after a restart/upgrade of openvswitch
prior to Kyle's patch for the following bug:
"neutron-openvswitch-agent does not recreate flows after ovsdb-server restarts"
https://bugs.launchpad.net/tripleo/+bug/1290486
In the above case, we could see physical vlan traffic on the provider
bridge being forwarded through the overlay networks, and vice-versa.
Rather than default to NORMAL flows for each bridge when openvswitch
starts, would it be possible to default to DROP flows? That way, if
there is any sort of timing issue between openvswitch starting and the
flows actually getting written, traffic will not be allowed between
bridges.
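As a rough illustration of the proposal, the sketch below builds the `ovs-ofctl` commands that would replace each bridge's flows with a single lowest-priority drop flow. It only constructs the command lines and does not run them. The bridge names (`br-int`, `br-tun`, `br-ex`) and the function name are assumptions for the example, and this is not how the agent or the openvswitch service currently behaves.

```python
def drop_by_default_commands(bridges):
    """Build ovs-ofctl commands that leave each bridge with one drop flow."""
    commands = []
    for bridge in bridges:
        # Remove whatever flows are present, including the default NORMAL flow.
        commands.append(["ovs-ofctl", "del-flows", bridge])
        # Install a lowest-priority flow that drops all traffic.
        commands.append(
            ["ovs-ofctl", "add-flow", bridge, "priority=0,actions=drop"]
        )
    return commands


if __name__ == "__main__":
    # Assumed bridge names; substitute the ones used in the deployment.
    for command in drop_by_default_commands(["br-int", "br-tun", "br-ex"]):
        print(" ".join(command))
```

Something along these lines would have to run before the physical ports start passing traffic, with the agent later installing its own flows on top.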
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1324703
Status in OpenStack Neutron (virtual network service):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1324703/+subscriptions