yahoo-eng-team team mailing list archive

[Bug 1324703] [NEW] Default NORMAL flows on OVS bridges at boot has potential to cause network storm

 

Public bug reported:

We have repeatedly restarted an environment only to find that it becomes
laggy or inaccessible within minutes of coming back up. Unfortunately,
the responsiveness of the systems through DRAC did not lend itself to
packet captures. The usual resolution is to shut the physical
switchports, reboot the system, and bring the ports back up once the
system is online. Switchport statistics, however, show a high number of
packets in and out of the interfaces.

From what I can tell, when the openvswitch service is started at boot,
every bridge is populated with a NORMAL learning flow. This includes the
provider, integration and tunnel bridges. If/when there is a delay in
the openvswitch plugin agent populating the flows, there is a potential
for traffic coming in one port to be forwarded out all other
bridges/ports. If there are multiple servers in this state, a traffic
storm is generated on the network due to a bridging loop.
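
One way to check for the state described above is to inspect each
bridge's flow table before the agent has programmed it. The sketch below
is only an illustration: it assumes the text comes from
`ovs-ofctl dump-flows <bridge>`, the two sample dumps are hand-written
approximations of that output rather than captures from an affected
system, and the function name is hypothetical.

```python
# Hand-written approximation of a freshly started bridge (assumption).
SAMPLE_FRESH = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=3.2s, table=0, n_packets=0, n_bytes=0, idle_age=3, priority=0 actions=NORMAL
"""

# Hand-written approximation of a bridge with other flows installed (assumption).
SAMPLE_PROGRAMMED = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=9.1s, table=0, n_packets=0, n_bytes=0, idle_age=9, priority=2,in_port=1 actions=drop
 cookie=0x0, duration=9.1s, table=0, n_packets=0, n_bytes=0, idle_age=9, priority=1 actions=NORMAL
"""


def only_normal_flows(dump_flows_output: str) -> bool:
    """Return True if every flow entry in the dump has actions=NORMAL.

    Lines without an "actions=" field (such as the reply header) are
    ignored. An empty dump returns False because there is no NORMAL
    flow that could forward traffic.
    """
    flows = [
        line.strip()
        for line in dump_flows_output.splitlines()
        if "actions=" in line
    ]
    return bool(flows) and all(
        line.split("actions=", 1)[1].strip().upper() == "NORMAL"
        for line in flows
    )


if __name__ == "__main__":
    print("fresh bridge forwards everything:", only_normal_flows(SAMPLE_FRESH))
    print("programmed bridge forwards everything:", only_normal_flows(SAMPLE_PROGRAMMED))
```

A bridge for which this returns True is still acting as a plain learning
switch, which is the window in which the loop can form.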

This issue could also be seen after a restart/upgrade of openvswitch
prior to Kyle's patch for the following bug:

"neutron-openvswitch-agent does not recreate flows after ovsdb-server restarts"
https://bugs.launchpad.net/tripleo/+bug/1290486

In the above case, we could see physical VLAN traffic on the provider
bridge being forwarded through the overlay networks, and vice versa.
Rather than defaulting to NORMAL flows on each bridge when openvswitch
starts, would it be possible to default to DROP flows? That way, if
there is any timing issue between openvswitch starting and the flows
actually being written, traffic will not be forwarded between bridges.
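
As a rough sketch of what a drop-by-default state looks like, the
snippet below builds the `ovs-ofctl` commands that would clear a
bridge's flow table and install a single lowest-priority drop flow. It
is a manual approximation, not the default behaviour requested here: the
bridge names are assumed conventional Neutron names, the helper name is
hypothetical, and a window still exists between openvswitch starting and
these commands running. Note that `del-flows` with no match removes
every flow on the bridge, so this is only safe before the agent has
written its own flows.

```python
import shlex

# Assumed bridge names; substitute the bridges present on the host.
BRIDGES = ["br-int", "br-tun", "br-ex"]


def drop_by_default_commands(bridges):
    """Return argv lists that replace each bridge's flows with a drop flow."""
    commands = []
    for bridge in bridges:
        # Remove all existing flows, including the default NORMAL flow.
        commands.append(["ovs-ofctl", "del-flows", bridge])
        # Install a catch-all flow at the lowest priority that drops traffic.
        commands.append(
            ["ovs-ofctl", "add-flow", bridge, "priority=0,actions=drop"]
        )
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in drop_by_default_commands(BRIDGES):
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch side-effect free; the
output can be reviewed and run by hand on a test host.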

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1324703

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1324703/+subscriptions

