
yahoo-eng-team team mailing list archive

[Bug 1607787] [NEW] Missing secure fail mode on physical bridges

 

Public bug reported:

Restarting the current OVS neutron agent under heavy load with Ryu
(ofctl=native) intermittently disrupts traffic, as shown by occasional
failures in a not-yet-committed fullstack test [1]. More specifically,
the cause seems to be a too-slow restart of the Ryu controller combined
with OVS vswitchd timeouts. The disruption always occurs after the
following log entry is recorded in ovs-vswitchd.log:

  fail_open|WARN|Could not connect to controller (or switch failed
  controller's post-connection admission control policy) for 15 seconds,
  failing open
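
A minimal sketch for spotting this warning in a log, assuming each entry
sits on a single line in the real ovs-vswitchd.log (the wrapping above
comes from the mail formatting). The pattern is derived only from the
entry quoted in this report, and the function name is hypothetical; the
wording may differ between OVS versions.

```python
import re

# Pattern based solely on the log entry quoted in this bug report
# (assumption: other OVS versions may word the message differently).
FAIL_OPEN_RE = re.compile(
    r"fail_open\|WARN\|Could not connect to controller.*?for (\d+) seconds"
)


def find_fail_open_events(log_text):
    """Return the timeout in seconds reported by each fail_open warning."""
    return [int(match.group(1)) for match in FAIL_OPEN_RE.finditer(log_text)]
```

Feeding the function the contents of ovs-vswitchd.log returns one
integer per warning found, so an empty list means the bridge never
failed open during the captured period.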

This issue occurs regardless of network type (VLAN, flat) and vsctl
interface (cli, native). It has not occurred with ofctl=cli, though.

The issue occurs on physical bridges because they are left in the
default fail mode (standalone in OVS): once the controller connection is
lost, OVS takes over management of the flows and clears them. This
conclusion is based on flows dumped just before the traffic was blocked
and again after the flows were cleared. Before the event:

  ====== br-eth23164fdb4 =======
  Fri Jul 29 10:44:04 UTC 2016
  OFPST_FLOW reply (OF1.3) (xid=0x2):
   cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
   cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
   cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL

After the disruption:

  ====== br-eth23164fdb4 =======
  Fri Jul 29 10:44:05 UTC 2016
  OFPST_FLOW reply (OF1.3) (xid=0x2):
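
As the bug title suggests, the bridge would keep its flows if it were in
secure fail mode, where OVS does not set up flows on its own after losing
the controller. The sketch below is a manual illustration only, not the
neutron fix: it wraps the ovs-vsctl set-fail-mode and get-fail-mode
commands, and the helper names are hypothetical.

```python
import subprocess


def fail_mode_commands(bridge, mode="secure"):
    """Build ovs-vsctl argument lists to set and read back the fail mode."""
    if mode not in ("secure", "standalone"):
        raise ValueError("fail mode must be 'secure' or 'standalone'")
    return [
        ["ovs-vsctl", "set-fail-mode", bridge, mode],
        ["ovs-vsctl", "get-fail-mode", bridge],
    ]


def apply_fail_mode(bridge, mode="secure"):
    """Run the commands; needs ovs-vsctl installed and enough privileges."""
    set_cmd, get_cmd = fail_mode_commands(bridge, mode)
    subprocess.run(set_cmd, check=True)
    result = subprocess.run(get_cmd, check=True, capture_output=True, text=True)
    # The reported mode should match what was just set.
    return result.stdout.strip()
```

Calling apply_fail_mode("br-eth23164fdb4") on a host with that bridge
should report "secure" once the setting has been applied.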


The same bug appears under this condition (courtesy of Jakub Libosvar):
set up a physical bridge, block the traffic to the respective bridge
controller via iptables until the OVS timeout occurs, and then check the
OpenFlow rules of the affected bridge.
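
The final check in that reproduction can be sketched as a comparison of
two flow dumps. This assumes the dumps look like the ones quoted in this
report, with each flow entry on a line starting with "cookie="; the
function names are hypothetical.

```python
def count_flows(dump_text):
    """Count flow entries in a dump formatted like the ones in this report."""
    return sum(
        1 for line in dump_text.splitlines()
        if line.strip().startswith("cookie=")
    )


def flows_were_cleared(before_dump, after_dump):
    """True if the bridge had flows before and none afterwards."""
    return count_flows(before_dump) > 0 and count_flows(after_dump) == 0
```

With the two dumps from this report as input, the first contains three
flow entries and the second none, so flows_were_cleared returns True.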


[1] https://review.openstack.org/#/c/334926/

** Affects: neutron
     Importance: High
         Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1607787

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1607787/+subscriptions

