
yahoo-eng-team team mailing list archive

[Bug 1607787] Re: Missing secure fail mode on physical bridges

 

Reviewed:  https://review.openstack.org/348889
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9429c2da01fa29cedcb2a65a26c1c29d0a713670
Submitter: Jenkins
Branch:    master

commit 9429c2da01fa29cedcb2a65a26c1c29d0a713670
Author: Hynek Mlnarik <hmlnarik@xxxxxxxxxx>
Date:   Wed Aug 10 10:05:57 2016 +0200

    Set secure fail mode for physical bridges
    
    Physical bridges can cause network disruption when ofctl controller becomes
    inaccessible due to heavy load or when the traffic to controller is blocked.
    By setting secure fail mode, the openflow rules remain untouched on such
    an event, while with the default setting, the flows are cleared.
    
    Co-Authored-By: Jakub Libosvar <libosvar@xxxxxxxxxx>
    Closes-Bug: 1607787
    Change-Id: I1dffe0a248664d2a675fd1ca58530c233e335d2d
    UpgradeImpact
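
For illustration only, the effect described in the commit message can be reproduced by hand with the ovs-vsctl CLI. The sketch below is not the code from the commit; it only builds the `ovs-vsctl set-fail-mode` and `ovs-vsctl get-fail-mode` command lines for a bridge. The helper names and the bridge name in the test are hypothetical.

```python
import subprocess

# Fail modes accepted by `ovs-vsctl set-fail-mode`.
VALID_FAIL_MODES = ("standalone", "secure")


def set_fail_mode_cmd(bridge, mode="secure"):
    """Build the ovs-vsctl command that sets the fail mode of a bridge."""
    if mode not in VALID_FAIL_MODES:
        raise ValueError("unknown fail mode: %r" % (mode,))
    return ["ovs-vsctl", "set-fail-mode", bridge, mode]


def get_fail_mode_cmd(bridge):
    """Build the ovs-vsctl command that prints the fail mode of a bridge."""
    return ["ovs-vsctl", "get-fail-mode", bridge]


def run(cmd):
    """Run a command on a host with Open vSwitch installed (needs privileges)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

On a host running Open vSwitch, `run(set_fail_mode_cmd(bridge))` followed by `run(get_fail_mode_cmd(bridge))` should report `secure` for that bridge.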


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1607787

Title:
  Missing secure fail mode on physical bridges

Status in neutron:
  Fix Released

Bug description:
  Restarting the current OVS Neutron agent under heavy load with Ryu
  (ofctl=native) intermittently disrupts traffic, as manifested by
  occasional failures in a not-yet-committed fullstack test [1]. More
  specifically, the cause seems to be a too-slow restart of the Ryu
  controller in combination with OVS vswitchd timeouts. The traffic
  disruption always occurs after the following log entry is recorded
  in ovs-vswitchd.log:

    fail_open|WARN|Could not connect to controller (or switch failed controller's post-connection admission control policy) for 15 seconds, failing open

  The issue manifests regardless of network type (VLAN, flat) and
  vsctl interface (cli, native). It has not occurred with ofctl=cli,
  though.

  The issue occurs for physical bridges because they use the default
  fail mode (standalone), meaning that once the controller connection
  is lost, OVS takes over the management of the flows and clears them.
  This conclusion is based on flows dumped just before the traffic was
  blocked and again after that event, when the flows had been cleared.
  Before the event:

    ====== br-eth23164fdb4 =======
    Fri Jul 29 10:44:04 UTC 2016
    OFPST_FLOW reply (OF1.3) (xid=0x2):
     cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
     cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
     cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL

  After the disruption:

    ====== br-eth23164fdb4 =======
    Fri Jul 29 10:44:05 UTC 2016
    OFPST_FLOW reply (OF1.3) (xid=0x2):
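
  The comparison of the two dumps above can be automated. The following is a minimal sketch that counts flow entries in `ovs-ofctl dump-flows` output, assuming every flow line contains an `actions=` field as in the dumps above. The function names are hypothetical; the sample data is copied from this report.

```python
def count_flows(dump_output):
    """Count flow entries in the text printed by `ovs-ofctl dump-flows`."""
    # Each flow entry carries an "actions=" field; the reply header does not.
    return sum(1 for line in dump_output.splitlines() if "actions=" in line)


def flows_were_cleared(before, after):
    """Return True if the bridge had flows before and has none afterwards."""
    return count_flows(before) > 0 and count_flows(after) == 0


BEFORE_DUMP = """\
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
 cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
 cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL
"""

AFTER_DUMP = """\
OFPST_FLOW reply (OF1.3) (xid=0x2):
"""

if __name__ == "__main__":
    print(count_flows(BEFORE_DUMP), count_flows(AFTER_DUMP))
    print(flows_were_cleared(BEFORE_DUMP, AFTER_DUMP))
```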

  
  The same bug appears under the following condition (courtesy of Jakub Libosvar): set up a physical bridge, block the traffic to the respective bridge controller via iptables until the OVS timeout occurs, and then check the OpenFlow rules of the affected bridge.
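
  The manual reproduction described above might be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure actually used: the exact iptables rule is not given in this report, so dropping outgoing TCP traffic to the controller port is an assumption, as are the function name and the bridge name and port number used in the test. The function only builds the command lists; it does not run them.

```python
def reproduction_plan(bridge, controller_port):
    """Return the commands for the manual reproduction, in order.

    ASSUMPTION: the controller is reached over TCP on `controller_port`
    and an OUTPUT DROP rule is enough to cut the connection. The report
    does not state which iptables rule was used.
    """
    rule = ["OUTPUT", "-p", "tcp", "--dport", str(controller_port), "-j", "DROP"]
    return [
        # Show which controller the bridge is configured to use.
        ["ovs-vsctl", "get-controller", bridge],
        # Block traffic to the controller (assumed rule, see docstring).
        ["iptables", "-I"] + rule,
        # After the fail_open warning shows up in ovs-vswitchd.log,
        # dump the flows; in the default fail mode they are gone.
        ["ovs-ofctl", "-O", "OpenFlow13", "dump-flows", bridge],
        # Remove the blocking rule again.
        ["iptables", "-D"] + rule,
    ]
```

  Between the second and the third command, wait until the fail_open warning quoted earlier is logged before dumping the flows.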

  
  [1] https://review.openstack.org/#/c/334926/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1607787/+subscriptions

