yahoo-eng-team team mailing list archive
Message #55017
[Bug 1607787] Re: Missing secure fail mode on physical bridges
Reviewed: https://review.openstack.org/348889
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9429c2da01fa29cedcb2a65a26c1c29d0a713670
Submitter: Jenkins
Branch: master
commit 9429c2da01fa29cedcb2a65a26c1c29d0a713670
Author: Hynek Mlnarik <hmlnarik@xxxxxxxxxx>
Date: Wed Aug 10 10:05:57 2016 +0200
Set secure fail mode for physical bridges
Physical bridges can cause network disruption when the ofctl controller becomes
inaccessible due to heavy load or when the traffic to the controller is blocked.
With secure fail mode set, the OpenFlow rules remain untouched in such an
event, whereas with the default setting the flows are cleared.
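The commit message does not show the code of the change. As a hedged illustration only, the sketch below puts a list of physical bridges into secure fail mode by calling the real `ovs-vsctl set-fail-mode BRIDGE secure` command. The helper names `build_set_fail_mode_cmd` and `set_secure_fail_mode` are hypothetical and are not neutron's API. The command runner is injectable so the logic can be exercised without a live Open vSwitch.

```python
import subprocess
from typing import Callable, List

# Fail modes accepted by `ovs-vsctl set-fail-mode`.
VALID_FAIL_MODES = ("standalone", "secure")


def build_set_fail_mode_cmd(bridge: str, mode: str = "secure") -> List[str]:
    """Return the ovs-vsctl argv that sets the fail mode of `bridge`."""
    if mode not in VALID_FAIL_MODES:
        raise ValueError(f"unsupported fail mode: {mode!r}")
    return ["ovs-vsctl", "set-fail-mode", bridge, mode]


def set_secure_fail_mode(bridges: List[str],
                         run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Put every given (hypothetical) physical bridge into secure fail mode.

    `run` defaults to subprocess.run; a fake can be passed instead to
    avoid touching a real Open vSwitch instance.
    """
    issued = []
    for bridge in bridges:
        cmd = build_set_fail_mode_cmd(bridge, "secure")
        run(cmd, check=True)  # raises CalledProcessError if ovs-vsctl fails
        issued.append(cmd)
    return issued
```

On a real host, `set_secure_fail_mode(["br-eth1"])` (bridge name assumed) can be checked afterwards with `ovs-vsctl get-fail-mode br-eth1`, which should print `secure`.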
Co-Authored-By: Jakub Libosvar <libosvar@xxxxxxxxxx>
Closes-Bug: 1607787
Change-Id: I1dffe0a248664d2a675fd1ca58530c233e335d2d
UpgradeImpact
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1607787
Title:
Missing secure fail mode on physical bridges
Status in neutron:
Fix Released
Bug description:
Restarting the current OVS neutron agent under heavy load with Ryu
(ofctl=native) intermittently disrupts traffic, as manifested by
occasional failures in the not-yet-committed fullstack test [1]. More
specifically, the cause seems to be a too-slow restart of the Ryu
controller combined with OVS vswitchd timeouts. The traffic disruption
always occurs after the following entry is recorded in
ovs-vswitchd.log:
fail_open|WARN|Could not connect to controller (or switch failed controller's post-connection admission control policy) for 15 seconds, failing open
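The "15 seconds" in this log entry is consistent with Open vSwitch giving up on a controller after three inactivity-probe intervals at the default probe of 5 s; that timing is an assumption stated here, not something the report measures. The toy model below (the class `BridgeModel` is hypothetical, not Open vSwitch code) encodes the behaviour described in the commit message: in the default fail mode the flows are cleared once the bridge fails open, while in secure mode they stay.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: OVS's default controller inactivity probe is 5 s and a bridge
# gives up on its controller after 3 silent probe intervals (15 s in total).
DEFAULT_INACTIVITY_PROBE_S = 5
PROBES_BEFORE_FAIL = 3


@dataclass
class BridgeModel:
    """Toy model of one bridge's flow table (not Open vSwitch code)."""
    name: str
    fail_mode: str = "standalone"  # OVS default when no fail mode is set
    flows: List[str] = field(default_factory=list)

    def controller_silent_for(self, seconds: float,
                              probe_s: float = DEFAULT_INACTIVITY_PROBE_S) -> bool:
        """Simulate `seconds` without controller contact.

        Returns True if the bridge fails open, mirroring the
        "fail_open|WARN|... failing open" log entry.
        """
        if seconds < PROBES_BEFORE_FAIL * probe_s:
            return False  # still inside the grace period
        if self.fail_mode == "standalone":
            # Behaviour observed in this bug: OVS takes over and the
            # controller-installed flows are cleared.
            self.flows.clear()
            return True
        # "secure": OVS does not take over, so existing flows stay put.
        return False
```

With the default mode, `controller_silent_for(15)` returns `True` and empties `flows`; with `fail_mode="secure"` it returns `False` and the flow list is left intact, which is the property the fix relies on.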
This issue manifests regardless of network type (VLAN, flat) and
vsctl interface (cli, native). It has not occurred with ofctl=cli,
though.
The issue occurs on physical bridges because they are in the default
fail mode: once the controller connection is lost, OVS takes over the
management of the flows and clears them. This conclusion is based on
flows dumped just before the traffic was blocked and just after that
event, when the flows had been cleared. Before the event:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:04 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL
After the disruption:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:05 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
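The two dumps above can be compared mechanically. The minimal sketch below assumes the plain-text `ovs-ofctl dump-flows` format shown here; the helper names `parse_flows`, `flow_key` and `lost_flows` are hypothetical. Flows are identified by the text from `priority=` onward, because `duration`, `n_packets` and `n_bytes` change between dumps (a simplification that ignores the cookie and table).

```python
from typing import List

BEFORE = """\
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:04 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL
"""

AFTER = """\
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:05 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
"""


def parse_flows(dump: str) -> List[str]:
    """Return the flow lines of an `ovs-ofctl dump-flows` style dump."""
    # Banner, date and "OFPST_FLOW reply" lines are skipped; each flow
    # entry starts with "cookie=" (possibly after leading whitespace).
    return [ln.strip() for ln in dump.splitlines()
            if ln.strip().startswith("cookie=")]


def flow_key(flow: str) -> str:
    """Identify a flow by its match and actions, ignoring counters."""
    idx = flow.find("priority=")
    return flow[idx:] if idx >= 0 else flow


def lost_flows(before: str, after: str) -> List[str]:
    """Flows present in `before` that no longer exist in `after`."""
    remaining = {flow_key(f) for f in parse_flows(after)}
    return [flow_key(f) for f in parse_flows(before)
            if flow_key(f) not in remaining]
```

Applied to the dumps in this report, `lost_flows(BEFORE, AFTER)` lists all three flows, including the `priority=0 actions=NORMAL` fallback, which is why all traffic on the bridge stopped.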
The same bug appears under the following condition (courtesy of Jakub Libosvar): set up a physical bridge, block the traffic to that bridge's controller via iptables until the OVS timeout occurs, and then check the OpenFlow rules of the affected bridge.
[1] https://review.openstack.org/#/c/334926/