yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #54407
[Bug 1607787] [NEW] Missing secure fail mode on physical bridges
Public bug reported:
Restarting current ovs neutron agent under a heavy load with Ryu
(ofctl=native) leads intermittently to disruption of traffic as
manifested by occasional failures in not-yet-commited fullstack test
[1]. More specifically, the reason seems to be too slow restart of Ryu
controller in combination with OVS vswitchd timeouts. The disruption of
the traffic occurs always after the following log entry is recorded in
ovs-vswitchd.log:
fail_open|WARN|Could not connect to controller (or switch failed
controller's post-connection admission control policy) for 15 seconds,
failing open
This issue is manifested regardless of network type (VLAN, flat) and
vsctl interface (cli, native). It has not occured with ofctl=cli though.
The issue occurs for physical switches as they are in default fail mode,
meaning that once controller connection is lost, ovs takes over the
management of the flows and clears them. This conclusion is based on
flows dumped from situation just before the traffic was blocked and
after that event when flows were cleared. Before the event:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:04 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL
After the disruption:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:05 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
The same bug apperars in this condition (courtesy of Jakub Libosvar): setup a phys bridge, block the traffic for the respective bridge controller via iptables until OVS timeout occurs, and then check openflow rules of the affected bridge.
[1] https://review.openstack.org/#/c/334926/
** Affects: neutron
Importance: High
Status: Confirmed
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1607787
Title:
Missing secure fail mode on physical bridges
Status in neutron:
Confirmed
Bug description:
Restarting current ovs neutron agent under a heavy load with Ryu
(ofctl=native) leads intermittently to disruption of traffic as
manifested by occasional failures in not-yet-commited fullstack test
[1]. More specifically, the reason seems to be too slow restart of Ryu
controller in combination with OVS vswitchd timeouts. The disruption
of the traffic occurs always after the following log entry is recorded
in ovs-vswitchd.log:
fail_open|WARN|Could not connect to controller (or switch failed
controller's post-connection admission control policy) for 15 seconds,
failing open
This issue is manifested regardless of network type (VLAN, flat) and
vsctl interface (cli, native). It has not occured with ofctl=cli
though.
The issue occurs for physical switches as they are in default fail
mode, meaning that once controller connection is lost, ovs takes over
the management of the flows and clears them. This conclusion is based
on flows dumped from situation just before the traffic was blocked and
after that event when flows were cleared. Before the event:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:04 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL
After the disruption:
====== br-eth23164fdb4 =======
Fri Jul 29 10:44:05 UTC 2016
OFPST_FLOW reply (OF1.3) (xid=0x2):
The same bug apperars in this condition (courtesy of Jakub Libosvar): setup a phys bridge, block the traffic for the respective bridge controller via iptables until OVS timeout occurs, and then check openflow rules of the affected bridge.
[1] https://review.openstack.org/#/c/334926/
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1607787/+subscriptions
Follow ups