yahoo-eng-team team mailing list archive
Message #83079
[Bug 1884708] [NEW] explicitly_egress_direct prevents learning of local MACs and causes flooding of ingress packets
Public bug reported:
We took the fix for this bug: https://bugs.launchpad.net/neutron/+bug/1732067
and also backported the fix for this one ourselves:
https://bugs.launchpad.net/neutron/+bug/1866445
The latter is for the iptables-based firewall.
We have VLAN-based networks and are seeing ingress packets destined to
local MACs being flooded. No local MACs appear in the output of
ovs-appctl fdb/show br-int.
Consider the following example:
HOST 1:
MAC A = fa:16:3e:c1:01:43
MAC B = fa:16:3e:de:0b:8a
HOST 2:
MAC C = fa:16:3e:d6:3f:31
A is talking to C. Snooping on the qvo interface of B, we see all the
traffic destined to MAC A, along with other unicast traffic that is
neither destined to nor sourced from MAC B. Neither MAC A nor MAC B is
present in the br-int FDB, despite sending heavy traffic.
Here is the ofproto/trace output for such a packet. in_port 8313 is the qvo port of MAC A:
sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
bridge("br-int")
----------------
 0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
    goto_table:25
25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
    goto_table:60
60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
    resubmit(,61)
61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
    push_vlan:0x8100
    set_field:4098->vlan_vid
    output:1

bridge("br-ext")
----------------
 0. in_port=2, priority 2, cookie 0xab09adf2af892674
    goto_table:1
 1. priority 0, cookie 0xab09adf2af892674
    goto_table:2
 2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
    set_field:4240->vlan_vid
    NORMAL
     -> forwarding to learned port

bridge("br-vlan")
-----------------
 0. priority 1, cookie 0x651552fc69601a2d
    goto_table:3
 3. priority 1, cookie 0x651552fc69601a2d
    NORMAL
     -> forwarding to learned port

Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
Datapath actions: push_vlan(vid=144,pcp=0),51
Because the egress packet took the output: action in table=61, which was added by the explicitly_egress_direct fix, it never passed through a NORMAL action in br-int, so br-int did not learn the local MAC. On ingress, the packet hits the NORMAL action in table=60 and is flooded, because br-int has no FDB entry telling it which port the local MAC is on.
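The interaction can be illustrated with a toy model. This is a sketch under simplifying assumptions, not OVS code: it models a single VLAN, treats NORMAL as "learn the source, then forward or flood", and treats an explicit output action as forwarding without learning. Port 8314 for MAC B is a hypothetical number chosen for the example.

```python
class LearningBridge:
    """Toy L2 bridge in which only the NORMAL path learns source MACs."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # (vlan, mac) -> port

    def normal(self, in_port, vlan, src, dst):
        """Model of NORMAL: learn the source, then forward or flood."""
        self.fdb[(vlan, src)] = in_port
        out = self.fdb.get((vlan, dst))
        if out is not None:
            return [out]
        # Unknown destination: flood to every port except the ingress port.
        return sorted(self.ports - {in_port})

    def explicit_output(self, out_port):
        """Model of a plain output:N action: forwards, learns nothing."""
        return [out_port]


MAC_A = "fa:16:3e:c1:01:43"
MAC_C = "fa:16:3e:d6:3f:31"
UPLINK, PORT_A, PORT_B = 1, 8313, 8314  # 8314 is hypothetical
VLAN = 1  # internal VLAN

# Egress via explicit output (as in table=61): MAC A is never learned.
direct = LearningBridge([UPLINK, PORT_A, PORT_B])
direct.explicit_output(UPLINK)
print(direct.normal(UPLINK, VLAN, MAC_C, MAC_A))   # reply is flooded

# Egress via NORMAL: MAC A is learned, so the reply is unicast.
learning = LearningBridge([UPLINK, PORT_A, PORT_B])
learning.normal(PORT_A, VLAN, MAC_A, MAC_C)
print(learning.normal(UPLINK, VLAN, MAC_C, MAC_A))
```

In the first case the reply from C reaches both local ports, which matches the traffic seen on the qvo interface of B. In the second case it reaches only the port of A.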
sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
bridge("br-int")
----------------
 0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
    set_field:4098->vlan_vid
    goto_table:60
60. priority 3, cookie 0x9a67096130ac45c2
    NORMAL
     -> no learned MAC for destination, flooding

bridge("br-vlan")
-----------------
 0. in_port=4, priority 2, cookie 0x651552fc69601a2d
    goto_table:1
 1. priority 0, cookie 0x651552fc69601a2d
    goto_table:2
 2. in_port=4, priority 2, cookie 0x651552fc69601a2d
    drop

bridge("br-tun")
----------------
 0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
    goto_table:1
 1. priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:2
 2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:20
20. priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:22
22. priority 0, cookie 0xf1baf24d000c6f7c
    drop

Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
Datapath actions: pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57,58,13,6,61,66,68,22,23,72,78,79,34,81,83,2,18,87,33,88,90,91,94,95,99,100,101,102,103,106,108,113,115,116,125,132,133,134,144,145,146,147,165,168,169,170,173,174,175,178,201,203,204,205,216,222,148,150,200,160,181,54,159,151,110,182,114,233,241,212,238,154,11,213,70,29,37,131,45,93,14,139,48,105,152,129,28,12,107,172,196,3,4,62,40,183,124,20,32,67,82,135,153,84,98,109,111,123,5,65,119,120,104,122,128,130,137,142,143,121,141,176,177,179,184,186,190
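The size of the flood can be read off the Datapath actions line: each bare number is an output to one datapath port. The sketch below counts them. The function name is hypothetical, and the ingress string is only a shortened prefix of the line above, used to keep the example small.

```python
def output_ports(datapath_actions):
    """Return the datapath port numbers that appear as output actions."""
    ports = []
    depth = 0
    token = ""
    # Split on commas outside parentheses, so that an action such as
    # push_vlan(vid=2,pcp=0) stays in one piece.
    for ch in datapath_actions + ",":
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            token = token.strip()
            if token.isdigit():
                ports.append(int(token))
            token = ""
        else:
            token += ch
    return ports


# Egress trace: a single output port.
egress = "push_vlan(vid=144,pcp=0),51"
# Shortened prefix of the ingress trace's action list (not the full line).
ingress_prefix = "pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57"

print(len(output_ports(egress)))
print(len(output_ports(ingress_prefix)))
```

Feeding the full ingress line into output_ports gives the number of datapath ports that receive a copy of each flooded packet.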
dump-flows on br-int indicates that the packet first hits this rule:
cookie=0x6832197111786c03, duration=107845.507s, table=0, n_packets=98500552445, n_bytes=66585173373354, idle_age=0, hard_age=65534, priority=3,in_port=1,dl_vlan=144 actions=mod_vlan_vid:1,resubmit(,60)
Then, at table=60, the only rule it matches is the final NORMAL rule:
cookie=0x6832197111786c03, duration=107949.777s, table=60, n_packets=245019667777, n_bytes=135203331684577, idle_age=0, hard_age=65534, priority=3 actions=NORMAL
I tried both attaching the subnet to a DVR router and detaching it. When
it is attached to a DVR router, I *DO* see a number of table=60 output
actions for my local VMs. The problem, however, is that they match on the
*external VLAN ID*. Here is an example:
cookie=0x6832197111786c03, duration=107840.054s, table=60, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1 actions=strip_vlan,output:5663
But as we saw, the ingress packet first hits the table=0 rule with
mod_vlan_vid:1,resubmit(,60), which changes VLAN 144 to the internal
VLAN of 1, so by the time the packet reaches table=60 it no longer
matches dl_vlan=144.
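The mismatch can be shown with a small model of the two tables. This is a sketch that mirrors only the flows quoted in this report; the function names are hypothetical. The second table60 call, with the rule keyed on the internal VLAN, is a what-if shown for contrast and is not a confirmed fix.

```python
EXTERNAL_VLAN = 144
INTERNAL_VLAN = 1


def table0(packet):
    """priority=3,in_port=1,dl_vlan=144 actions=mod_vlan_vid:1,resubmit(,60)"""
    if packet["in_port"] == 1 and packet["dl_vlan"] == EXTERNAL_VLAN:
        return dict(packet, dl_vlan=INTERNAL_VLAN)
    return packet


def table60(packet, dvr_rule_vlan, local_macs):
    """Return the action table 60 would apply to this packet.

    Models priority=20,dl_vlan=<dvr_rule_vlan>,dl_dst=<mac>
    actions=strip_vlan,output:<port>, with priority=3 NORMAL as fallback.
    """
    port = local_macs.get(packet["dl_dst"])
    if port is not None and packet["dl_vlan"] == dvr_rule_vlan:
        return f"strip_vlan,output:{port}"
    return "NORMAL"


local_macs = {"fa:16:3e:59:d2:b1": 5663}
ingress = {"in_port": 1, "dl_vlan": 144, "dl_dst": "fa:16:3e:59:d2:b1"}
rewritten = table0(ingress)

# Rule keyed on the external VLAN, as observed: falls through to NORMAL.
print(table60(rewritten, EXTERNAL_VLAN, local_macs))
# What-if: the same rule keyed on the internal VLAN would match.
print(table60(rewritten, INTERNAL_VLAN, local_macs))
```

This is consistent with the n_packets=0 counter on the priority=20 rule: no packet ever reaches table=60 still tagged with VLAN 144.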
For a network not attached to a DVR router, there is a similar table=0 rule that changes the external VLAN to the internal VLAN:
cookie=0xbab0a875dbcda4a0, duration=25949.321s, table=0, n_packets=2618258, n_bytes=2851837213, idle_age=0, priority=3,in_port=1,dl_vlan=2505 actions=mod_vlan_vid:83,resubmit(,60)
And because this is a provider network, there are no local DVR MAC rules
in table=60, so it always hits the NORMAL action.
So, how do we cover all bases: keep the fixes that prevent egress
flooding (https://bugs.launchpad.net/neutron/+bug/1732067 and
https://bugs.launchpad.net/neutron/+bug/1866445) while also preventing
ingress flooding? The fix for one direction seems to cause breakage in
the other.
** Affects: neutron
Importance: Undecided
Status: New
** Tags: l3-dvr-backlog
https://bugs.launchpad.net/bugs/1884708