yahoo-eng-team team mailing list archive

[Bug 1884708] [NEW] explicity_egress_direction prevents learning of local MACs and causes flooding of ingress packets

 

Public bug reported:

We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067
and then also backported this one ourselves:
https://bugs.launchpad.net/neutron/+bug/1866445

The latter is for the iptables-based firewall.

We have VLAN-based networks, and we are seeing ingress packets destined
to local MACs being flooded. No local MACs show up in the output of
ovs-appctl fdb/show br-int.

Consider the following example:

HOST 1:
MAC A = fa:16:3e:c1:01:43
MAC B = fa:16:3e:de:0b:8a

HOST 2:
MAC C = fa:16:3e:d6:3f:31

A is talking to C. Snooping on the qvo interface of B, we see all the
traffic destined to MAC A (along with other unicast traffic that is
neither destined to nor sourced from MAC B). Neither MAC A nor MAC B is
present in the br-int FDB, despite sending heavy traffic.
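
To make the FDB check above repeatable, here is a minimal Python sketch
that parses the text printed by ovs-appctl fdb/show br-int and reports
which expected MACs are missing. The column layout (port, VLAN, MAC,
Age) and the SAMPLE text are assumptions for illustration; SAMPLE is
not output captured from our hosts.

```python
import re

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def learned_macs(fdb_output):
    """Return {mac: (port, vlan)} parsed from `ovs-appctl fdb/show` text."""
    table = {}
    for line in fdb_output.splitlines():
        fields = line.split()
        # Data rows are assumed to be: port, VLAN, MAC, age.
        # The header row is skipped because "MAC" is not a MAC address.
        if len(fields) >= 3 and MAC_RE.match(fields[2].lower()):
            table[fields[2].lower()] = (fields[0], int(fields[1]))
    return table


def missing_macs(fdb_output, expected):
    """Return the MACs from `expected` that have no FDB entry."""
    learned = learned_macs(fdb_output)
    return [mac for mac in expected if mac.lower() not in learned]


# Hypothetical sample: only the remote MAC C has been learned.
SAMPLE = """\
 port  VLAN  MAC                Age
    1     2  fa:16:3e:d6:3f:31    0
"""

if __name__ == "__main__":
    local = ["fa:16:3e:c1:01:43", "fa:16:3e:de:0b:8a"]  # MAC A, MAC B
    print(missing_macs(SAMPLE, local))
```

On a real host the input would be the captured command output rather
than SAMPLE.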


Here is the ofproto trace for such a packet. in_port 8313 is the qvo
port of MAC A:

sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

bridge("br-int")
----------------
 0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
    goto_table:25
25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
    goto_table:60
60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
    resubmit(,61)
61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
    push_vlan:0x8100
    set_field:4098->vlan_vid
    output:1

bridge("br-ext")
----------------
 0. in_port=2, priority 2, cookie 0xab09adf2af892674
    goto_table:1
 1. priority 0, cookie 0xab09adf2af892674
    goto_table:2
 2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
    set_field:4240->vlan_vid
    NORMAL
     -> forwarding to learned port

bridge("br-vlan")
-----------------
 0. priority 1, cookie 0x651552fc69601a2d
    goto_table:3
 3. priority 1, cookie 0x651552fc69601a2d
    NORMAL
     -> forwarding to learned port

Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
Datapath actions: push_vlan(vid=144,pcp=0),51
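
A note on reading the set_field values in the trace: as far as I
understand, the OpenFlow vlan_vid field carries a "tag present" bit
(0x1000) on top of the 12-bit VLAN ID, so 4098 and 4240 are not VLAN
IDs themselves. The small Python sketch below decodes them; the helper
name decode_vlan_vid is only for illustration.

```python
OFPVID_PRESENT = 0x1000  # "VLAN tag present" bit in the OpenFlow vlan_vid field


def decode_vlan_vid(value):
    """Split a set_field vlan_vid value into (tag_present, vlan_id)."""
    present = bool(value & OFPVID_PRESENT)
    return present, value & 0x0FFF  # low 12 bits are the VLAN ID


if __name__ == "__main__":
    for raw in (4098, 4240):
        print(raw, decode_vlan_vid(raw))
```

Decoded this way, 4098 is internal VLAN 2 and 4240 is external VLAN
144, which agrees with dl_vlan=2 in the final flow and
push_vlan(vid=144) in the datapath actions.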


Because the egress packet took the explicit output: action in table=61, which was added by the explicitly_egress_direct fix, the NORMAL action never runs for it and the local MAC is never learned. On ingress, however, the packet hits the NORMAL action in table=60, which floods it because br-int has no FDB entry for the local destination MAC.
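
The interaction can be illustrated with a toy model in Python. This is
only a sketch of the behaviour described above, not OVS code: ToyBridge
and its methods are made-up names, and port 8314 standing in for MAC
B's qvo port is an assumption.

```python
class ToyBridge:
    """Toy model: NORMAL learns source MACs, explicit output does not."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # MAC -> port, filled in only by normal()

    def normal(self, in_port, src, dst):
        """Learn src, then forward to the learned port or flood."""
        self.fdb[src] = in_port
        if dst in self.fdb:
            return [self.fdb[dst]]
        return sorted(self.ports - {in_port})

    def output(self, out_port):
        """Explicit output: the FDB is not consulted or updated."""
        return [out_port]


MAC_A = "fa:16:3e:c1:01:43"
MAC_C = "fa:16:3e:d6:3f:31"
PORTS = [1, 8313, 8314]  # 1 = uplink, 8313 = MAC A, 8314 = hypothetical port of MAC B

if __name__ == "__main__":
    # Egress via explicit output (table=61): nothing is learned for MAC A.
    br = ToyBridge(PORTS)
    br.output(1)
    print("explicit egress, reply goes to:", br.normal(1, MAC_C, MAC_A))

    # Egress via NORMAL: MAC A is learned, so the reply is not flooded.
    br = ToyBridge(PORTS)
    br.normal(8313, MAC_A, MAC_C)
    print("NORMAL egress, reply goes to:", br.normal(1, MAC_C, MAC_A))
```

In the first case the reply is sent to every port except the uplink,
which is what we observe on the qvo interface of B.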

sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000

bridge("br-int")
----------------
 0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
    set_field:4098->vlan_vid
    goto_table:60
60. priority 3, cookie 0x9a67096130ac45c2
    NORMAL
     -> no learned MAC for destination, flooding

    bridge("br-vlan")
    -----------------
         0. in_port=4, priority 2, cookie 0x651552fc69601a2d
            goto_table:1
         1. priority 0, cookie 0x651552fc69601a2d
            goto_table:2
         2. in_port=4, priority 2, cookie 0x651552fc69601a2d
            drop

bridge("br-tun")
----------------
 0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
    goto_table:1
 1. priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:2
 2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:20
20. priority 0, cookie 0xf1baf24d000c6f7c
    goto_table:22
22. priority 0, cookie 0xf1baf24d000c6f7c
    drop

Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
Datapath actions: pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57,58,13,6,61,66,68,22,23,72,78,79,34,81,83,2,18,87,33,88,90,91,94,95,99,100,101,102,103,106,108,113,115,116,125,132,133,134,144,145,146,147,165,168,169,170,173,174,175,178,201,203,204,205,216,222,148,150,200,160,181,54,159,151,110,182,114,233,241,212,238,154,11,213,70,29,37,131,45,93,14,139,48,105,152,129,28,12,107,172,196,3,4,62,40,183,124,20,32,67,82,135,153,84,98,109,111,123,5,65,119,120,104,122,128,130,137,142,143,121,141,176,177,179,184,186,190
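
The length of that datapath action list is the flooding itself: every
bare number is a datapath port the packet is copied to. A rough Python
sketch for counting them follows. It assumes output ports appear as
bare integers and that any commas inside parentheses belong to action
arguments; the FLOODED string is a shortened, illustrative prefix of
the list above.

```python
import re


def datapath_output_ports(actions):
    """Return the bare numeric output ports from a 'Datapath actions' string."""
    # Drop argument lists such as push_vlan(vid=2,pcp=0) so that their
    # inner commas do not confuse the split below.
    flat = re.sub(r"\([^)]*\)", "", actions)
    return [int(tok) for tok in flat.split(",") if tok.strip().isdigit()]


EGRESS = "push_vlan(vid=144,pcp=0),51"
FLOODED = "pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57"  # shortened

if __name__ == "__main__":
    print(len(datapath_output_ports(EGRESS)), "port(s) on egress")
    print(len(datapath_output_ports(FLOODED)), "port(s) in the shortened ingress list")
```

A unicast packet that is forwarded to a learned port should yield a
single output port, as the egress trace does.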


ovs-ofctl dump-flows br-int shows that the ingress packet first hits
this rule:

 cookie=0x6832197111786c03, duration=107845.507s, table=0,
n_packets=98500552445, n_bytes=66585173373354, idle_age=0,
hard_age=65534, priority=3,in_port=1,dl_vlan=144
actions=mod_vlan_vid:1,resubmit(,60)

Then, at table=60, the only rule it matches is the final NORMAL rule:

cookie=0x6832197111786c03, duration=107949.777s, table=60,
n_packets=245019667777, n_bytes=135203331684577, idle_age=0,
hard_age=65534, priority=3 actions=NORMAL


I tried both attaching the subnet to a DVR router and detaching it. If
I attach it to a DVR router, I *DO* see a bunch of table=60 output
actions for my local VMs. The problem, however, is that they match on
the *external VLAN ID*. Here is an example:

 cookie=0x6832197111786c03, duration=107840.054s, table=60, n_packets=0,
n_bytes=0, idle_age=65534, hard_age=65534,
priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1
actions=strip_vlan,output:5663

But as we saw, the ingress packet hits that first table=0
mod_vlan_vid:1,resubmit(,60), which changes VLAN 144 to the internal
VLAN of 1.
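
To spell out the mismatch, here is a tiny Python model of the two rules
quoted above. It is a sketch only: packets are plain dicts, and table0
and table60 are illustrative stand-ins for the flows, not how OVS
evaluates them.

```python
def table0(pkt):
    """priority=3,in_port=1,dl_vlan=144 actions=mod_vlan_vid:1,resubmit(,60)"""
    if pkt["in_port"] == 1 and pkt["dl_vlan"] == 144:
        return dict(pkt, dl_vlan=1)  # external VLAN 144 -> internal VLAN 1
    return pkt


def table60(pkt):
    """DVR rule keyed on the external VLAN, else the priority=3 NORMAL rule."""
    if pkt["dl_vlan"] == 144 and pkt["dl_dst"] == "fa:16:3e:59:d2:b1":
        return "strip_vlan,output:5663"
    return "NORMAL"


INGRESS = {"in_port": 1, "dl_vlan": 144, "dl_dst": "fa:16:3e:59:d2:b1"}

if __name__ == "__main__":
    # The packet reaches table=60 only after table=0 has rewritten the
    # VLAN, so the dl_vlan=144 match can no longer succeed.
    print(table60(table0(INGRESS)))
```

This would also explain why the table=60 rule shows n_packets=0.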


For a network not attached to a DVR router, there is a similar table=0 rule that changes the external VLAN to the internal VLAN:

 cookie=0xbab0a875dbcda4a0, duration=25949.321s, table=0,
n_packets=2618258, n_bytes=2851837213, idle_age=0,
priority=3,in_port=1,dl_vlan=2505 actions=mod_vlan_vid:83,resubmit(,60)

And because this is a provider network, there are no local DVR MAC
rules at table=60, so it always hits the NORMAL action.


So, how do we cover all bases and keep the fixes that prevent egress
flooding (https://bugs.launchpad.net/neutron/+bug/1732067 and
https://bugs.launchpad.net/neutron/+bug/1866445), while also preventing
ingress flooding? The fix for one direction seems to cause breakage in
the other.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direction prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1884708/+subscriptions

