yahoo-eng-team team mailing list archive

[Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets

 

I'm reopening this because I believe the committed fix addresses only
part of the problem. With firewall_driver=noop the unnecessary ingress
flooding on br-int is gone. However, the same unnecessary flooding still
occurs with firewall_driver=openvswitch. For details and a full
reproduction, please see the comments on bug #2048785:

https://bugs.launchpad.net/neutron/+bug/2048785/comments/2
https://bugs.launchpad.net/neutron/+bug/2048785/comments/6


** Changed in: neutron
       Status: Fix Released => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  New

Bug description:
  We took this bug fix: https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported this one ourselves:
  https://bugs.launchpad.net/neutron/+bug/1866445

  The latter is for the iptables-based firewall.

  We have VLAN-based networks and are seeing ingress packets destined
  for local MACs being flooded. No local MACs appear in the output of
  ovs-appctl fdb/show br-int.
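
  The FDB check described above could be scripted roughly as follows.
  This is only a sketch: the function names are hypothetical, the sample
  text is made up for illustration, and it assumes the usual
  "port VLAN MAC Age" column layout of ovs-appctl fdb/show. Real input
  would be the captured output of that command.

```python
import re

# One FDB row: port (number or name such as LOCAL), VLAN, MAC, age.
FDB_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s+([0-9a-fA-F:]{17})\s+(\d+)\s*$")


def learned_macs(fdb_output):
    """Return the lower-cased MACs listed in `ovs-appctl fdb/show` output."""
    macs = set()
    for line in fdb_output.splitlines():
        match = FDB_LINE.match(line)
        if match:  # the header line does not match and is skipped
            macs.add(match.group(3).lower())
    return macs


def missing_local_macs(fdb_output, local_macs):
    """Return the local MACs that the bridge has not learned, sorted."""
    learned = learned_macs(fdb_output)
    return sorted(m.lower() for m in local_macs if m.lower() not in learned)


# Hypothetical sample: only the remote MAC C has been learned.
SAMPLE = """\
 port  VLAN  MAC                Age
    1     2  fa:16:3e:d6:3f:31    0
"""

print(missing_local_macs(SAMPLE, ["fa:16:3e:c1:01:43", "fa:16:3e:de:0b:8a"]))
```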

  Consider the following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we see all
  the traffic destined to MAC A (along with other unicast traffic not
  destined to or sourced from MAC B). Neither MAC A nor MAC B is present
  in the br-int FDB, despite both sending heavy traffic.

  
  Here is the ofproto/trace output for such a packet. in_port 8313 is the qvo interface of MAC A:

  sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
  ----------------
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
      goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
      goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
      resubmit(,61)
  61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
      push_vlan:0x8100
      set_field:4098->vlan_vid
      output:1

  bridge("br-ext")
  ----------------
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
      goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
      goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
      set_field:4240->vlan_vid
      NORMAL
       -> forwarding to learned port

  bridge("br-vlan")
  -----------------
   0. priority 1, cookie 0x651552fc69601a2d
      goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
      NORMAL
       -> forwarding to learned port

  Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51
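
  For readers comparing the set_field values in the trace with the VLAN
  IDs in the final flow and datapath actions: in OpenFlow 1.2+ notation
  the vlan_vid field carries a "VLAN present" bit (0x1000) on top of the
  12-bit VLAN ID. A small sketch of the arithmetic (the helper name is
  hypothetical):

```python
OFPVID_PRESENT = 0x1000  # "a VLAN tag is present" bit in vlan_vid
VID_MASK = 0x0FFF        # low 12 bits hold the actual VLAN ID


def decode_vlan_vid(value):
    """Return the VLAN ID encoded in a vlan_vid value, or None if untagged."""
    if not value & OFPVID_PRESENT:
        return None
    return value & VID_MASK


print(decode_vlan_vid(4098))  # set_field:4098->vlan_vid on br-int -> 2
print(decode_vlan_vid(4240))  # set_field:4240->vlan_vid on br-ext -> 144
```

  So the trace tags the packet with VLAN 2 on br-int and rewrites it to
  VLAN 144 on br-ext, which matches push_vlan(vid=144) in the datapath
  actions.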

  
  Because the egress packet took the output: action from table=61, which was added by the explicitly_egress_direct fix, instead of NORMAL, the local MAC is never learned. On ingress, however, the packet hits the NORMAL action in table=60 and is flooded, because the bridge has no FDB entry telling it which port the local MAC is behind.
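
  The interaction can be illustrated with a toy model of a learning
  bridge. This is a deliberate simplification, not how OVS is
  implemented: the class, its method names and the port numbers are
  hypothetical, and the real NORMAL action also takes VLAN membership,
  bonding and other details into account.

```python
class ToyBridge:
    """Minimal learning bridge: NORMAL learns, explicit output does not."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # (vlan, mac) -> port

    def normal(self, in_port, vlan, src, dst):
        """Learn the source MAC, then forward to the learned port or flood."""
        self.fdb[(vlan, src)] = in_port
        out = self.fdb.get((vlan, dst))
        if out is not None:
            return [out]
        return sorted(self.ports - {in_port})

    def direct_output(self, out_port):
        """Explicit output: action; forwards without touching the FDB."""
        return [out_port]


MAC_A = "fa:16:3e:c1:01:43"
MAC_C = "fa:16:3e:d6:3f:31"

# Port 1 is the uplink; 10 and 11 stand in for the qvo ports of A and B.
bridge = ToyBridge(ports=[1, 10, 11])

# Egress A -> C through the direct output rule: nothing is learned.
bridge.direct_output(1)
fdb_after_direct_egress = dict(bridge.fdb)

# Ingress C -> A through NORMAL: A is unknown, so the packet is flooded.
flooded = bridge.normal(in_port=1, vlan=2, src=MAC_C, dst=MAC_A)

# If egress had gone through NORMAL instead, A would be learned ...
bridge.normal(in_port=10, vlan=2, src=MAC_A, dst=MAC_C)
# ... and the next ingress packet would go only to A's port.
unicast = bridge.normal(in_port=1, vlan=2, src=MAC_C, dst=MAC_A)

print(fdb_after_direct_egress, flooded, unicast)
```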

  sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000

  bridge("br-int")
  ----------------
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
      set_field:4098->vlan_vid
      goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
      NORMAL
       -> no learned MAC for destination, flooding

      bridge("br-vlan")
      -----------------
           0. in_port=4, priority 2, cookie 0x651552fc69601a2d
              goto_table:1
           1. priority 0, cookie 0x651552fc69601a2d
              goto_table:2
           2. in_port=4, priority 2, cookie 0x651552fc69601a2d
              drop

  bridge("br-tun")
  ----------------
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
      goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
      drop

  Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Datapath actions: pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57,58,13,6,61,66,68,22,23,72,78,79,34,81,83,2,18,87,33,88,90,91,94,95,99,100,101,102,103,106,108,113,115,116,125,132,133,134,144,145,146,147,165,168,169,170,173,174,175,178,201,203,204,205,216,222,148,150,200,160,181,54,159,151,110,182,114,233,241,212,238,154,11,213,70,29,37,131,45,93,14,139,48,105,152,129,28,12,107,172,196,3,4,62,40,183,124,20,32,67,82,135,153,84,98,109,111,123,5,65,119,120,104,122,128,130,137,142,143,121,141,176,177,179,184,186,190



  ovs-ofctl dump-flows br-int shows that the packet first hits this rule:

   cookie=0x6832197111786c03, duration=107845.507s, table=0,
  n_packets=98500552445, n_bytes=66585173373354, idle_age=0,
  hard_age=65534, priority=3,in_port=1,dl_vlan=144
  actions=mod_vlan_vid:1,resubmit(,60)

  Then, at table=60, the only rule it matches is the final NORMAL rule:

  cookie=0x6832197111786c03, duration=107949.777s, table=60,
  n_packets=245019667777, n_bytes=135203331684577, idle_age=0,
  hard_age=65534, priority=3 actions=NORMAL


  I tried both attaching the subnet to a DVR router and detaching it.
  When it is attached to a DVR router, I *DO* see a number of table=60
  output actions for my local VMs. The problem, however, is that they
  match on the *external VLAN ID*. Here is an example:

   cookie=0x6832197111786c03, duration=107840.054s, table=60,
  n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534,
  priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1
  actions=strip_vlan,output:5663

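  But as we saw, the ingress packet first hits the table=0 rule with
  mod_vlan_vid:1,resubmit(,60), which changes VLAN 144 to the internal
  VLAN of 1. By the time the packet reaches table=60 it no longer
  carries dl_vlan=144, so the rule above never matches (note
  n_packets=0).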
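
  A toy model of just the two rules quoted above shows why the
  per-MAC rule is dead. The function names are hypothetical and the
  packet is a plain dict; this is an illustration of the match order,
  not of how OVS evaluates flow tables.

```python
def table0(pkt):
    """priority=3,in_port=1,dl_vlan=144 actions=mod_vlan_vid:1,resubmit(,60)"""
    if pkt["in_port"] == 1 and pkt["dl_vlan"] == 144:
        return {**pkt, "dl_vlan": 1}  # external VLAN rewritten to internal
    return pkt


def table60(pkt):
    """The priority=20 per-MAC rule first, then the priority=3 NORMAL rule."""
    if pkt["dl_vlan"] == 144 and pkt["dl_dst"] == "fa:16:3e:59:d2:b1":
        return "strip_vlan,output:5663"
    return "NORMAL"


ingress = {"in_port": 1, "dl_vlan": 144, "dl_dst": "fa:16:3e:59:d2:b1"}

# Through the real pipeline order the per-MAC rule is skipped.
print(table60(table0(ingress)))
# Only a packet that still carried VLAN 144 would match it.
print(table60(ingress))
```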

  
  For a network not attached to a DVR router, there is a similar table=0 rule that changes the external VLAN to the internal VLAN:

   cookie=0xbab0a875dbcda4a0, duration=25949.321s, table=0,
  n_packets=2618258, n_bytes=2851837213, idle_age=0,
  priority=3,in_port=1,dl_vlan=2505
  actions=mod_vlan_vid:83,resubmit(,60)

  And because this is a provider network, there are no local DVR MAC
  rules in table=60, so ingress traffic always hits the NORMAL action.


  So, how do we cover all bases and keep the fixes that prevent egress
  flooding (https://bugs.launchpad.net/neutron/+bug/1732067 and
  https://bugs.launchpad.net/neutron/+bug/1866445), while also
  preventing ingress flooding? The fix for one direction seems to cause
  breakage in the other.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1884708/+subscriptions


