yahoo-eng-team team mailing list archive

[Bug 1884708] Re: explicity_egress_direct prevents learning of local MACs and causes flooding of ingress packets

 

Reviewed:  https://review.opendev.org/738551
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=959d8b6d73e2a6ab1a45c9a7b0b05ae163e650fc
Submitter: Zuul
Branch:    master

commit 959d8b6d73e2a6ab1a45c9a7b0b05ae163e650fc
Author: LIU Yulong <i@xxxxxxxxxxxx>
Date:   Fri Jul 10 17:25:15 2020 +0800

    Local mac direct flow for non-openflow firewall
    
    When there is no OpenFlow firewall, i.e. the OVS agent security group
    is disabled or uses the Noop or hybrid iptables firewall driver, this
    patch introduces a different ingress pipeline for bridge ports, which
    avoids ingress flooding:
    (1) table=0,  in_port=patch_bridge,dl_vlan=physical_vlan action=mod_vlan:local_vlan,goto:60 (original)
    (2) table=60, in_port=patch_bridge                       action=goto:61                     (new)
    (3) table=61, dl_dst=local_port_mac,dl_vlan=local_vlan,  action=strip_vlan,output:<ofport>  (changes)
    
    It also changes the local ports pipeline:
    (1) table=0,  in_port=local_ofport                       action=goto:25                  (original)
    (2) table=25, in_port=local_ofport,dl_src=local_port_mac action=goto:60                  (original)
    (3) table=60, in_port=local_ofport,dl_src=local_port_mac action=local_vlan->reg6,goto:61 (changes)
    (4) table=61, dl_dst=local_port_mac,reg6=local_vlan,     action=output:<ofport>          (changes)
    
    Closes-Bug: #1884708
    Closes-Bug: #1881070
    Related-Bug: #1732067
    Related-Bug: #1866445
    Related-Bug: #1883321
    
    Change-Id: Iecf9cffaf02616342f1727ad7db85545d8adbec2
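
The ingress pipeline in the commit message can be read as a two-step lookup: table 0 rewrites the physical VLAN to the local VLAN, and table 61 matches the destination MAC and local VLAN to pick one output port. Below is a toy Python model of that reading, not the agent's code. The function and variable names are hypothetical, the sample values are taken from the traces in the bug description, and the behaviour on a table 61 miss is not described in the commit message, so it is reported only as a fallthrough.

```python
def ingress_lookup(vlan_map, direct_flows, physical_vlan, dl_dst):
    """Toy model of the ingress path: table 0 -> table 60 -> table 61."""
    # table 0: dl_vlan=physical_vlan -> mod_vlan:local_vlan
    local_vlan = vlan_map[physical_vlan]
    # table 60 only forwards patch-port traffic to table 61 (goto:61).
    # table 61: dl_dst=local_port_mac,dl_vlan=local_vlan -> strip_vlan,output
    ofport = direct_flows.get((local_vlan, dl_dst))
    if ofport is not None:
        return ("output", ofport)
    # Assumption: the miss case is outside what the commit message states.
    return ("fallthrough", None)


# Sample values from the traces: physical VLAN 144, local VLAN 2, MAC A on 8313.
VLAN_MAP = {144: 2}
DIRECT_FLOWS = {(2, "fa:16:3e:c1:01:43"): 8313}

print(ingress_lookup(VLAN_MAP, DIRECT_FLOWS, 144, "fa:16:3e:c1:01:43"))
print(ingress_lookup(VLAN_MAP, DIRECT_FLOWS, 144, "fa:16:3e:00:00:01"))
```

The point of the direct flow is that a known local MAC resolves to exactly one port without relying on the bridge having learned it.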


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1884708

Title:
  explicity_egress_direct prevents learning of local MACs and causes
  flooding of ingress packets

Status in neutron:
  Fix Released

Bug description:
  We took the fix for https://bugs.launchpad.net/neutron/+bug/1732067
  and then also backported the fix for
  https://bugs.launchpad.net/neutron/+bug/1866445 ourselves.

  The latter is for the iptables-based firewall.

  We have VLAN-based networks and are seeing ingress packets destined to
  local MACs being flooded. No local MACs appear in the output of
  ovs-appctl fdb/show br-int.

  Consider the following example:

  HOST 1:
  MAC A = fa:16:3e:c1:01:43
  MAC B = fa:16:3e:de:0b:8a

  HOST 2:
  MAC C = fa:16:3e:d6:3f:31

  A is talking to C. Snooping on the qvo interface of B, we see all
  the traffic destined to MAC A (along with other unicast traffic not
  destined to or sourced from MAC B). Neither MAC A nor MAC B is present
  in the br-int FDB, despite both sending heavy traffic.
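
One way to make the FDB check repeatable is to parse the output of ovs-appctl fdb/show br-int and report which expected local MACs are missing. The sketch below assumes the usual four-column layout (port, VLAN, MAC, Age). The sample text and the helper names parse_fdb and missing_macs are illustrative and are not output captured from the affected host.

```python
import re

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def parse_fdb(text):
    """Return the set of MACs listed in fdb/show style output."""
    macs = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "port VLAN MAC age"; the header row and
        # blank lines fail the MAC check and are skipped.
        if len(fields) >= 3 and MAC_RE.match(fields[2].lower()):
            macs.add(fields[2].lower())
    return macs


def missing_macs(fdb_text, expected):
    """Return the expected MACs that the FDB has not learned."""
    learned = parse_fdb(fdb_text)
    return sorted(m.lower() for m in expected if m.lower() not in learned)


# Illustrative sample: only the remote MAC C has been learned.
SAMPLE = """\
 port  VLAN  MAC                Age
    1     2  fa:16:3e:d6:3f:31    0
"""
LOCAL = ["fa:16:3e:c1:01:43", "fa:16:3e:de:0b:8a"]

print(missing_macs(SAMPLE, LOCAL))
```

On a real host the text would come from running the command, for example through subprocess, rather than from a literal string.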

  
  Here is the ofproto trace for such a packet. in_port 8313 is the qvo of MAC A:

  sudo ovs-appctl ofproto/trace br-int in_port=8313,tcp,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31
  Flow: tcp,in_port=8313,vlan_tci=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0

  bridge("br-int")
  ----------------
   0. in_port=8313, priority 9, cookie 0x9a67096130ac45c2
      goto_table:25
  25. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 2, cookie 0x9a67096130ac45c2
      goto_table:60
  60. in_port=8313,dl_src=fa:16:3e:c1:01:43, priority 9, cookie 0x9a67096130ac45c2
      resubmit(,61)
  61. in_port=8313,dl_src=fa:16:3e:c1:01:43,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x9a67096130ac45c2
      push_vlan:0x8100
      set_field:4098->vlan_vid
      output:1

  bridge("br-ext")
  ----------------
   0. in_port=2, priority 2, cookie 0xab09adf2af892674
      goto_table:1
   1. priority 0, cookie 0xab09adf2af892674
      goto_table:2
   2. in_port=2,dl_vlan=2, priority 4, cookie 0xab09adf2af892674
      set_field:4240->vlan_vid
      NORMAL
       -> forwarding to learned port

  bridge("br-vlan")
  -----------------
   0. priority 1, cookie 0x651552fc69601a2d
      goto_table:3
   3. priority 1, cookie 0x651552fc69601a2d
      NORMAL
       -> forwarding to learned port

  Final flow: tcp,in_port=8313,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_src=0.0.0.0,nw_dst=0.0.0.0,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=0,tp_dst=0,tcp_flags=0
  Megaflow: recirc_id=0,eth,ip,in_port=8313,vlan_tci=0x0000/0x1fff,dl_src=fa:16:3e:c1:01:43,dl_dst=fa:16:3e:d6:3f:31,nw_frag=no
  Datapath actions: push_vlan(vid=144,pcp=0),51
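
The set_field:4098->vlan_vid and set_field:4240->vlan_vid values in the trace carry the OpenFlow "VLAN present" bit (0x1000) on top of the 12-bit VLAN ID, which is why they look different from the dl_vlan=2 and vid=144 values shown elsewhere in the same trace. A small sketch of the arithmetic follows; the function name is illustrative.

```python
OFPVID_PRESENT = 0x1000  # bit that marks "a VLAN tag is present"
VID_MASK = 0x0FFF        # low 12 bits hold the VLAN ID


def decode_vlan_vid(value):
    """Split a vlan_vid field value into (tag_present, vlan_id)."""
    return bool(value & OFPVID_PRESENT), value & VID_MASK


for raw in (4098, 4240):
    present, vid = decode_vlan_vid(raw)
    print(raw, hex(raw), present, vid)
```

With these values, 4098 decodes to VLAN 2 (the internal VLAN on br-int) and 4240 decodes to VLAN 144 (the external VLAN pushed on the way out).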

  
  Because the packet took the output: action from table=61, which was added by the explicitly_egress_direct fix, the local MAC is not learned. On ingress, the packet then hits the NORMAL action in table=60 and is flooded, because the bridge has never learned which port the local MAC is behind.

  sudo ovs-appctl ofproto/trace br-int in_port=1,dl_vlan=144,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43
  Flow: in_port=1,dl_vlan=144,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000

  bridge("br-int")
  ----------------
   0. in_port=1,dl_vlan=144, priority 3, cookie 0x9a67096130ac45c2
      set_field:4098->vlan_vid
      goto_table:60
  60. priority 3, cookie 0x9a67096130ac45c2
      NORMAL
       -> no learned MAC for destination, flooding

      bridge("br-vlan")
      -----------------
           0. in_port=4, priority 2, cookie 0x651552fc69601a2d
              goto_table:1
           1. priority 0, cookie 0x651552fc69601a2d
              goto_table:2
           2. in_port=4, priority 2, cookie 0x651552fc69601a2d
              drop

  bridge("br-tun")
  ----------------
   0. in_port=1, priority 1, cookie 0xf1baf24d000c6f7c
      goto_table:1
   1. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:2
   2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:20
  20. priority 0, cookie 0xf1baf24d000c6f7c
      goto_table:22
  22. priority 0, cookie 0xf1baf24d000c6f7c
      drop

  Final flow: in_port=1,dl_vlan=2,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Megaflow: recirc_id=0,eth,in_port=1,dl_vlan=144,dl_vlan_pcp=0,dl_src=fa:16:3e:d6:3f:31,dl_dst=fa:16:3e:c1:01:43,dl_type=0x0000
  Datapath actions: pop_vlan,push_vlan(vid=2,pcp=0),7,pop_vlan,46,26,57,58,13,6,61,66,68,22,23,72,78,79,34,81,83,2,18,87,33,88,90,91,94,95,99,100,101,102,103,106,108,113,115,116,125,132,133,134,144,145,146,147,165,168,169,170,173,174,175,178,201,203,204,205,216,222,148,150,200,160,181,54,159,151,110,182,114,233,241,212,238,154,11,213,70,29,37,131,45,93,14,139,48,105,152,129,28,12,107,172,196,3,4,62,40,183,124,20,32,67,82,135,153,84,98,109,111,123,5,65,119,120,104,122,128,130,137,142,143,121,141,176,177,179,184,186,190
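
The interaction between the two traces can be illustrated with a very small model of a learning bridge. This is a simplified sketch and not OVS code: it assumes that a NORMAL-like action learns the source MAC before forwarding, that an explicit output action does not touch the FDB, and it ignores per-port VLAN configuration. Port 8314 for MAC B is a hypothetical value; ports 1 and 8313 and VLAN 2 come from the traces.

```python
class ToyBridge:
    """Minimal learning-bridge model for illustration only."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # (vlan, mac) -> port

    def normal(self, in_port, vlan, src, dst):
        """NORMAL-like action: learn the source, then forward or flood."""
        self.fdb[(vlan, src)] = in_port
        out = self.fdb.get((vlan, dst))
        if out is not None:
            return [out]
        return sorted(self.ports - {in_port})

    def direct_output(self, out_port):
        """output:<port>-like action: forwards without learning anything."""
        return [out_port]


MAC_A = "fa:16:3e:c1:01:43"
MAC_C = "fa:16:3e:d6:3f:31"

# Case 1: egress uses a direct output flow, so MAC A is never learned.
br = ToyBridge([1, 8313, 8314])
br.direct_output(1)                        # A -> C leaves through port 1
flooded = br.normal(1, 2, MAC_C, MAC_A)    # C -> A arrives on port 1
print(flooded)                             # every port except the ingress port

# Case 2: egress goes through NORMAL, so MAC A is learned on port 8313.
br2 = ToyBridge([1, 8313, 8314])
br2.normal(8313, 2, MAC_A, MAC_C)
unicast = br2.normal(1, 2, MAC_C, MAC_A)
print(unicast)                             # only the port where A was learned
```

In the first case the reply reaches the port of MAC B as well, which matches the traffic seen when snooping on the qvo interface of B.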



  dump-flows br-int shows that the packet first hits this rule:

   cookie=0x6832197111786c03, duration=107845.507s, table=0,
  n_packets=98500552445, n_bytes=66585173373354, idle_age=0,
  hard_age=65534, priority=3,in_port=1,dl_vlan=144
  actions=mod_vlan_vid:1,resubmit(,60)

  Then, at table=60, the only rule it matches is the final NORMAL rule:

  cookie=0x6832197111786c03, duration=107949.777s, table=60,
  n_packets=245019667777, n_bytes=135203331684577, idle_age=0,
  hard_age=65534, priority=3 actions=NORMAL


  I tried both attaching the subnet to a DVR router and detaching it.
  When it is attached to a DVR router, I *DO* see a number of table=60
  output actions for my local VMs. The problem, however, is that they
  match on the *external VLAN ID*. Here is an example:

   cookie=0x6832197111786c03, duration=107840.054s, table=60,
  n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534,
  priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1
  actions=strip_vlan,output:5663

  But as we saw, the ingress packet first hits the table=0 rule with
  mod_vlan_vid:1,resubmit(,60), which changes VLAN 144 to the internal
  VLAN of 1, so the dl_vlan=144 match in table=60 never applies to it.
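
A rough way to spot this kind of mismatch is to compare the VLANs that table=0 rewrites with the VLANs that table=60 rules match on. The sketch below works on abbreviated copies of the two flows quoted in this report (cookie and counters removed). The helper names are hypothetical, and a hit only marks a flow as suspect, since packets that reach table=60 by another path could still match it.

```python
import re

# Abbreviated from the dump-flows output quoted in this report.
FLOWS = [
    "table=0, priority=3,in_port=1,dl_vlan=144 "
    "actions=mod_vlan_vid:1,resubmit(,60)",
    "table=60, priority=20,dl_vlan=144,dl_dst=fa:16:3e:59:d2:b1 "
    "actions=strip_vlan,output:5663",
]


def vlan_rewrites(flows):
    """Map external VLAN -> internal VLAN from table=0 mod_vlan_vid rules."""
    rewrites = {}
    for flow in flows:
        if not flow.startswith("table=0,"):
            continue
        match = re.search(r"dl_vlan=(\d+)", flow)
        action = re.search(r"mod_vlan_vid:(\d+)", flow)
        if match and action:
            rewrites[int(match.group(1))] = int(action.group(1))
    return rewrites


def suspect_table60_flows(flows):
    """Return table=60 flows matching a VLAN that table=0 rewrites away."""
    rewrites = vlan_rewrites(flows)
    suspects = []
    for flow in flows:
        if not flow.startswith("table=60,"):
            continue
        match = re.search(r"dl_vlan=(\d+)", flow)
        if match:
            vlan = int(match.group(1))
            if vlan in rewrites and rewrites[vlan] != vlan:
                suspects.append(flow)
    return suspects


print(vlan_rewrites(FLOWS))
for flow in suspect_table60_flows(FLOWS):
    print("suspect:", flow)
```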

  
  For a network not attached to a DVR router, there is a similar table=0 rule that changes the external VLAN to the internal VLAN:

   cookie=0xbab0a875dbcda4a0, duration=25949.321s, table=0,
  n_packets=2618258, n_bytes=2851837213, idle_age=0,
  priority=3,in_port=1,dl_vlan=2505
  actions=mod_vlan_vid:83,resubmit(,60)

  And because this is a provider network, there are no local DVR MAC
  rules at table=60, so it always hits the NORMAL action.


  So how do we cover all bases: keep the fixes that prevent egress
  flooding (https://bugs.launchpad.net/neutron/+bug/1732067 and
  https://bugs.launchpad.net/neutron/+bug/1866445) while also
  preventing ingress flooding? The fix for one direction seems to
  cause breakage in the other.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1884708/+subscriptions

