yahoo-eng-team team mailing list archive

[Bug 2051351] [NEW] explicitly_egress_direct prevents learning of local MACs and causes flooding of ingress packets, firewall_driver = openvswitch

 

Public bug reported:

I believe this issue was already reported earlier:

https://bugs.launchpad.net/neutron/+bug/1884708

That bug has a fix committed:

https://review.opendev.org/c/openstack/neutron/+/738551

However, I believe that change fixed only part of the issue (the firewall_driver=noop case).
The same problem remains unfixed with firewall_driver=openvswitch.

I first re-opened bug #1884708, but then realized that nobody would
notice a status change on a bug that is several years old, so I opened
this new bug report instead.

Reproduction:

# config
ml2_conf.ini:
[securitygroup]
firewall_driver = openvswitch
[agent]
explicitly_egress_direct = True
[ovs]
bridge_mappings = physnet0:br-physnet0,...

# a random IP on net0 we can ping
sudo ip link set up dev br-physnet0
sudo ip link add link br-physnet0 name br-physnet0.100 type vlan id 100
sudo ip link set up dev br-physnet0.100
sudo ip address add dev br-physnet0.100 10.0.100.1/24

# code
devstack 6b0f055b
neutron $ git log --oneline -n2
27601f8eea (HEAD, origin/bug/2048785, origin/HEAD) Set trunk parent port as access port in ovs to avoid loop
3ef02cc2fb (origin/master) Consume code from neutron-lib
openvswitch 2.17.8-0ubuntu0.22.04.1
linux 5.15.0-91-generic

# clean up first
openstack server delete vm0 --wait
openstack port delete port0
openstack network delete net1 net0

# build the environment
openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0
openstack server create --flavor cirros256 --image cirros-0.6.2-x86_64-disk --nic port-id=port0 --availability-zone :devstack0a --wait vm0

# mac addresses for reference
$ openstack port show port0 -f value -c mac_address
fa:16:3e:96:58:ab
$ ifdata -ph br-physnet0
82:E8:18:67:7E:40

# generate traffic that will keep fdb entries fresh
sudo virsh console "$( openstack server show vm0 -f value -c OS-EXT-SRV-ATTR:instance_name )"
ping 10.0.100.1

# clear all past junk
for br in br-physnet0 br-int ; do sudo ovs-appctl fdb/flush "$br" ; done

# br-int does not learn port0's mac despite the ongoing ping
for br in br-physnet0 br-int ; do echo ">>> $br <<<" ; sudo ovs-appctl fdb/show "$br" | egrep -i "$( openstack port show port0 -f value -c mac_address )|$( ifdata -ph br-physnet0 )" ; done
>>> br-physnet0 <<<
    1   100  fa:16:3e:96:58:ab    0
LOCAL   100  82:e8:18:67:7e:40    0
>>> br-int <<<
    1     4  82:e8:18:67:7e:40    0
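
The check above can also be scripted so that it gives a yes/no answer per bridge. This is a minimal sketch, not part of the original reproduction: fdb_has_mac is a hypothetical helper name, and it assumes the MAC address is the third whitespace-separated column of the fdb/show output, as in the listing above.

```shell
# Hypothetical helper: succeeds (exit 0) if the MAC given as $1 appears in
# the MAC column of `ovs-appctl fdb/show` output read from stdin.
# The comparison is case-insensitive because ifdata prints upper-case hex
# while OVS prints lower-case hex.
fdb_has_mac() {
    awk -v mac="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" \
        'tolower($3) == mac { found = 1 } END { exit !found }'
}
```

For example, `sudo ovs-appctl fdb/show br-int | fdb_has_mac fa:16:3e:96:58:ab || echo "not learned"` would print "not learned" for the br-int table shown above.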

# port and physnet bridge mac in all fdbs, egress == vnic -> physnet bridge
# in br-int we have a direct output action
$ sudo ovs-appctl ofproto/trace br-int in_port="$( sudo ovs-vsctl -- --columns=ofport find Interface name=$( echo "tap$( openstack port show port0 -f value -c id )" | cut -b1-14 ) | awk '{ print $3 }' )",dl_vlan=0,dl_dst=$( ifdata -ph br-physnet0 ),dl_src=$( openstack port show port0 -f value -c mac_address )
Flow: in_port=45,dl_vlan=0,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000

bridge("br-int")
----------------
 0. priority 0, cookie 0x2b36d6b4a42fe7b5
    goto_table:58
58. priority 0, cookie 0x2b36d6b4a42fe7b5
    goto_table:60
60. in_port=45, priority 100, cookie 0x2b36d6b4a42fe7b5
    set_field:0x2d->reg5
    set_field:0x4->reg6
    resubmit(,73)
73. reg5=0x2d, priority 80, cookie 0x2b36d6b4a42fe7b5
    resubmit(,94)
94. reg6=0x4,dl_src=fa:16:3e:96:58:ab,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 10, cookie 0x2b36d6b4a42fe7b5
    push_vlan:0x8100
    set_field:4100->vlan_vid
    output:1

bridge("br-physnet0")
---------------------
 0. in_port=1,dl_vlan=4, priority 4, cookie 0x85bc1a5077d54d3f
    set_field:4196->vlan_vid
    NORMAL
     -> forwarding to learned port

Final flow: reg5=0x2d,reg6=0x4,in_port=45,dl_vlan=4,dl_vlan_pcp=0,dl_vlan1=0,dl_vlan_pcp1=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000
Megaflow: recirc_id=0,eth,in_port=45,dl_vlan=0,dl_vlan_pcp=0,dl_src=fa:16:3e:96:58:ab,dl_dst=82:e8:18:67:7e:40,dl_type=0x0000
Datapath actions: pop_vlan,push_vlan(vid=100,pcp=0),1

# port and physnet bridge mac in all fdbs, ingress == physnet bridge -> vnic
# in br-int we have the normal action flooding, despite the ongoing ping
$ sudo ovs-appctl ofproto/trace br-physnet0 in_port=LOCAL,dl_vlan=100,dl_src=$( ifdata -ph br-physnet0 ),dl_dst=$( openstack port show port0 -f value -c mac_address )
Flow: in_port=LOCAL,dl_vlan=100,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=82:e8:18:67:7e:40,dl_dst=fa:16:3e:96:58:ab,dl_type=0x0000

bridge("br-physnet0")
---------------------
 0. priority 0, cookie 0x85bc1a5077d54d3f
    NORMAL
     -> forwarding to learned port

bridge("br-int")
----------------
 0. in_port=1,dl_vlan=100, priority 3, cookie 0x2b36d6b4a42fe7b5
    set_field:4100->vlan_vid
    goto_table:58
58. priority 0, cookie 0x2b36d6b4a42fe7b5
    goto_table:60
60. priority 3, cookie 0x2b36d6b4a42fe7b5
    NORMAL
     -> no learned MAC for destination, flooding

bridge("br-tun")
----------------
 0. in_port=1, priority 1, cookie 0xc8cfff9c6bbea88d
    goto_table:2
 2. dl_dst=00:00:00:00:00:00/01:00:00:00:00:00, priority 0, cookie 0xc8cfff9c6bbea88d
    goto_table:20
20. priority 0, cookie 0xc8cfff9c6bbea88d
    goto_table:22
22. priority 0, cookie 0xc8cfff9c6bbea88d
    drop

Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=LOCAL,dl_vlan=100,dl_vlan_pcp=0,dl_src=82:e8:18:67:7e:40,dl_dst=fa:16:3e:96:58:ab,dl_type=0x0000
Datapath actions: pop_vlan,push_vlan(vid=4,pcp=0),8,13,pop_vlan,9,11
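
To compare the two traces quickly, one can count the output ports in the "Datapath actions:" line. The following is a rough sketch under assumptions: count_dp_outputs is a hypothetical helper name, and it treats every bare number in the action list as one datapath output port. More than one output is only a hint of flooding, since mirroring and similar features can also produce several outputs.

```shell
# Hypothetical helper: read ofproto/trace output on stdin and print the
# number of output ports found in its "Datapath actions:" line.
count_dp_outputs() {
    # Keep only the action list of the "Datapath actions:" line.
    sed -n -e 's/^[[:space:]]*Datapath actions:[[:space:]]*//p' |
        # Drop parenthesized arguments such as (vid=4,pcp=0), innermost
        # first, so that the commas inside them do not split actions.
        sed -e ':a' -e 's/([^()]*)//' -e 'ta' |
        # One action per line; a bare number is taken as an output port.
        tr ',' '\n' |
        grep -c -E '^[0-9]+$'
}
```

Fed with the two traces in this report, the egress trace (actions ending in `,1`) yields 1 and the ingress trace above yields 4.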

This bug has a long history:

round #1 - some unnecessary flooding in the egress direction
https://bugs.launchpad.net/neutron/+bug/1732067
https://bugs.launchpad.net/neutron/+bug/1841622
fix introducing explicitly_egress_direct:
https://review.opendev.org/c/openstack/neutron/+/666991

round #2 - the fix above introduced some unnecessary ingress flooding
https://bugs.launchpad.net/neutron/+bug/1884708
fix for firewall_driver=noop
https://review.opendev.org/c/openstack/neutron/+/738551
also related:
https://bugs.launchpad.net/neutron/+bug/1732067/comments/50
https://bugs.launchpad.net/neutron/+bug/1732067/comments/79
may be related:
https://bugs.launchpad.net/neutron/+bug/1866445

round #3 (today)
https://bugs.launchpad.net/neutron/+bug/2048785/comments/2
https://bugs.launchpad.net/neutron/+bug/1884708/comments/29

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2051351

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2051351/+subscriptions