yahoo-eng-team team mailing list archive
Message #78060
[Bug 1825147] [NEW] ovs flooding packets, not learning MAC addresses
Public bug reported:
Hi,
I'm running OpenStack Rocky on Ubuntu 18.04, with dvr_snat and L3HA, and
using the openvswitch firewall driver. The openvswitch version is
2.10.0-0ubuntu2~cloud0. The cloud was deployed with Juju.
I was doing load testing by creating a bunch of instances, and noticed
that the network throughput available to instances dropped dramatically
as I was creating VMs. In other words, with 2 VMs on my cloud, I had
pretty good bandwidth, but with 100 (idle) VMs, bandwidth became
ridiculously low.
Investigating the problem, I noticed that OVS was flooding traffic: all
instances on a hypervisor were receiving all the traffic destined for
any VM on another hypervisor.
For example, with vmA1 and vmA2 on hypervisor A, and vmB1 on
hypervisor B, TCP traffic between vmA1 and vmB1 could be seen on
vmA2.
Digging more into this, I think I located the problem in the ovs MAC
learning process, more specifically on br-int (using "sudo ovs-appctl
fdb/show br-int").
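A quick way to check this repeatedly is to parse the fdb/show output and
look for a given MAC. This is only a rough sketch: it assumes the usual
four-column "port VLAN MAC Age" layout, and the sample table and MAC
addresses below are made up for illustration, not taken from my cloud.

```python
def parse_fdb(output):
    """Parse `ovs-appctl fdb/show <bridge>` text into {mac: (port, vlan, age)}."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not 4 columns.
        if len(fields) != 4 or fields[0].lower() == "port":
            continue
        port, vlan, mac, age = fields
        # The port is kept as a string because it is not always numeric.
        table[mac.lower()] = (port, int(vlan), int(age))
    return table


def is_learned(output, mac):
    """Return True if `mac` has an entry in the parsed table."""
    return mac.lower() in parse_fdb(output)


# Hypothetical sample output (assumed layout, invented addresses).
SAMPLE = """\
 port  VLAN  MAC                Age
    5     3  fa:16:3e:aa:aa:01    2
    6     3  fa:16:3e:aa:aa:02   41
"""

if __name__ == "__main__":
    print(is_learned(SAMPLE, "fa:16:3e:aa:aa:01"))
    print(is_learned(SAMPLE, "fa:16:3e:bb:bb:01"))
```

In practice the text would come from running the command quoted above on
the hypervisor and passing its output to is_learned().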
The traffic path from vmA1 to vmB1, on hypervisor A, is: tap (on
br-int), patch-tun (on br-int), patch-int (on br-tun), then VXLAN to
hypervisor B.
So whenever traffic comes back (the other way around), the MAC address
of vmB1 should be learned on br-int, on the patch-tun port, and that
is not the case. As a result, whenever vmA1 sends traffic to vmB1, at
some point it reaches the "NORMAL" action, and since the destination MAC
has not been learned, the traffic is flooded: see the ofproto/trace
output at https://pastebin.ubuntu.com/p/mbrrj4wPxY/ (look for "no
learned MAC for destination, flooding")
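To make the expected behaviour concrete, here is a toy model of a
generic MAC-learning switch. It is not how OVS implements "NORMAL" (no
VLANs, no entry aging, no OpenFlow pipeline), and the port and MAC names
are placeholders, but it shows why a missing entry for vmB1 means every
frame to vmB1 also lands on vmA2's tap.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy learning switch: learn the source MAC, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the list of ports it is sent to."""
        # Learn (or refresh) where the sender lives.
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood to all other ports.
            return sorted(self.ports - {in_port})
        # Known destination: forward to that port only (never back out in_port).
        return [] if out_port == in_port else [out_port]


if __name__ == "__main__":
    sw = LearningSwitch(["tap-vmA1", "tap-vmA2", "patch-tun"])
    # vmB1 not learned yet: the frame is flooded, so vmA2 sees it too.
    print(sw.receive("tap-vmA1", "mac-vmA1", "mac-vmB1"))
    # Return traffic from vmB1 arrives on patch-tun and should be learned.
    print(sw.receive("patch-tun", "mac-vmB1", "mac-vmA1"))
    # Now traffic to vmB1 goes to patch-tun only.
    print(sw.receive("tap-vmA1", "mac-vmA1", "mac-vmB1"))
```

In this model the second frame is enough to stop the flooding. What I
observe on br-int corresponds to that learning step not taking place for
the return traffic.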
Digging further, it appears that OVS learns a MAC address only from
broadcast ARP requests, and not from ARP requests sent to a unicast MAC
address (which is what Linux uses after a successful broadcast ARP
request): https://pastebin.ubuntu.com/p/Sfq775cX6V/.
Once the MAC is learned, there's no more flooding:
https://pastebin.ubuntu.com/p/bBNHrRKndg/ (see "forwarding to learned
port" instead of "no learned MAC for destination, flooding").
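For anyone wanting to check many flows, a small helper can classify
saved ofproto/trace output by looking for the two messages quoted above.
This is a minimal sketch that only does substring matching on those two
strings. The function name and the one-line sample inputs are mine and
are not real trace output.

```python
FLOODING_MSG = "no learned MAC for destination, flooding"
FORWARDING_MSG = "forwarding to learned port"


def classify_trace(trace_text):
    """Return 'flooding', 'forwarding', or 'unknown' for a saved trace."""
    # Flooding is checked first so that a trace containing both messages
    # is still reported as a problem.
    if FLOODING_MSG in trace_text:
        return "flooding"
    if FORWARDING_MSG in trace_text:
        return "forwarding"
    return "unknown"


if __name__ == "__main__":
    # Placeholder inputs containing only the quoted messages.
    print(classify_trace("... no learned MAC for destination, flooding ..."))
    print(classify_trace("... forwarding to learned port ..."))
```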
Flooding has security consequences (VMs can see traffic not destined for
them, although only traffic for VMs in the same neutron network) and
performance consequences, so it should be avoided.
Thanks
** Affects: neutron
Importance: Undecided
Status: New
** Affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1825147
Title:
ovs flooding packets, not learning MAC addresses
Status in neutron:
New
Status in neutron package in Ubuntu:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1825147/+subscriptions