yahoo-eng-team team mailing list archive
Message #75207
[Bug 1783654] Re: DVR process flow not installed on physical bridge for shared tenant network
Regression testing successful for queens-proposed (tempest results):
======
Totals
======
Ran: 92 tests in 1000.6584 sec.
- Passed: 84
- Skipped: 8
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 0
Sum of execute time for each test: 465.0920 sec.
** Changed in: cloud-archive/rocky
Status: Fix Committed => Fix Released
** Tags removed: verification-queens-needed
** Tags added: verification-queens-done
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1783654
Title:
DVR process flow not installed on physical bridge for shared tenant
network
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive pike series:
Invalid
Status in Ubuntu Cloud Archive queens series:
Fix Committed
Status in Ubuntu Cloud Archive rocky series:
Fix Released
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
Fix Released
Status in neutron source package in Bionic:
Fix Committed
Status in neutron source package in Cosmic:
Fix Released
Bug description:
Seems like collateral from
https://bugs.launchpad.net/neutron/+bug/1751396
In DVR, the distributed gateway port's IP and MAC are shared in the
qrouter across all hosts.
The dvr_process_flow on the physical bridge (which replaces the shared
router_distributed MAC address with the unique per-host MAC when it's
the source) is missing, and so is the drop rule which instructs the
bridge to drop all traffic destined for the shared distributed MAC.
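As a rough illustration of what those two rules are meant to do, here is a toy Python model. It is not neutron or Open vSwitch code: the Frame class, the dvr_process function and the per-host MAC value are hypothetical names invented for this sketch, and only the shared router MAC is taken from the output in this report.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Shared distributed router MAC seen in this report.
SHARED_ROUTER_MAC = "fa:16:3e:42:a2:ec"
# Hypothetical unique per-host DVR MAC, made up for this sketch.
HOST_DVR_MAC = "fa:16:3f:00:00:01"


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str


def dvr_process(frame: Frame, shared_mac: str, host_mac: str) -> Optional[Frame]:
    """Toy model of the two rules expected on the physical bridge.

    Returns None when the frame is dropped, otherwise the frame to forward.
    """
    # Drop rule: nothing addressed to the shared MAC may cross the bridge.
    if frame.dst_mac == shared_mac:
        return None
    # Process rule: hide the shared MAC behind the per-host MAC on egress.
    if frame.src_mac == shared_mac:
        return replace(frame, src_mac=host_mac)
    # Any other frame passes through unchanged.
    return frame


if __name__ == "__main__":
    vm_mac = "fa:16:3e:00:00:aa"  # hypothetical VM MAC
    print(dvr_process(Frame(SHARED_ROUTER_MAC, vm_mac), SHARED_ROUTER_MAC, HOST_DVR_MAC))
    print(dvr_process(Frame(vm_mac, SHARED_ROUTER_MAC), SHARED_ROUTER_MAC, HOST_DVR_MAC))
```

With neither rule in place, a frame sourced from the shared MAC leaves the host unmodified, so other switches and hosts can learn that MAC from the wire.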
Because of this, we are seeing the router MAC on the network
infrastructure, causing it to flap on br-int on every compute host:
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
Where port 1 is phy-br-vlan, connecting to the physical bridge, and
port 11 is the correct local qr-interface. Because these DVR flows are
missing on br-vlan, packets with that source MAC ingress into the host
and br-int learns the MAC from upstream.
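The flapping in the samples above can be counted with a small script. This is a minimal sketch that assumes each sample is one "port VLAN MAC age" row as printed by ovs-appctl fdb/show; the function names are made up for illustration, and the rows are copied from the output in this report.

```python
from typing import List, Tuple

MAC = "fa:16:3e:42:a2:ec"

# Rows copied from the repeated fdb/show samples, in order.
SAMPLES = [
    "11 4 fa:16:3e:42:a2:ec 1",
    "11 4 fa:16:3e:42:a2:ec 2",
    "1 4 fa:16:3e:42:a2:ec 0",
    "11 4 fa:16:3e:42:a2:ec 0",
    "11 4 fa:16:3e:42:a2:ec 0",
    "1 4 fa:16:3e:42:a2:ec 0",
    "1 4 fa:16:3e:42:a2:ec 0",
    "1 4 fa:16:3e:42:a2:ec 0",
    "1 4 fa:16:3e:42:a2:ec 1",
    "11 4 fa:16:3e:42:a2:ec 0",
    "11 4 fa:16:3e:42:a2:ec 0",
    "11 4 fa:16:3e:42:a2:ec 0",
]


def parse_fdb_row(row: str) -> Tuple[str, int, str, int]:
    """Split one 'port VLAN MAC age' row into typed fields."""
    port, vlan, mac, age = row.split()
    # Keep the port as a string so non-numeric port names do not break parsing.
    return port, int(vlan), mac.lower(), int(age)


def count_port_moves(rows: List[str], mac: str) -> int:
    """Count how often the learned port for `mac` changes between samples."""
    ports = [p for p, _vlan, m, _age in map(parse_fdb_row, rows) if m == mac.lower()]
    return sum(1 for prev, cur in zip(ports, ports[1:]) if prev != cur)


if __name__ == "__main__":
    print(count_port_moves(SAMPLES, MAC))
```

A stable entry would report zero moves; any non-zero count for the router MAC means br-int is relearning it from a different port.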
The symptom: when pinging a VM's floating IP, we see occasional
packet loss (10-30%). Sometimes the responses are sent upstream by
br-int instead of to the qrouter, so the ICMP replies arrive with the
fixed IP of the replier (no NAT took place) and on the tenant network
rather than the external network.
When I force net_shared_only to False here, the problem goes away:
https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436
It should be noted that we *ONLY* need to do this on our dvr_snat host.
The DVR process flows are missing on every compute host, but if we shut
down the qrouter on the SNAT host, FIP functionality works and the DVR
MAC stops flapping on the others. Or, if we apply the fix only to the
SNAT host, it works. Perhaps there is something unique about the SNAT
node.
Ubuntu SRU details:
-------------------
[Impact]
See above
[Test Case]
Deploy OpenStack with dvr enabled and then follow the steps above.
[Regression Potential]
The patches that are backported have already landed upstream in the corresponding stable branches, helping to minimize any regression potential.