yahoo-eng-team team mailing list archive
Message #90517
[Bug 1869244] Re: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using trunk bridges with DPDK vhostusermode
Reviewed: https://review.opendev.org/c/openstack/neutron/+/837780
Committed: https://opendev.org/openstack/neutron/commit/33de608f04dcc8117eeba63876598dc2ae93013a
Submitter: "Zuul (22348)"
Branch: master
commit 33de608f04dcc8117eeba63876598dc2ae93013a
Author: Miguel Lavalle <mlavalle@xxxxxxxxxx>
Date: Wed Apr 13 18:00:12 2022 -0500
Avoid race condition when deleting trunk bridges
Prior to this change, trunk bridges were created by os-vif but deleted
by Neutron when the last vif was removed from them. This creates race
conditions in some use cases, such as DPDK with vhostuserclient mode,
when VMs are rebooted. To avoid these races, Neutron no longer deletes
trunk bridges; both their creation and deletion are now os-vif's
responsibility. Since [1], Nova uses an os-vif version that contains
this functionality.
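To illustrate why putting both operations in one component removes the race, here is a minimal sketch assuming a single in-process owner guarded by a lock. `HypotheticalBridgeOwner`, `plug` and `unplug` are hypothetical names chosen for illustration; they are not the os-vif API, and real bridges live in OVSDB rather than in a Python dict.

```python
import threading


class HypotheticalBridgeOwner:
    """Toy model: one component owns both creation and deletion of trunk bridges.

    Hypothetical illustration only; this is not os-vif code.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.bridges = {}  # bridge name -> set of attached port names

    def plug(self, bridge, port):
        # Create the bridge on demand and attach the port in one critical section.
        with self._lock:
            self.bridges.setdefault(bridge, set()).add(port)

    def unplug(self, bridge, port):
        # Detach the port and, if the bridge is now empty, delete it
        # synchronously inside the same critical section (no deferred thread).
        with self._lock:
            ports = self.bridges.get(bridge)
            if ports is None:
                return
            ports.discard(port)
            if not ports:
                del self.bridges[bridge]


# A reboot is an unplug followed by a plug; the bridge exists afterwards.
owner = HypotheticalBridgeOwner()
owner.plug("tbr-1", "vhu-a")
owner.unplug("tbr-1", "vhu-a")
owner.plug("tbr-1", "vhu-a")
```

Because the emptiness check and the deletion happen under the same lock as the unplug itself, no stale deletion can run after the following plug has recreated the bridge.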
This patch also changes the handling of the trunk status change event.
During a live migration, once the trunk parent port has been bound to
the destination host (that is, only one port binding remains) and its
status has changed to ACTIVE, the handler also triggers the binding of
the subports to the new host. This is needed because there can be a
race between the subport binding, triggered by the OVS agent, and the
parent port binding, triggered by Nova: if the parent port is still
bound to the source host when the OVS agent tries to bind the subports,
the subport bindings also remain on the source host instead of moving
to the destination.
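A minimal sketch of the status-change condition described above, assuming a simplified model in which each port carries a list of the hosts it is bound to. `Port`, `Trunk` and `on_trunk_status_change` are hypothetical names; Neutron's real handler works with port binding objects and RPC, not these dataclasses.

```python
from dataclasses import dataclass, field

ACTIVE = "ACTIVE"


@dataclass
class Port:
    id: str
    bindings: list  # hosts this port currently has a binding on (hypothetical model)
    status: str = "DOWN"


@dataclass
class Trunk:
    parent: Port
    subports: list = field(default_factory=list)


def on_trunk_status_change(trunk):
    """Hypothetical handler: rebind subports once the parent has moved.

    Acts only when the parent port has exactly one binding (the destination
    host) and is ACTIVE. Returns the host the subports were bound to, or
    None if the migration is not finished yet.
    """
    parent = trunk.parent
    if len(parent.bindings) != 1 or parent.status != ACTIVE:
        return None  # still two bindings or not ACTIVE: leave subports alone
    host = parent.bindings[0]
    for sub in trunk.subports:
        sub.bindings = [host]  # move every subport to the parent's host
    return host
```

While the parent still has bindings on both the source and destination hosts, the handler does nothing; once only the destination binding remains and the port is ACTIVE, the subports follow it.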
This patch also reverts [2] and [3]. As described in the previous
paragraph, this patch fixes the issue reported in LP#1997025; trunk port
live migration with ML2/OVS should work correctly with this patch.
[1] https://review.opendev.org/c/openstack/nova/+/865031
[2] https://review.opendev.org/c/openstack/neutron/+/865295
[3] https://review.opendev.org/c/openstack/neutron/+/865424
Closes-Bug: #1869244
Closes-Bug: #1997025
Change-Id: I4e16357f3ff214fcf41e418982806c24088a2665
** Changed in: neutron
Status: In Progress => Fix Released
https://bugs.launchpad.net/bugs/1869244
Title:
RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using
trunk bridges with DPDK vhostusermode
Status in neutron:
Fix Released
Bug description:
DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
off, its port is deleted, and when an instance is powered on, a port is
created. A reboot is therefore effectively a very fast
delete-then-create. Neutron trunking mode in combination with DPDK/vhu
implements a trunk bridge for each tenant, and the ports for the
instances are created as subports of that bridge. Normally, when all
the subports of a trunk bridge are deleted, a thread is spawned to
delete the bridge, because that is an expensive and time-consuming
operation. So if the port in question is the only port on the trunk on
that compute node, the following happens:
1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated
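The three steps above can be reproduced with a toy model. This sketch assumes the deletion is deferred to a `threading.Thread` and uses a `threading.Event` as a test hook (not something Neutron has) to force the unlucky interleaving deterministically; `NaiveTrunkAgent` and its methods are hypothetical names.

```python
import threading


class NaiveTrunkAgent:
    """Toy model of the pre-fix behaviour: bridge deletion runs in a spawned thread.

    Hypothetical illustration only; this is not Neutron's trunk driver code.
    """

    def __init__(self):
        self.bridges = {}  # bridge name -> set of attached port names

    def add_port(self, bridge, port):
        self.bridges.setdefault(bridge, set()).add(port)

    def delete_port(self, bridge, port, before_delete=None):
        ports = self.bridges[bridge]
        ports.discard(port)
        if ports:
            return None
        # Last port gone: defer the expensive bridge deletion to a thread.
        worker = threading.Thread(target=self._delete_bridge,
                                  args=(bridge, before_delete))
        worker.start()
        return worker

    def _delete_bridge(self, bridge, before_delete):
        if before_delete is not None:
            before_delete.wait()  # test hook: let step 3 run first
        # No re-check that the bridge is still empty: this is the race.
        self.bridges.pop(bridge, None)


agent = NaiveTrunkAgent()
agent.add_port("tbr-1", "vhu-a")
recreated = threading.Event()
worker = agent.delete_port("tbr-1", "vhu-a", before_delete=recreated)  # steps 1 and 2
agent.add_port("tbr-1", "vhu-a")  # step 3 reuses the not-yet-deleted bridge
recreated.set()
worker.join()
print("tbr-1" in agent.bridges)  # False: the recreated port lost its bridge
```

One alternative would be to re-check, under a lock, that the bridge is still empty before deleting it; the commit above instead removes bridge deletion from Neutron altogether.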
If the trunk is deleted after step 3, the instance has no networking
and is inaccessible; this scenario was addressed in a previous change
[1]. However, errors such as "RowNotFound: Cannot find Bridge with
name=tbr-XXXXXXXX-X" still occur:
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 37, in execute
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/schema/open_vswitch/commands.py", line 335, in run_idl
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command br = idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command
2020-03-02 10:37:45.932 6278 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Cannot obtain interface list for bridge tbr-XXXXXXXX-X: Cannot find Bridge with name=tbr-XXXXXXXX-X: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
I believe the trunk bridge is being deleted while step 3 is executing:
the bridge disappears partway through the port creation logic, before
the port has actually been recreated.
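The traceback shows the failing call, `idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)`, raising `RowNotFound`. The toy version below assumes the OVSDB tables are a plain dict of row dicts (a simplification; the real function queries an IDL) and uses a stand-in exception whose message format is copied from the log. It shows how a bridge that existed when port creation started can vanish before the next lookup.

```python
class RowNotFound(Exception):
    """Stand-in for ovsdbapp's RowNotFound; message format taken from the log above."""

    def __init__(self, table, col, match):
        super().__init__(f"Cannot find {table} with {col}={match}")


def row_by_value(db, table, column, match):
    # Toy lookup: return the first row in `table` whose `column` equals `match`,
    # or raise RowNotFound if no such row exists.
    for row in db.get(table, []):
        if row.get(column) == match:
            return row
    raise RowNotFound(table=table, col=column, match=match)


db = {"Bridge": [{"name": "tbr-XXXXXXXX-X"}]}
row_by_value(db, "Bridge", "name", "tbr-XXXXXXXX-X")  # bridge exists at first
db["Bridge"].clear()  # the deferred deletion thread removes the bridge
try:
    row_by_value(db, "Bridge", "name", "tbr-XXXXXXXX-X")  # later lookup fails
except RowNotFound as exc:
    print(exc)  # Cannot find Bridge with name=tbr-XXXXXXXX-X
```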
This issue was observed in setups running Queens.