yahoo-eng-team team mailing list archive
Message #82067
[Bug 1869244] [NEW] RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using trunk bridges with DPDK vhostusermode
Public bug reported:
With DPDK vhostuser mode (DPDK/vhu), an instance's port is deleted when
the instance is powered off and created again when it is powered on, so
a reboot is functionally a very fast delete-then-create. Neutron
trunking in combination with DPDK/vhu implements a trunk bridge for each
tenant, and the ports for the instances are created as subports of that
bridge. Normally, when all the subports of a trunk bridge are deleted, a
thread is spawned to delete the bridge, because deletion is an expensive
and time-consuming operation. So if the port in question is the only
port on the trunk on that compute node, this happens:
1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated
If the trunk is deleted after step 3 completes, the instance has no
networking and is inaccessible; this is the scenario that was dealt with
in a previous change [1]. However, errors of the form
"RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X" continue to
occur:
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 37, in execute
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/schema/open_vswitch/commands.py", line 335, in run_idl
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command br = idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command
2020-03-02 10:37:45.932 6278 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Cannot obtain interface list for bridge tbr-XXXXXXXX-X: Cannot find Bridge with name=tbr-XXXXXXXX-X: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
What I believe is happening in this case is that the trunk bridge is
deleted while step 3 is still executing: the bridge stops existing
partway through the port creation logic, before the port is actually
recreated.
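To make the suspected interleaving concrete, here is a minimal sketch of
the race. It is a toy model, not Neutron or ovsdbapp code: FakeOvsdb,
simulate_reboot and the local RowNotFound class are hypothetical
stand-ins, and the threading.Event objects exist only to force the
unlucky ordering that a real agent would hit by chance.

```python
import threading


class RowNotFound(Exception):
    """Local stand-in for the ovsdbapp exception; not the real class."""


class FakeOvsdb:
    """Toy in-memory model of the Bridge table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.bridges = {}  # bridge name -> set of port names

    def add_bridge(self, name):
        with self._lock:
            self.bridges.setdefault(name, set())

    def del_bridge(self, name):
        with self._lock:
            self.bridges.pop(name, None)

    def add_port(self, bridge, port):
        with self._lock:
            if bridge not in self.bridges:
                raise RowNotFound("Cannot find Bridge with name=%s" % bridge)
            self.bridges[bridge].add(port)

    def list_ports(self, bridge):
        with self._lock:
            if bridge not in self.bridges:
                raise RowNotFound("Cannot find Bridge with name=%s" % bridge)
            return sorted(self.bridges[bridge])


def simulate_reboot(bridge="tbr-XXXXXXXX-X", port="vhu-port"):
    db = FakeOvsdb()
    db.add_bridge(bridge)
    db.add_port(bridge, port)

    recreate_started = threading.Event()
    bridge_deleted = threading.Event()

    # Step 1: the port is deleted when the instance powers off.
    db.bridges[bridge].discard(port)

    # Step 2: a thread is spawned to delete the now-empty trunk bridge.
    def delete_trunk():
        recreate_started.wait()  # forced ordering, for the demo only
        db.del_bridge(bridge)
        bridge_deleted.set()

    deleter = threading.Thread(target=delete_trunk)
    deleter.start()

    # Step 3: port re-creation starts while the bridge still exists,
    # then looks the bridge up again after the deleter has run.
    try:
        db.list_ports(bridge)      # succeeds: bridge is still present
        recreate_started.set()
        bridge_deleted.wait()      # the bridge disappears mid-creation
        db.list_ports(bridge)      # raises RowNotFound
        db.add_port(bridge, port)  # never reached in this interleaving
        return "port recreated"
    except RowNotFound as exc:
        return str(exc)
    finally:
        recreate_started.set()     # never leave the deleter blocked
        deleter.join()


if __name__ == "__main__":
    print(simulate_reboot())
```

In this forced ordering the second lookup fails with the same message
that appears in the log above. Whether the real agent fails at the same
point is the hypothesis of this report, not something the sketch proves.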
This issue was observed in setups running Queens.
** Affects: neutron
Importance: Undecided
Assignee: Nate Johnston (nate-johnston)
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869244