yahoo-eng-team team mailing list archive

[Bug 1869244] [NEW] RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using trunk bridges with DPDK vhostusermode

 

Public bug reported:

With DPDK vhostuser mode (DPDK/vhu), an instance's port is deleted when
the instance is powered off and created again when it is powered on, so
a reboot is effectively a very fast delete-then-create.  Neutron
trunking in combination with DPDK/vhu implements a trunk bridge for each
tenant, and the ports for the instances are created as subports of that
bridge.  Normally, when all the subports have been deleted, a thread is
spawned to delete the trunk bridge, because deleting it is an expensive
and time-consuming operation.  So if the port in question is the only
port on the trunk on that compute node, this happens:

1. The port is deleted
2. A thread is spawned to delete the trunk
3. The port is recreated
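
To make the ordering concrete, here is a minimal toy model of the three
steps above.  It is only a sketch: a plain dict stands in for the OVS
database, the names (bridges, delete_port, create_port, tbr-demo,
port-1) are hypothetical and are not Neutron or ovsdbapp code, and the
"thread" is a callable run by hand so the ordering can be chosen.  It
shows the case where the deferred deletion runs last, with no guard
against a port having come back.

```python
# Toy model only: a dict stands in for the OVS database, and the
# "deletion thread" is a callable we run by hand to pick the ordering.
bridges = {"tbr-demo": {"port-1"}}   # hypothetical trunk bridge with one port


def delete_port(bridge, port):
    """Step 1: remove the port; return a deferred trunk deletion if it was the last."""
    bridges[bridge].discard(port)
    if not bridges[bridge]:
        # Step 2: the trunk deletion is deferred rather than done inline.
        return lambda: bridges.pop(bridge, None)
    return None


def create_port(bridge, port):
    """Step 3: re-add the port, creating the bridge entry if needed."""
    bridges.setdefault(bridge, set()).add(port)


pending_delete = delete_port("tbr-demo", "port-1")   # steps 1 and 2
create_port("tbr-demo", "port-1")                    # step 3
pending_delete()                                     # deferred deletion runs last

# The recreated port disappeared together with the bridge.
instance_has_networking = "port-1" in bridges.get("tbr-demo", set())
print(instance_has_networking)
```

In this model the recreated port is lost together with the bridge, so
the instance ends up without networking.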

If the trunk is deleted after #3 completes, the instance has no
networking and is inaccessible; that scenario was dealt with in a
previous change [1].  But errors of the form "RowNotFound: Cannot find
Bridge with name=tbr-XXXXXXXX-X" continue to occur:

2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 37, in execute
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     self.run_idl(None)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/schema/open_vswitch/commands.py", line 335, in run_idl
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     br = idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     raise RowNotFound(table=table, col=column, match=match)
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command 
2020-03-02 10:37:45.932 6278 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Cannot obtain interface list for bridge tbr-XXXXXXXX-X: Cannot find Bridge with name=tbr-XXXXXXXX-X: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X


What I believe is happening in this case is that the trunk is deleted
while #3 is still executing: the bridge stops existing partway through
the port creation logic, before the port is actually recreated.
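
The suspected interleaving can be reproduced in a small standalone
sketch.  Everything here is an assumption made for illustration: the
RowNotFound class is a local stand-in (not the ovsdbapp one), the dict
stands in for the Bridge table, and threading.Event objects force the
deletion to land in the middle of port creation so that the outcome is
deterministic rather than timing-dependent.

```python
import threading


class RowNotFound(Exception):
    """Local stand-in for the ovsdbapp exception of the same name."""


bridges = {"tbr-demo": set()}          # trunk bridge left empty by the port deletion
creation_started = threading.Event()
bridge_deleted = threading.Event()


def delete_trunk(name):
    # Deferred deletion, forced to run while port creation is in flight.
    creation_started.wait()
    bridges.pop(name, None)
    bridge_deleted.set()


def recreate_port(name, port):
    creation_started.set()             # the creation logic has begun ...
    bridge_deleted.wait()              # ... and the bridge vanishes before the lookup
    if name not in bridges:
        raise RowNotFound("Cannot find Bridge with name=%s" % name)
    bridges[name].add(port)


deleter = threading.Thread(target=delete_trunk, args=("tbr-demo",))
deleter.start()
try:
    recreate_port("tbr-demo", "port-1")
    outcome = "created"
except RowNotFound as exc:
    outcome = str(exc)
deleter.join()
print(outcome)
```

With the ordering forced this way, the bridge lookup inside the creation
path fails even though creation started while the bridge still existed,
which matches the shape of the error in the log above.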

This issue was observed in setups running Queens.

** Affects: neutron
     Importance: Undecided
     Assignee: Nate Johnston (nate-johnston)
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869244

Title:
  RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using
  trunk bridges with DPDK vhostusermode

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869244/+subscriptions

