
yahoo-eng-team team mailing list archive

[Bug 1869244] Re: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using trunk bridges with DPDK vhostusermode

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/837780
Committed: https://opendev.org/openstack/neutron/commit/33de608f04dcc8117eeba63876598dc2ae93013a
Submitter: "Zuul (22348)"
Branch:    master

commit 33de608f04dcc8117eeba63876598dc2ae93013a
Author: Miguel Lavalle <mlavalle@xxxxxxxxxx>
Date:   Wed Apr 13 18:00:12 2022 -0500

    Avoid race condition when deleting trunk bridges
    
    Prior to this change, trunk bridges were created by os-vif but deleted
    by Neutron when the last vif was removed from them. This creates race
    conditions in some use cases, like DPDK with vhostuserclient mode, when
    VMs are rebooted. To avoid these races, Neutron will not delete trunk
    bridges anymore. Their creation and deletion will be os-vif's
    responsibility. Since [1], Nova uses the os-vif version that contains
    this functionality.
    
    This patch also changes the trunk status change event. During a live
    migration, when the trunk parent port has been bound to the destination
    host (that means there is only one port binding associated) and the
    status has changed to ACTIVE, the method triggers the subport binding
    to the new host too. This is because there could be a race condition
    between the subport binding, triggered by the OVS agent, and the parent
    port binding, triggered by Nova. If the parent port is still bound to
    the source host when the OVS agent tries to bind the subports, the
    subport binding remains on the source host too, instead of moving to
    the destination.
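
The condition described in the paragraph above can be sketched roughly as
follows. This is only an illustration of the decision, not the code from the
patch: the `Port` class, the `subports_to_rebind` function and the host names
are hypothetical and do not correspond to Neutron objects.

```python
from dataclasses import dataclass, field
from typing import List

ACTIVE = "ACTIVE"


@dataclass
class Port:
    """Hypothetical stand-in for a port and the hosts it is bound to."""
    port_id: str
    bound_hosts: List[str] = field(default_factory=list)


def subports_to_rebind(trunk_status, parent, subports):
    """Return the IDs of subports that should follow the parent port.

    Rebinding is only considered once the trunk is ACTIVE and the parent
    has exactly one binding left, i.e. the migration has settled on the
    destination host.
    """
    if trunk_status != ACTIVE or len(parent.bound_hosts) != 1:
        return []
    destination = parent.bound_hosts[0]
    # Subports still bound elsewhere lost the race and must be moved.
    return [s.port_id for s in subports if s.bound_hosts != [destination]]
```

With a parent bound only to the destination, a subport still bound to the
source is selected for rebinding, while a subport already on the destination
is left alone.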
    
    This patch also reverts [2] and [3]. As described in the previous
    paragraph, this patch fixes the issue reported in LP#1997025, so trunk
    port live migration with ML2/OVS should be fixed by this patch.
    
    [1] https://review.opendev.org/c/openstack/nova/+/865031
    [2] https://review.opendev.org/c/openstack/neutron/+/865295
    [3] https://review.opendev.org/c/openstack/neutron/+/865424
    
    Closes-Bug: #1869244
    Closes-Bug: #1997025
    
    Change-Id: I4e16357f3ff214fcf41e418982806c24088a2665


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869244

Title:
  RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X when using
  trunk bridges with DPDK vhostusermode

Status in neutron:
  Fix Released

Bug description:
  DPDK vhostuser mode (DPDK/vhu) means that when an instance is powered
  off the port is deleted, and when an instance is powered on a port is
  created.  This means a reboot is functionally a super fast
  delete-then-create.  Neutron trunking mode in combination with DPDK/vhu
  implements a trunk bridge for each tenant, and the ports for the
  instances are created as subports of that bridge.  The standard way a
  trunk bridge works is that when all the subports are deleted, a thread
  is spawned to delete the trunk bridge, because that is an expensive and
  time-consuming operation.  That means that if the port in question is
  the only port on the trunk on that compute node, this happens:

  1. The port is deleted
  2. A thread is spawned to delete the trunk
  3. The port is recreated

  If the trunk is deleted after #3 happens then the instance has no
  networking and is inaccessible; this is the scenario that was dealt with
  in a previous change [1].  But errors of the form "RowNotFound: Cannot
  find Bridge with name=tbr-XXXXXXXX-X" continue to occur:

  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command [-] Error executing command: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 37, in execute
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     self.run_idl(None)
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/schema/open_vswitch/commands.py", line 335, in run_idl
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     br = idlutils.row_by_value(self.api.idl, 'Bridge', 'name', self.bridge)
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 63, in row_by_value
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command     raise RowNotFound(table=table, col=column, match=match)
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X
  2020-03-02 10:37:45.929 6278 ERROR ovsdbapp.backend.ovs_idl.command 
  2020-03-02 10:37:45.932 6278 ERROR neutron.services.trunk.drivers.openvswitch.agent.ovsdb_handler [-] Cannot obtain interface list for bridge tbr-XXXXXXXX-X: Cannot find Bridge with name=tbr-XXXXXXXX-X: RowNotFound: Cannot find Bridge with name=tbr-XXXXXXXX-X

  
  What I believe is happening in this case is that the trunk bridge is
  deleted while #3 is still executing, so that it stops existing partway
  through the port creation logic, before the port is actually
  recreated.
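
The two orderings discussed in this report can be replayed with a small
in-memory model. This is a simplified sketch for illustration only:
`FakeOvsdb`, `reboot` and the local `RowNotFound` class are hypothetical
stand-ins, not the ovsdbapp or Neutron APIs, and the cleanup thread is started
at a fixed point so that each ordering is reproduced deterministically.

```python
import threading


class RowNotFound(Exception):
    """Local stand-in for the error seen in the log (not the ovsdbapp class)."""


class FakeOvsdb:
    """Hypothetical in-memory model of bridges and their ports."""

    def __init__(self):
        self._lock = threading.Lock()
        self.bridges = {}

    def add_bridge(self, name):
        with self._lock:
            self.bridges.setdefault(name, set())

    def del_bridge(self, name):
        with self._lock:
            self.bridges.pop(name, None)

    def add_port(self, bridge, port):
        with self._lock:
            if bridge not in self.bridges:
                raise RowNotFound("Cannot find Bridge with name=%s" % bridge)
            self.bridges[bridge].add(port)

    def del_port(self, bridge, port):
        with self._lock:
            self.bridges[bridge].discard(port)


def reboot(db, bridge, port, delete_wins):
    """Replay the delete-then-create sequence with a deferred bridge cleanup.

    delete_wins=True runs the cleanup before the port is recreated;
    delete_wins=False runs it afterwards. Returns the error text, if any.
    """
    db.del_port(bridge, port)  # 1. the port is deleted
    # 2. a thread is prepared to delete the trunk bridge
    cleanup = threading.Thread(target=db.del_bridge, args=(bridge,))
    error = None
    if delete_wins:
        cleanup.start()
        cleanup.join()
    try:
        db.add_port(bridge, port)  # 3. the port is recreated
    except RowNotFound as exc:
        error = str(exc)
    if not delete_wins:
        cleanup.start()
        cleanup.join()
    return error
```

When the cleanup runs first, recreating the port fails with the "Cannot find
Bridge" error. When it runs last, no error is raised but the bridge and the
recreated port are both gone, which matches the "no networking" scenario.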

  This issue was observed in setups running Queens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869244/+subscriptions


