yahoo-eng-team team mailing list archive

[Bug 1960006] Re: [ovn] Stale ports in the OVN database at churn

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/827834
Committed: https://opendev.org/openstack/neutron/commit/be7331c8169c53e3900c9c1a08e12808cf5ed2ec
Submitter: "Zuul (22348)"
Branch:    master

commit be7331c8169c53e3900c9c1a08e12808cf5ed2ec
Author: Daniel Alvarez Sanchez <dalvarez@xxxxxxxxxx>
Date:   Fri Feb 4 11:32:47 2022 +0100

    [ovn] Prevent stale ports in the OVN database
    
    Under heavy load, some Neutron workers may not yet have updated
    their in-memory copy of the NB database by the time certain
    operations run.
    
    This scenario can lead to stale resources when a deletion is
    requested for a recently created port, but the worker handling
    the deletion doesn't know about the OVN port yet.
    
    This patch detects this condition and allows some time (at least one
    maintenance task cycle) before it deletes the OVN revision number.
    If the port shows up in the OVN database within that window, the
    maintenance task will delete it later, avoiding stale ports. If not,
    the revision number row will be deleted, so no stale entries are
    left behind either.
    
    Closes-Bug: #1960006
    Signed-off-by: Daniel Alvarez Sanchez <dalvarez@xxxxxxxxxx>
    Change-Id: Ie4093dc6cd63b89e3a62363a4f805ef8287d15b9
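
The behaviour described in the commit message can be sketched roughly as
follows. This is a minimal illustration and not the code from the patch:
the function names, the "deleted_at" field, the plain sets and dicts
standing in for the databases, and the GRACE_PERIOD value are all
assumptions made for the example.

```python
# Hypothetical sketch of "defer revision cleanup when the port lookup fails".
# None of these names are Neutron or ovsdbapp APIs.

GRACE_PERIOD = 600  # seconds; assumed stand-in for "one maintenance task cycle"


def delete_port(port_id, local_ports, nb_ports, revisions, now):
    """Worker-side deletion using the worker's in-memory view (local_ports)."""
    if port_id in local_ports:
        # Normal path: the worker knows the port, so delete it everywhere.
        nb_ports.discard(port_id)
        local_ports.discard(port_id)
        revisions.pop(port_id, None)
        return "deleted"
    # Lookup failed: the port may simply not have reached this worker yet.
    # Keep the revision row and record when the deletion was requested.
    revisions.setdefault(port_id, {})["deleted_at"] = now
    return "deferred"


def maintenance_task(nb_ports, revisions, now):
    """Periodic cleanup that looks at the actual NB contents (nb_ports)."""
    for port_id, row in list(revisions.items()):
        deleted_at = row.get("deleted_at")
        if deleted_at is None:
            continue  # no pending deletion for this row
        if port_id in nb_ports:
            # The port did show up in OVN: remove it and its revision row.
            nb_ports.discard(port_id)
            del revisions[port_id]
        elif now - deleted_at >= GRACE_PERIOD:
            # The port never appeared within the window: drop the row.
            del revisions[port_id]
```

In this sketch a deletion that lands on a lagging worker returns
"deferred" instead of dropping the revision row, which leaves the
maintenance task enough information to clean up the port later.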


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1960006

Title:
  [ovn] Stale ports in the OVN database at churn

Status in neutron:
  Fix Released

Bug description:
  There are situations where, under heavy control plane activity, OVN
  ports become stale and are never cleaned up (unless the
  neutron-ovn-db-sync tool is run manually).

  A possible scenario for this is:

  a) Port creation
    a.1) Port created in Neutron DB
    a.2) Port created in OVN Northbound (NB) database
    a.3) NB ovsdb-server notifies all connected workers of the port creation
    a.4) Each worker eventually processes this event and updates its in-memory copy of the NB database

  Immediately afterwards, the port is deleted via the API, but step a.4)
  has not yet been completed by all workers. The port deletion API
  request then lands on one of the workers that has not yet updated its
  in-memory OVN NB database copy with the newly created port.

  
  b) Port deletion
    b.1) Port deleted from Neutron DB
    b.2) Deletion of the port from OVN NB is attempted, but the lookup fails and its revision number is deleted [0]

  
  At this point, the port will remain stale forever in the OVN database,
  causing other issues that we have mitigated (e.g. [1]). Ultimately,
  though, the number of stale OVN resources may grow to a point where it
  severely affects overall cluster stability and performance.
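
  The creation and deletion steps above can be reproduced with a small toy
  model. This is only a sketch of the race, not Neutron code: the
  NorthboundDB and Worker classes, the event queue, and the shared
  "revisions" dict are hypothetical stand-ins for the NB ovsdb-server,
  the API workers, and the revision number table.

```python
# Toy model of the race; every name here is hypothetical.

class NorthboundDB:
    """Stands in for the OVN NB ovsdb-server (the source of truth)."""

    def __init__(self):
        self.ports = set()
        self.workers = []

    def create_port(self, port_id):
        self.ports.add(port_id)
        # Queue a creation notification for every connected worker.
        for worker in self.workers:
            worker.pending_events.append(("create", port_id))

    def delete_port(self, port_id):
        self.ports.discard(port_id)


class Worker:
    """An API worker holding its own in-memory copy of the NB database."""

    def __init__(self, nb, revisions):
        self.nb = nb
        self.revisions = revisions  # shared revision number table
        self.local_ports = set()    # in-memory copy, updated by events
        self.pending_events = []
        nb.workers.append(self)

    def process_events(self):
        for kind, port_id in self.pending_events:
            if kind == "create":
                self.local_ports.add(port_id)
        self.pending_events.clear()

    def delete_port(self, port_id):
        if port_id in self.local_ports:
            self.nb.delete_port(port_id)
            self.local_ports.discard(port_id)
        # Behaviour described in the bug: the revision row is removed
        # even when the lookup above failed.
        self.revisions.pop(port_id, None)


def simulate():
    nb = NorthboundDB()
    revisions = {}
    w1 = Worker(nb, revisions)
    w2 = Worker(nb, revisions)

    # a) Port creation; only w1 has processed the notification so far.
    revisions["port-1"] = 1
    nb.create_port("port-1")
    w1.process_events()

    # b) Port deletion lands on w2, which has not seen the port yet.
    w2.delete_port("port-1")
    w2.process_events()  # too late: the deletion was already handled
    return nb.ports, revisions
```

  Calling simulate() leaves "port-1" in the NB port set while the
  revision table is empty, so nothing remains that would prompt a later
  cleanup of the port.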

  A potential workaround is to run the neutron-ovn-db-sync tool
  periodically to remove these stale ports, but running it while the
  API is operational is not recommended.


  [0] https://github.com/openstack/neutron/blob/f5030b0bc25216d80b09f7ac3938c9a902b655e3/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L698
  [1] https://bugs.launchpad.net/neutron/+bug/1874733

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1960006/+subscriptions


