
yahoo-eng-team team mailing list archive

[Bug 1960006] [NEW] [ovn] Stale ports in the OVN database at churn

 

Public bug reported:

Under heavy control plane activity, OVN ports can go stale and are never
cleaned up unless the neutron-ovn-db-sync tool is run manually.

A possible scenario for this is:

a) Port creation
  a.1) Port created in Neutron DB
  a.2) Port created in OVN Northbound (NB) database
  a.3) NB ovsdb-server notifies all connected workers of the port creation
  a.4) Each worker eventually processes this event and updates its in-memory copy of the NB database

Immediately afterwards, the port is deleted via the API before step a.4)
has completed on all workers, and the deletion request lands on one of
the workers that has not yet updated its in-memory OVN NB database copy
with the newly created port.


b) Port deletion
  b.1) Port deleted from Neutron DB
  b.2) Deletion of the port from OVN NB is attempted, but the lookup fails and only its revision number is deleted [0]
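
The race in steps a) and b) can be reproduced with a toy model. This is
only a sketch, not Neutron code: the NbServer and Worker classes, the
revisions dict, and the simulate() function are hypothetical stand-ins for
the NB ovsdb-server, the API workers with their in-memory NB copies, and
the revision number bookkeeping.

```python
from collections import deque


class NbServer:
    """Hypothetical stand-in for the NB ovsdb-server (authoritative state)."""

    def __init__(self):
        self.ports = set()
        self.workers = []

    def create_port(self, port_id):
        self.ports.add(port_id)
        # a.3) notify every connected worker; each one processes it later
        for worker in self.workers:
            worker.pending.append(("create", port_id))

    def delete_port(self, port_id):
        self.ports.discard(port_id)
        for worker in self.workers:
            worker.pending.append(("delete", port_id))


class Worker:
    """Hypothetical API worker holding an in-memory copy of the NB database."""

    def __init__(self, server, revisions):
        self.server = server
        self.revisions = revisions  # shared revision number bookkeeping
        self.cache = set()          # in-memory NB copy
        self.pending = deque()      # notifications not yet processed
        server.workers.append(self)

    def process_events(self):
        # a.4) apply queued notifications to the in-memory copy
        while self.pending:
            op, port_id = self.pending.popleft()
            if op == "create":
                self.cache.add(port_id)
            else:
                self.cache.discard(port_id)

    def delete_port(self, port_id):
        # b.2) the lookup uses the (possibly outdated) in-memory copy
        if port_id in self.cache:
            self.server.delete_port(port_id)
        # the revision number is removed whether or not the lookup succeeded
        self.revisions.pop(port_id, None)


def simulate():
    server = NbServer()
    neutron_db, revisions = set(), {}
    fast = Worker(server, revisions)
    slow = Worker(server, revisions)

    # a) port creation; only the "fast" worker has processed the event
    neutron_db.add("port-1")
    revisions["port-1"] = 1
    server.create_port("port-1")
    fast.process_events()

    # b) the deletion request lands on the "slow" worker
    neutron_db.discard("port-1")
    slow.delete_port("port-1")
    slow.process_events()  # catching up afterwards does not help

    return neutron_db, revisions, server.ports


if __name__ == "__main__":
    neutron_db, revisions, nb_ports = simulate()
    print("Neutron DB:", neutron_db)
    print("Revisions:", revisions)
    print("OVN NB:", nb_ports)
```

In this model the port is gone from the Neutron side and from the revision
bookkeeping, while the NB server still holds it, which matches the state
described in b.2).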


At this point, the port stays in the OVN database forever. This causes other issues that we have mitigated (e.g. [1]), but ultimately the number of stale OVN resources may grow to the point where it severely degrades overall cluster stability and performance.

A potential workaround is to run the neutron-ovn-db-sync tool
periodically to remove the stale ports, but running it while the API is
operational is not recommended.
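
Conceptually, spotting the leaked ports amounts to comparing the two
databases. The sketch below is an illustration only: find_stale_ports() is
a hypothetical helper that works on lists of identifiers already fetched
from each side, it assumes an OVN port can be matched to a Neutron port by
identifier, and it is not how neutron-ovn-db-sync is implemented.

```python
def find_stale_ports(neutron_port_ids, ovn_port_names):
    """Return OVN port names that have no matching Neutron port.

    Both arguments are plain iterables of strings (hypothetical inputs);
    this function does not talk to Neutron or OVN.
    """
    return sorted(set(ovn_port_names) - set(neutron_port_ids))


if __name__ == "__main__":
    # "port-3" exists only on the OVN side, so it is reported as stale
    print(find_stale_ports(["port-1", "port-2"],
                           ["port-1", "port-2", "port-3"]))
```

A check like this is subject to the same race if it runs while the API is
serving requests, since a port created between the two reads would look
stale.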


[0] https://github.com/openstack/neutron/blob/f5030b0bc25216d80b09f7ac3938c9a902b655e3/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L698
[1] https://bugs.launchpad.net/neutron/+bug/1874733

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1960006

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1960006/+subscriptions


