yahoo-eng-team team mailing list archive
Message #88201
[Bug 1960006] [NEW] [ovn] Stale ports in the OVN database at churn
Public bug reported:
There are situations where, under heavy control plane activity, OVN
ports become stale and never get cleaned up (unless the neutron-ovn-db-
sync tool is run manually).
A possible scenario for this is:
a) Port creation
a.1) Port created in the Neutron DB
a.2) Port created in the OVN Northbound (NB) database
a.3) The NB ovsdb-server notifies all connected workers of the port creation
a.4) Each worker eventually processes this event and updates its in-memory copy of the NB database
Now suppose the port is deleted via the API immediately afterwards,
before step a.4) has completed on all workers, and the deletion request
lands on a worker whose in-memory copy of the OVN NB database does not
yet contain the newly created port.
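To illustrate the race, here is a toy Python sketch (purely illustrative, not Neutron code; the Worker class and every name in it are made up) of a worker whose in-memory NB copy lags behind the ovsdb-server notification:

# Toy illustration of the notification lag between API workers.
# This is NOT Neutron code; the classes and names are made up.

class Worker:
    """Each API worker keeps its own in-memory copy of the OVN NB DB."""
    def __init__(self, name):
        self.name = name
        self.nb_copy = {}           # port_id -> row, mirrors OVN NB

    def on_nb_update(self, nb_db):
        self.nb_copy = dict(nb_db)  # apply the ovsdb-server notification

    def lookup(self, port_id):
        return self.nb_copy[port_id]  # raises KeyError if not yet seen


nb_db = {}                          # the "real" OVN Northbound DB
workers = [Worker("w1"), Worker("w2")]

# a.2) port created in OVN NB
nb_db["port-uuid-1"] = {"name": "port-uuid-1"}

# a.3/a.4) only w1 has processed the update notification so far
workers[0].on_nb_update(nb_db)

# the delete request lands on w2, whose copy is still stale
try:
    workers[1].lookup("port-uuid-1")
except KeyError:
    print("lookup failed: w2 has not seen the port yet")

Scenario b) below then plays out on that lagging worker.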
b) Port deletion
b.1) Port deleted from the Neutron DB
b.2) Port deletion is attempted in the OVN NB database, but the lookup fails and the port's revision number is deleted anyway [0]
At this point, the port will remain stale in the OVN database forever, causing other issues that we have mitigated (e.g. [1]); ultimately, the number of leftover OVN resources may grow to the point where it significantly degrades the overall stability and performance of the cluster.
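The deletion path boils down to something like the following simplified sketch (names such as RowNotFound, delete_lsp and delete_revision are placeholders, not the exact Neutron API; see [0] for the real code):

# Simplified sketch of the deletion path described in b.2); all names
# are placeholders, see [0] for the actual ovn_client.py code.

class RowNotFound(Exception):
    pass


def delete_port_from_ovn(nb_idl, db_rev, port_id):
    try:
        lsp = nb_idl.lookup("Logical_Switch_Port", port_id)
    except RowNotFound:
        # The worker's stale in-memory NB copy does not contain the
        # port, so no OVN transaction is issued to remove it...
        db_rev.delete_revision(port_id)
        # ...yet the revision-number bookkeeping row is removed anyway,
        # leaving the real Logical_Switch_Port behind in OVN NB.
        return
    nb_idl.delete_lsp(lsp.name).execute()
    db_rev.delete_revision(port_id)

Because the revision bookkeeping for the port is gone, there is presumably nothing left to flag the leftover Logical_Switch_Port as inconsistent, which would explain why it is never revisited.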
A potential workaround is to run the neutron-ovn-db-sync tool
periodically to get rid of these stale ports, but doing so while the
API is operational is not recommended.
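A lighter, detection-only alternative would be to periodically diff port IDs between the two databases and only flag the leftovers. A rough sketch, assuming the openstack and ovn-nbctl CLIs are available and configured on the host where it runs:

# Detection-only sketch: list port IDs known to Neutron and the
# Logical_Switch_Port names in the OVN NB DB, and report entries that
# exist only on the OVN side. Assumes the "openstack" and "ovn-nbctl"
# CLIs are available and configured on the host.
import subprocess


def _run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


neutron_ports = set(_run(["openstack", "port", "list",
                          "-f", "value", "-c", "ID"]).split())
ovn_lsps = set(_run(["ovn-nbctl", "--bare", "--columns=name",
                     "list", "Logical_Switch_Port"]).split())

# LSPs whose names are not Neutron port UUIDs (driver-internal ports)
# will show up here as false positives and need to be filtered out.
for port_id in sorted(ovn_lsps - neutron_ports):
    print("possibly stale OVN port:", port_id)

Actually removing the leftovers would still require the sync tool or a manual ovn-nbctl lsp-del, with the same caveat about doing it while the API is active.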
[0] https://github.com/openstack/neutron/blob/f5030b0bc25216d80b09f7ac3938c9a902b655e3/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L698
[1] https://bugs.launchpad.net/neutron/+bug/1874733
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1960006
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1960006/+subscriptions