yahoo-eng-team team mailing list archive
Message #89747
[Bug 1990174] Re: [OVN] Deadlock when starting neutron server, during the OVN hash ring deletion
Reviewed: https://review.opendev.org/c/openstack/neutron/+/858542
Committed: https://opendev.org/openstack/neutron/commit/819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Submitter: "Zuul (22348)"
Branch: master
commit 819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date: Tue Sep 20 13:32:04 2022 +0200
Move the "ovn_hash_ring" clean up to maintenance worker
The "ovn_hash_ring" procedure that cleans up stale/old registers
is now executed in the ``HashRingHealthCheckPeriodics`` class, which
runs in the ``MaintenanceWorker`` process.
In an HA scenario, if several servers are rebooted at the same time,
the "ovn_hash_ring" clean up operation can clash with the API worker
method "_load_hash_ring", which executes a SQL read from this table.
In some highly loaded environments, if the OVN database takes time
to be locally cached, this read operation is executed thousands of
times; basically any time an OVN database event occurs.
In order to avoid a deadlock when deleting from the "ovn_hash_ring"
table, this clean up is executed in a periodic task. If the task
succeeds, it is stopped. If the task raises a database exception,
it is executed again on the next period.
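The retry-until-success behaviour described above can be sketched roughly as
follows. This is a minimal, standard-library-only illustration and not the
actual Neutron code: the names ``HashRingCleanup``, ``DatabaseError`` and
``delete_stale`` are hypothetical, and the scheduler that would call ``run()``
on every period is left out.

```python
class DatabaseError(Exception):
    """Hypothetical stand-in for a database exception such as a deadlock."""


class HashRingCleanup:
    """Hypothetical one-shot periodic task: retry until the delete succeeds."""

    def __init__(self, delete_stale):
        # Callable that deletes the stale registers (an assumption here).
        self._delete_stale = delete_stale
        self.done = False

    def run(self):
        """Called once per period; returns True when no further run is needed."""
        if self.done:
            return True
        try:
            self._delete_stale()
        except DatabaseError:
            # The delete clashed with another transaction: keep the task
            # scheduled so that it is tried again on the next period.
            return False
        # The clean up succeeded, so the task can be stopped.
        self.done = True
        return True
```

Because the delete is only retried on the next period, a clash with a
concurrent read costs a delay rather than a failed server start.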
Now the "ovn_hash_ring" registers are retrieved using the
"created_at" time as a filter. The initial time is taken when the
OVN mechanism driver is initialized, before any API worker is spawned
and any new "ovn_hash_ring" register has been created (an API
worker, when started, will create a new "ovn_hash_ring" register).
Any stale/old register stored in this table will be ignored; that
means any register created before the OVN mechanism driver was
started.
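As an illustration of the "created_at" filter, here is a small sketch using
an in-memory SQLite table. It is not the Neutron implementation: the
simplified two-column schema, the helper names ``make_table``, ``add_node``
and ``load_hash_ring``, and storing the time as epoch seconds are all
assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def make_table(conn):
    # Simplified, hypothetical schema; only "created_at" comes from the text.
    conn.execute(
        "CREATE TABLE ovn_hash_ring ("
        "node_uuid TEXT PRIMARY KEY, created_at REAL NOT NULL)"
    )


def add_node(conn, node_uuid, created_at):
    # Store the creation time as epoch seconds so comparisons are numeric.
    conn.execute(
        "INSERT INTO ovn_hash_ring VALUES (?, ?)",
        (node_uuid, created_at.timestamp()),
    )


def load_hash_ring(conn, driver_start):
    # Return only registers created at or after the driver start time;
    # anything older is treated as stale and ignored.
    rows = conn.execute(
        "SELECT node_uuid FROM ovn_hash_ring "
        "WHERE created_at >= ? ORDER BY node_uuid",
        (driver_start.timestamp(),),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
make_table(conn)
start = datetime(2022, 9, 20, 12, 0, tzinfo=timezone.utc)
add_node(conn, "stale-node", start - timedelta(hours=1))
add_node(conn, "api-worker-1", start + timedelta(seconds=5))
nodes = load_hash_ring(conn, start)
```

With this filter the read path does not depend on the stale rows having
been deleted yet, so the clean up can safely run later in the background.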
Closes-Bug: #1990174
Change-Id: I07c4cb6e20b8a84e4ace7a8e34555aced5b5da9f
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1990174
Title:
[OVN] Deadlock when starting neutron server, during the OVN hash ring
deletion
Status in neutron:
Fix Released
Bug description:
Related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2125842
Description of problem:
Neutron server often fails to start and systemd needs to restart it. This is a problem at scale because all workers need to reconnect again to the OVN DBs.
How reproducible:
50%
Steps to Reproduce:
1. Start neutron server
Error log: https://paste.opendev.org/show/bm3jZZ1oWX7ihK8JXzdE/
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1990174/+subscriptions