[Bug 2093347] Re: [ovn-octavia-provider] first request is stuck on OVN txn
Reviewed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/938797
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/26431c9ab159032f9122d3f6fe6be95798ad0497
Submitter: "Zuul (22348)"
Branch: master
commit 26431c9ab159032f9122d3f6fe6be95798ad0497
Author: Fernando Royo <froyo@xxxxxxxxxx>
Date: Thu Jan 9 10:50:09 2025 +0100
Remove join on helper request daemon thread
This patch removes the join on the request daemon thread that
serves requests in the OVN provider. That join caused a deadlock
when the GC ran on the driver class and invoked the helper's
shutdown while the request thread was still handling a previous
request (through ovsdbapp, which takes a lock over the txn on the
OVN DBs).
As the request thread is a daemon thread, dropping the join is safe.
Closes-Bug: #2093347
Change-Id: I464f6cebf3c65f300a3d0f10b661f77215475a7e
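For context, here is a minimal sketch of the shape of this change. The class, attributes and method bodies below are illustrative assumptions, not the actual code in ovn_octavia_provider/helper.py:

import queue
import threading


class OvnProviderHelper:
    # Illustrative stand-in for the provider helper.

    def __init__(self):
        self.requests = queue.Queue()
        # Daemon thread that consumes requests and runs OVN DB txns.
        self.helper_thread = threading.Thread(
            target=self.request_handler, daemon=True)
        self.helper_thread.start()

    def request_handler(self):
        while True:
            request = self.requests.get()
            if request is None:  # shutdown sentinel
                break
            request['func'](request['info'])

    def shutdown(self):
        self.requests.put(None)
        # Before the fix: joining here could deadlock when shutdown()
        # was reached from __del__ while the handler was blocked inside
        # an ovsdbapp transaction driven by the very thread running the GC.
        # self.helper_thread.join()
        # After the fix: no join. Because the handler is a daemon thread,
        # it will not keep the process alive, so signalling is enough.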
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2093347
Title:
[ovn-octavia-provider] first request is stuck on OVN txn
Status in neutron:
Fix Released
Bug description:
On a fresh environment, the first action on the ovn-provider gets
stuck for 180s on the first txn over the OVN NB DB.
After some in-depth analysis of the running threads we saw that the
GC is invoked on the driver class and calls shutdown on the helper,
which does a join() on the daemon thread responsible for managing
the helper's requests. This creates a deadlock: any txn over the OVN
DB done by ovsdbapp takes a lock, and the join() ends up waiting on
that same lock, leaving the thread hung for 180s (the ovsdbapp
timeout).
Inspecting the threads during the stuck period shows this behaviour:
Process 2249966: /usr/bin/uwsgi --ini /etc/octavia/octavia-uwsgi.ini --venv /opt/stack/data/venv
Python v3.12.3 (/usr/bin/uwsgi-core)
Thread 2062601 (active): "uWSGIWorker1Core0"
Thread 2250013 (idle): "Thread-2 (run)"
    _wait_for_tstate_lock (threading.py:1167)
    join (threading.py:1147)
    shutdown (ovn_octavia_provider/helper.py:112)
    __del__ (ovn_octavia_provider/driver.py:51)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __instancecheck__ (<frozen abc>:119)
    db_replace_record (ovsdbapp/backend/ovs_idl/idlutils.py:452)
    set_column (ovsdbapp/backend/ovs_idl/command.py:62)
    set_columns (ovsdbapp/backend/ovs_idl/command.py:67)
    run_idl (ovsdbapp/backend/ovs_idl/command.py:115)
    do_commit (ovsdbapp/backend/ovs_idl/transaction.py:92)
    run (ovsdbapp/backend/ovs_idl/connection.py:118)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)
Thread 2250332 (idle): "Thread-3 (request_handler)"
    wait (threading.py:359)
    get (queue.py:180)
    commit (ovsdbapp/backend/ovs_idl/transaction.py:54)
    __exit__ (ovsdbapp/api.py:71)
    transaction (ovsdbapp/api.py:114)
    __exit__ (contextlib.py:144)
    transaction (impl_idl_ovn.py:180)
    __exit__ (contextlib.py:144)
    execute (ovsdbapp/backend/ovs_idl/command.py:49)
    lb_create (ovn_octavia_provider/helper.py:1146)
    request_handler (ovn_octavia_provider/helper.py:401)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)
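The circular wait can be reproduced outside the provider with a short sketch (purely illustrative: the real ovsdbapp and provider classes are not used, and a 5 second join timeout stands in for the 180 second ovsdbapp timeout):

import threading
import time

txn_done = threading.Event()


def connection_thread():
    # Stands in for the ovsdbapp connection thread running do_commit().
    time.sleep(0.1)  # pretend the txn is in progress
    # In the bug, the GC fires here: __del__ -> shutdown() -> join()
    # on the request thread, which is still waiting for the txn result.
    request_thread.join(timeout=5)  # blocks until the timeout expires
    txn_done.set()  # the commit result is only delivered afterwards


def request_handler():
    # Stands in for the helper request thread blocked in commit()/get().
    txn_done.wait()


request_thread = threading.Thread(target=request_handler, daemon=True)
conn_thread = threading.Thread(target=connection_thread, daemon=True)
request_thread.start()
conn_thread.start()
conn_thread.join()
print("connection thread only completed after the join timeout expired")

Each thread waits on the other: the connection thread cannot deliver the commit result until its join() returns, and the request thread cannot finish until the commit result arrives, which matches the hang seen above.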
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2093347/+subscriptions