yahoo-eng-team team mailing list archive

[Bug 2093347] [NEW] [ovn-octavia-provider] first request is stuck on OVN txn

Public bug reported:

On a fresh environment, the first action on the ovn-provider gets stuck
for 180s on the first txn against the OVN NB DB.

After a deeper analysis of the threads involved, we saw that the GC
invokes __del__ on the driver class, which then calls shutdown() on the
helper, doing a join() on the daemon thread responsible for managing the
requests to the helper. This results in a deadlock, because any further
txn against the OVN DB done by ovsdbapp is performed under a lock, and
the join() is also waiting for that lock, leaving the thread hung for
180s (the ovsdbapp timeout).
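
The following minimal sketch (hypothetical names and plain threading
primitives, not the actual ovn-octavia-provider/ovsdbapp code) reproduces
the same circular wait: one thread blocks on a queue waiting for a
transaction result, while the thread that should deliver that result
first join()s it, as the GC-triggered __del__ -> shutdown() path does
here:

import queue
import threading

results = queue.Queue()           # the handler blocks here for the commit result
handler_done = threading.Event()

def request_handler():
    # Stands in for helper.request_handler() -> lb_create() -> commit():
    # block until the connection thread posts the transaction result.
    results.get()
    handler_done.set()

handler = threading.Thread(target=request_handler, daemon=True)
handler.start()

def connection_thread():
    # Stands in for ovsdbapp's connection thread running do_commit().
    # Before it can post the result, the driver is finalized and
    # __del__ -> shutdown() join()s the handler (simulated directly here).
    handler.join(timeout=3)       # the real join() has no timeout; the cycle
                                  # only breaks after the 180s ovsdbapp timeout
    if handler.is_alive():
        print("circular wait: handler is still waiting for the result "
              "this thread has not posted yet")
    results.put("txn result")     # only now can the handler make progress

conn = threading.Thread(target=connection_thread)
conn.start()
conn.join()
print("handler finished:", handler_done.wait(timeout=1))

In the traces below, Thread 2250013 ("Thread-2 (run)") plays the role of
the connection thread (do_commit, then __del__ -> shutdown -> join), and
Thread 2250332 ("Thread-3 (request_handler)") is the handler blocked in
queue get() waiting for the commit result.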

Inspecting the threads shows this behaviour while the process is stuck:

Process 2249966: /usr/bin/uwsgi --ini /etc/octavia/octavia-uwsgi.ini --venv /opt/stack/data/venv
Python v3.12.3 (/usr/bin/uwsgi-core)

Thread 2062601 (active): "uWSGIWorker1Core0"
Thread 2250013 (idle): "Thread-2 (run)"
    _wait_for_tstate_lock (threading.py:1167)
    join (threading.py:1147)
    shutdown (ovn_octavia_provider/helper.py:112)
    __del__ (ovn_octavia_provider/driver.py:51)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __instancecheck__ (<frozen abc>:119)
    db_replace_record (ovsdbapp/backend/ovs_idl/idlutils.py:452)
    set_column (ovsdbapp/backend/ovs_idl/command.py:62)
    set_columns (ovsdbapp/backend/ovs_idl/command.py:67)
    run_idl (ovsdbapp/backend/ovs_idl/command.py:115)
    do_commit (ovsdbapp/backend/ovs_idl/transaction.py:92)
    run (ovsdbapp/backend/ovs_idl/connection.py:118)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)
Thread 2250332 (idle): "Thread-3 (request_handler)"
    wait (threading.py:359)
    get (queue.py:180)
    commit (ovsdbapp/backend/ovs_idl/transaction.py:54)
    __exit__ (ovsdbapp/api.py:71)
    transaction (ovsdbapp/api.py:114)
    __exit__ (contextlib.py:144)
    transaction (impl_idl_ovn.py:180)
    __exit__ (contextlib.py:144)
    execute (ovsdbapp/backend/ovs_idl/command.py:49)
    lb_create (ovn_octavia_provider/helper.py:1146)
    request_handler (ovn_octavia_provider/helper.py:401)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)

** Affects: neutron
     Importance: Undecided
     Assignee: Fernando Royo (froyoredhat)
         Status: In Progress


** Tags: ovn-octavia-provider

** Changed in: neutron
     Assignee: (unassigned) => Fernando Royo (froyoredhat)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2093347

Title:
  [ovn-octavia-provider] first request is stuck on OVN txn

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2093347/+subscriptions