
yahoo-eng-team team mailing list archive

[Bug 1990174] Re: [OVN] Deadlock when starting neutron server, during the OVN hash ring deletion

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/858542
Committed: https://opendev.org/openstack/neutron/commit/819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Submitter: "Zuul (22348)"
Branch:    master

commit 819a1bb3e6f3b10a1887e2ef836c138e02f8b996
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date:   Tue Sep 20 13:32:04 2022 +0200

    Move the "ovn_hash_ring" clean up to maintenance worker
    
    The procedure that cleans up the stale/old "ovn_hash_ring" registers
    is now executed in the ``HashRingHealthCheckPeriodics`` class, which
    runs in the ``MaintenanceWorker`` process.
    
    In an HA scenario, if several servers are rebooted at the same time,
    the "ovn_hash_ring" clean up operation can clash with the API worker
    method "_load_hash_ring", which executes a SQL read from this table.
    In some heavily loaded environments, if the OVN database takes time
    to be cached locally, this read operation is executed thousands of
    times; basically any time an OVN database event occurs.
    
    In order to avoid a deadlock when cleaning up the "ovn_hash_ring"
    table, this clean up is executed in a periodic task. If this task
    succeeds, the task is stopped. If the task raises a database
    exception, it is retried on the next period.
    
    Now the "ovn_hash_ring" registers are retrieved using the
    "created_at" time as a filter. The initial time is taken when the
    OVN mechanism driver is initialized, before any API worker is
    spawned and before any new "ovn_hash_ring" register has been created
    (an API worker, when started, creates a new "ovn_hash_ring"
    register). Any stale/old register stored in this table, that is, any
    register created before the OVN mechanism driver was started, will
    be ignored.
    
    Closes-Bug: #1990174
    
    Change-Id: I07c4cb6e20b8a84e4ace7a8e34555aced5b5da9f
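
The behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the Neutron patch: the names DBError, StopTask, make_cleanup_task, run_periodic, live_rows and make_flaky_delete are hypothetical, and the table is modelled as a list of dicts rather than a real SQL table. It shows a clean up task that is retried while it raises a database exception and is stopped once it succeeds, plus a "created_at" filter that ignores registers older than the driver start time.

```python
import datetime


class DBError(Exception):
    """Hypothetical stand-in for a database exception such as a deadlock."""


class StopTask(Exception):
    """Hypothetical signal: the periodic task must not be scheduled again."""


def live_rows(rows, start_time):
    # Keep only registers created at or after the driver start time;
    # anything older is considered stale and is ignored.
    return [row for row in rows if row["created_at"] >= start_time]


def make_cleanup_task(delete_stale_rows, start_time):
    # Build the periodic callable. On a database error it simply returns,
    # so the scheduler runs it again later; on success it asks to be stopped.
    def cleanup():
        try:
            delete_stale_rows(start_time)
        except DBError:
            return
        raise StopTask()
    return cleanup


def run_periodic(task, max_runs):
    # Toy scheduler: call the task until it raises StopTask or the
    # run budget is exhausted. Returns how many times the task ran.
    runs = 0
    for _ in range(max_runs):
        runs += 1
        try:
            task()
        except StopTask:
            break
    return runs


def make_flaky_delete(table, failures):
    # Simulated clean up that fails `failures` times (as if it had hit a
    # deadlock) before actually removing the stale registers from `table`.
    state = {"left": failures}

    def delete_stale_rows(start_time):
        if state["left"] > 0:
            state["left"] -= 1
            raise DBError("simulated deadlock")
        table[:] = live_rows(table, start_time)
    return delete_stale_rows
```

With a table holding one register created before the start time and one created after it, and a clean up that fails twice, run_periodic calls the task three times and only the newer register remains. Readers that filter with live_rows see the same result even before the clean up has succeeded, which is why the stale registers no longer need to be deleted at server start.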


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1990174

Title:
  [OVN] Deadlock when starting neutron server, during the OVN hash ring
  deletion

Status in neutron:
  Fix Released

Bug description:
  Related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2125842

  Description of problem:
  Neutron server often fails to start and systemd needs to restart it. This is a problem at scale because all workers need to reconnect to the OVN DBs.

  How reproducible:
  50%

  Steps to Reproduce:
  1. Start neutron server

  
  Error log: https://paste.opendev.org/show/bm3jZZ1oWX7ihK8JXzdE/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1990174/+subscriptions


