yahoo-eng-team team mailing list archive

[Bug 2056366] Re: Neutron ml2/ovn does not exit when killed with SIGTERM

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/911625
Committed: https://opendev.org/openstack/neutron/commit/a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Submitter: "Zuul (22348)"
Branch:    master

commit a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Author: Terry Wilson <twilson@xxxxxxxxxx>
Date:   Wed Mar 6 20:13:58 2024 +0000

    Use oslo_service's SignalHandler for signals
    
    When Neutron running ML2/OVN is killed with SIGTERM (for example via
    systemctl), the neutron workers do not exit and are instead eventually
    killed with SIGKILL when the graceful timeout is reached (often around
    1 minute).
    
    This is happening due to the signal handlers for SIGTERM. There are
    multiple issues.
    
    1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
       signal.signal(signal.SIGTERM, ...), overwriting each other's signal
       handlers.
    2) SIGTERM is handled in the main thread, and running blocking code
       there causes AssertionErrors in eventlet, which also prevents the
       process from exiting.
    3) The ml2/ovn cleanup code doesn't cause the process to end, so it
       interrupts the killing of the process.
    
    oslo_service has a singleton SignalHandler class that solves all of
    these issues.
    
    Closes-Bug: #2056366
    Depends-On: https://review.opendev.org/c/openstack/oslo.service/+/911627
    Change-Id: I730a12746bceaa744c658854e38439420efc4629
    Signed-off-by: Terry Wilson <twilson@xxxxxxxxxx>


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2056366

Title:
  Neutron ml2/ovn does not exit when killed with SIGTERM

Status in neutron:
  Fix Released

Bug description:
  When Neutron running ML2/OVN is killed with SIGTERM (for example via
  systemctl), the neutron workers do not exit and are instead eventually
  killed with SIGKILL when the graceful timeout is reached (often around
  1 minute).

  This is happening due to the signal handlers for SIGTERM. There are
  multiple issues.

  1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
     signal.signal(signal.SIGTERM, ...), overwriting each other's signal
     handlers.
  2) SIGTERM is handled in the main thread, and running blocking code
     there causes AssertionErrors in eventlet.
  3) The ml2/ovn cleanup code doesn't cause the process to end, so it
     interrupts the killing of the process.

  oslo_service has a singleton SignalHandler class that solves all of
  these issues and we should use that instead of calling signal.signal()
  ourselves.
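
  As a rough illustration of issue 1, the sketch below uses only the
  Python standard library. It first shows that a second signal.signal()
  call replaces the earlier handler, then shows a shared registry that
  installs one dispatcher and fans out to every registered callback.
  HandlerRegistry, first_handler and second_handler are hypothetical
  names invented for this example. They are not oslo_service's
  SignalHandler or the code from the Neutron change, only a stand-in
  for the general idea of a singleton handler registry.

```python
import signal
from collections import defaultdict

calls = []


def first_handler(signum, frame):
    calls.append("first")


def second_handler(signum, frame):
    calls.append("second")


# Problem: each signal.signal() call replaces the previously installed handler.
signal.signal(signal.SIGTERM, first_handler)
previous = signal.signal(signal.SIGTERM, second_handler)
assert previous is first_handler  # first_handler would never run now


class HandlerRegistry:
    """Hypothetical singleton registry (illustration only)."""

    _instance = None

    def __new__(cls):
        # Every caller gets the same object, so handlers accumulate
        # in one place instead of overwriting each other.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._handlers = defaultdict(list)
        return cls._instance

    def add_handler(self, signum, handler):
        # Install the dispatcher once per signal, then only append callbacks.
        if not self._handlers[signum]:
            signal.signal(signum, self._dispatch)
        self._handlers[signum].append(handler)

    def _dispatch(self, signum, frame):
        for handler in list(self._handlers[signum]):
            handler(signum, frame)


registry = HandlerRegistry()
registry.add_handler(signal.SIGTERM, first_handler)
HandlerRegistry().add_handler(signal.SIGTERM, second_handler)  # same instance

# Deliver SIGTERM to this process; both callbacks run, in registration order.
signal.raise_signal(signal.SIGTERM)
```

  With the registry, independent components can each register a SIGTERM
  callback without knowing about one another. This sketch does not
  address issues 2 and 3 (blocking work in the handler and actually
  ending the process).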

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2056366/+subscriptions


