yahoo-eng-team team mailing list archive

[Bug 2056366] [NEW] Neutron ml2/ovn does not exit when killed with SIGTERM

 

Public bug reported:

When Neutron runs with ML2/OVN and is killed with SIGTERM (for example
via systemctl), the neutron workers do not exit. They are instead
eventually killed with SIGKILL once the graceful timeout is reached
(often around 1 minute).

This is caused by the way the SIGTERM signal handlers are installed.
There are multiple issues.

1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call signal.signal(signal.SIGTERM, ...), overwriting each other's signal handlers.
2) SIGTERM is handled in the main thread, and running blocking code there causes AssertionErrors in eventlet.
3) The ml2/ovn cleanup code doesn't cause the process to end, so handling SIGTERM there interrupts the killing of the process instead of completing it.
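
Issue 1 can be reproduced with the standard library alone. The snippet
below is only a minimal sketch, not Neutron code: service_handler and
driver_handler are hypothetical stand-ins for the handlers installed by
oslo_service and the ml2/ovn mech_driver. It shows that a second
signal.signal() call silently replaces the first handler, so only the
last one registered ever runs.

```python
import signal

calls = []


def service_handler(signum, frame):
    # Hypothetical stand-in for the handler a service framework installs.
    calls.append("service")


def driver_handler(signum, frame):
    # Hypothetical stand-in for the handler a driver installs later.
    calls.append("driver")


# Remember whatever was installed before so it can be restored afterwards.
original = signal.getsignal(signal.SIGTERM)

signal.signal(signal.SIGTERM, service_handler)
# signal.signal() returns the handler it replaced; nothing chains to it.
previous = signal.signal(signal.SIGTERM, driver_handler)

# Deliver SIGTERM to this process (Python 3.8+). Only driver_handler runs.
signal.raise_signal(signal.SIGTERM)

# Put the original handler back.
signal.signal(signal.SIGTERM, original)
```

After this runs, calls contains only the "driver" entry, and previous is
service_handler, which was dropped without ever being invoked.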

oslo_service has a singleton SignalHandler class that solves all of
these issues, and we should use that instead of calling signal.signal()
ourselves.
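
To illustrate why a single shared registry avoids the overwriting
problem, here is a rough sketch of the general pattern. It is an
assumption-laden illustration and NOT oslo_service's implementation:
the SignalRegistry class and its add_handler method are hypothetical
names made up for this example. One dispatcher is installed per signal,
and every registered callback is invoked in registration order.

```python
import signal
from collections import defaultdict


class SignalRegistry:
    """Hypothetical process-wide registry (not oslo_service code)."""

    _instance = None

    def __new__(cls):
        # Singleton: every caller shares the same callback table.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._callbacks = defaultdict(list)
        return cls._instance

    def add_handler(self, signum, callback):
        # Install the dispatcher only once per signal; later callers
        # append to the list instead of replacing the handler.
        if not self._callbacks[signum]:
            signal.signal(signum, self._dispatch)
        self._callbacks[signum].append(callback)

    def _dispatch(self, signum, frame):
        # Run every registered callback in registration order.
        for callback in list(self._callbacks[signum]):
            callback(signum, frame)


seen = []
original = signal.getsignal(signal.SIGTERM)

# Two independent components register through the same registry.
SignalRegistry().add_handler(signal.SIGTERM, lambda s, f: seen.append("service"))
SignalRegistry().add_handler(signal.SIGTERM, lambda s, f: seen.append("driver"))

signal.raise_signal(signal.SIGTERM)  # Python 3.8+

# Restore whatever handler was installed before the demonstration.
signal.signal(signal.SIGTERM, original)
```

Both callbacks run here, in contrast to direct signal.signal() calls
where only the last registration survives. This sketch does not address
issues 2 and 3 (blocking work in the main thread, and making sure the
process actually exits after cleanup).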

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2056366

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2056366/+subscriptions


