yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #93748
[Bug 2056366] Re: Neutron ml2/ovn does not exit when killed with SIGTERM
Reviewed: https://review.opendev.org/c/openstack/neutron/+/911625
Committed: https://opendev.org/openstack/neutron/commit/a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Submitter: "Zuul (22348)"
Branch: master
commit a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Author: Terry Wilson <twilson@xxxxxxxxxx>
Date: Wed Mar 6 20:13:58 2024 +0000
Use oslo_service's SignalHandler for signals
When Neutron is killed with SIGTERM (like via systemctl), when using
ML2/OVN neutron workers do not exit and instead are eventually killed
with SIGKILL when the graceful timeout is reached (often around 1
minute).
This is happening due to the signal handlers for SIGTERM. There are
multiple issues.
1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
signal.signal(signal.SIGTERM, ...) overwriting each others signal
handlers.
2) SIGTERM is handled in the main thread, and running blocking code
there causes AssertionErrors in eventlet which also prevents the
process from exiting.
3) The ml2/ovn cleanup code doesn't cause the process to end, so it
interrupts the killing of the process.
oslo_service has a singleton SignalHandler class that solves all of
these issues
Closes-Bug: #2056366
Depends-On: https://review.opendev.org/c/openstack/oslo.service/+/911627
Change-Id: I730a12746bceaa744c658854e38439420efc4629
Signed-off-by: Terry Wilson <twilson@xxxxxxxxxx>
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2056366
Title:
Neutron ml2/ovn does not exit when killed with SIGTERM
Status in neutron:
Fix Released
Bug description:
When Neutron is killed with SIGTERM (like via systemctl), when using
ML2/OVN neutron workers do not exit and instead are eventually killed
with SIGKILL when the graceful timeout is reached (often around 1
minute).
This is happening due to the signal handlers for SIGTERM. There are
multiple issues.
1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call signal.signal(signal.SIGTERM, ...) overwriting each others signal handlers.
2) SIGTERM is handled in the main thread, and running blocking code there causes AssertionErrors in eventlet
3) The ml2/ovn cleanup code doesn't cause the process to end, so it interrupts the killing of the process
oslo_service has a singleton SignalHandler class that solves all of
these issues and we should use that instead of calling signal.signal()
ourselves.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2056366/+subscriptions
References