yahoo-eng-team team mailing list archive
Message #06239
[Bug 1257524] [NEW] If neutron spawned dnsmasq dies, neutron-dhcp-agent will be totally unaware
Public bug reported:
I recently ran into a problem that caused dnsmasq to segfault in
certain situations. No doubt this was a bug in dnsmasq. However, it was
quite troubling that Neutron never noticed that dnsmasq had stopped
working. This is because dnsmasq is spawned as a daemon, even though it
is most definitely "owned" by neutron-dhcp-agent. Likewise, if
neutron-dhcp-agent dies, dnsmasq keeps running because it is a daemon
and becomes "stale", requiring manual intervention to clean up. If
dnsmasq ran in the foreground instead, it would stay in
neutron-dhcp-agent's process group, so it should die along with the
agent and, if need be, be cleaned up by init.
I did some analysis but will not be able to dig into the actual
implementation. My analysis suggests that the following would work:
* use utils.create_process instead of execute and keep the returned Popen object
* spawn a greenthread to wait() on the process
* if it dies, log the exit code and restart it
* pass the -k option so dnsmasq stays in the foreground
* kill the process using child signals
Not sure how or whether SIGCHLD plays a factor.
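The steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not Neutron code: it uses the
standard library's subprocess.Popen and a plain thread in place of
utils.create_process and a greenthread, and the ProcessSupervisor class
and its max_restarts and restart_delay parameters are hypothetical
names. It does not address the SIGCHLD question.

```python
import logging
import subprocess
import threading

LOG = logging.getLogger(__name__)


class ProcessSupervisor:
    """Keep one foreground child process running (hypothetical helper)."""

    def __init__(self, cmd, max_restarts=3, restart_delay=0.1):
        self.cmd = list(cmd)
        self.max_restarts = max_restarts
        self.restart_delay = restart_delay
        self.exit_codes = []  # return codes of unexpected exits
        self._stopping = threading.Event()
        self._lock = threading.Lock()
        self._proc = None
        self._thread = None

    def start(self):
        # The watcher thread stands in for the greenthread in the report.
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def _watch(self):
        restarts = 0
        while not self._stopping.is_set():
            with self._lock:
                # Re-check under the lock so stop() cannot race a spawn.
                if self._stopping.is_set():
                    break
                self._proc = subprocess.Popen(self.cmd)
            code = self._proc.wait()  # blocks until the child exits
            if self._stopping.is_set():
                break  # exit was requested by stop(), not a crash
            self.exit_codes.append(code)
            LOG.error("%s exited unexpectedly with code %s",
                      self.cmd[0], code)
            if restarts >= self.max_restarts:
                LOG.error("giving up on %s after %d restarts",
                          self.cmd[0], restarts)
                break
            restarts += 1
            self._stopping.wait(self.restart_delay)

    def stop(self):
        self._stopping.set()
        with self._lock:
            proc = self._proc
        if proc is not None and proc.poll() is None:
            proc.terminate()  # sends SIGTERM on POSIX
        if self._thread is not None:
            self._thread.join()
```

For dnsmasq, the command list would be something like ["dnsmasq", "-k"]
followed by whatever options the agent already passes, so that the
child stays in the foreground and wait() actually tracks it. The
restart cap is a design choice added here to avoid a tight crash loop;
the report itself does not specify one.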
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1257524
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1257524/+subscriptions