yahoo-eng-team team mailing list archive

[Bug 1257524] [NEW] If neutron spawned dnsmasq dies, neutron-dhcp-agent will be totally unaware

 

Public bug reported:

I recently had some trouble with dnsmasq segfaulting in certain
situations. No doubt this was a bug in dnsmasq. However, it was quite
troubling that Neutron never noticed that dnsmasq had stopped working.
This is because dnsmasq is spawned as a daemon, even though it is most
definitely "owned" by neutron-dhcp-agent. Also, if neutron-dhcp-agent
should die, dnsmasq, being a daemon, will continue to run and become
"stale", requiring manual intervention to clean up. If it ran in the
foreground instead, it would stay in neutron-dhcp-agent's process group,
should die along with the agent, and could if need be be cleaned up by
init.
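
The process-group point can be checked with a small sketch. It uses only the Python standard library on a POSIX system, and a sleeping Python interpreter stands in for dnsmasq (an assumption made for illustration; nothing here is Neutron code). A child that is spawned without detaching stays in its parent's process group:

```python
import os
import subprocess
import sys

# Stand-in for a foreground child: a Python process that just sleeps.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"]
)

# A child that does not detach (no setsid, no double fork) inherits
# the process group of the process that spawned it.
same_group = os.getpgid(child.pid) == os.getpgrp()
print("child shares parent's process group:", same_group)

# Clean up the stand-in child.
child.terminate()
child.wait()
```

Sharing a group is what makes it possible to signal the agent and its child together (for example with os.killpg); on its own it does not guarantee that the child exits when the agent does.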

I did some analysis but will not be able to dig into the actual
implementation. However, my analysis suggests that this would work:

* use utils.create_process instead of execute, and keep the returned Popen object
* spawn a greenthread to wait() on the process
* if it dies, log the exit code and restart it
* pass the -k option so dnsmasq stays in the foreground
* kill the process using child signals
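
The steps above can be roughly sketched as follows. This is only an illustration under stated assumptions, not Neutron's implementation: it uses subprocess.Popen in place of utils.create_process and a plain thread in place of a greenthread, and the ChildSupervisor class and its max_restarts parameter are hypothetical names invented for the example.

```python
import logging
import subprocess
import threading

LOG = logging.getLogger(__name__)


class ChildSupervisor:
    """Hypothetical helper: run a foreground child, restart it if it exits."""

    def __init__(self, cmd, max_restarts=3):
        self.cmd = cmd
        self.max_restarts = max_restarts
        self.exit_codes = []  # return codes of unexpected exits
        self._proc = None
        self._lock = threading.Lock()
        self._stopping = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def _watch(self):
        while not self._stopping.is_set():
            with self._lock:
                # Re-check under the lock so stop() cannot race a respawn.
                if self._stopping.is_set():
                    break
                self._proc = subprocess.Popen(self.cmd)
            rc = self._proc.wait()  # blocks until the child exits
            if self._stopping.is_set():
                break  # exit was requested by stop(), not a crash
            self.exit_codes.append(rc)
            LOG.error("%s exited with code %s", self.cmd[0], rc)
            if len(self.exit_codes) > self.max_restarts:
                LOG.error("giving up on %s", self.cmd[0])
                break

    def stop(self):
        self._stopping.set()
        with self._lock:
            proc = self._proc
        if proc is not None and proc.poll() is None:
            proc.terminate()  # sends SIGTERM to the child
        self._thread.join()
```

For dnsmasq the command list would begin with ["dnsmasq", "-k"], followed by whatever options the agent already passes; -k keeps dnsmasq in the foreground so that wait() tracks the real process. The restart cap is a design choice added here so that a child which crashes immediately does not respawn in a tight loop.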

Not sure how or if SIGCHLD plays a factor.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1257524

Title:
  If neutron spawned dnsmasq dies, neutron-dhcp-agent will be totally
  unaware

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1257524/+subscriptions

