yahoo-eng-team team mailing list archive

[Bug 1177416] Re: Metadata proxy startup can fail when Daemon class doesn't properly match running processes

 

** Changed in: quantum
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to quantum.
https://bugs.launchpad.net/bugs/1177416

Title:
  Metadata proxy startup can fail when Daemon class doesn't properly
  match running processes

Status in OpenStack Quantum (virtual network service):
  Fix Released

Bug description:
  I came across a case where two metadata namespace proxy pid files
  contained the same pid, possibly due to a reboot of the network
  controller:

  root@qu-network-controller:# for f in `ls -1 /var/lib/quantum/external/pids/*.pid`; do cat $f | grep 31857 && echo $f; done
  31857
  /var/lib/quantum/external/pids/91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid
  31857
  /var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid
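  The same check can be scripted. The following is a minimal sketch, not
  part of quantum: the function name find_shared_pids is hypothetical, and
  it assumes each *.pid file holds a single integer pid.

```python
import collections
import glob
import os


def find_shared_pids(pid_dir):
    """Map each pid found in more than one *.pid file to those files.

    Hypothetical helper for illustration; not part of quantum.
    """
    files_by_pid = collections.defaultdict(list)
    for path in sorted(glob.glob(os.path.join(pid_dir, '*.pid'))):
        with open(path) as f:
            content = f.read().strip()
        # Skip empty or malformed pid files instead of raising.
        if content.isdigit():
            files_by_pid[int(content)].append(path)
    # Keep only the pids claimed by two or more pid files.
    return {pid: paths for pid, paths in files_by_pid.items()
            if len(paths) > 1}
```

  Calling find_shared_pids('/var/lib/quantum/external/pids') on the
  controller above would be expected to report pid 31857 with both pid
  files listed.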

  The pid in question was for a proxy for the namespace ending in d66e:

  root@qu-network-controller:# cat /proc/31857/cmdline 
  python/usr/bin/quantum-ns-metadata-proxy--pid_file=/var/lib/quantum/external/pids/91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid--network_id=91e99f72-6fb0-49e5-9fbc-0c11d013d66e--state_path=/var/lib/quantum--metadata_port=80--debug--verbose--log-file=quantum-ns-metadata-proxy91e99f72-6fb0-49e5-9fbc-0c11d013d66e.log--log-dir=

  Unfortunately, when quantum went to spawn the dhcp agent for the 4245
  namespace, which also spawns a namespace proxy, the code in the Daemon
  class incorrectly matched this existing proxy and threw an exception.

  From dhcp-agent.log:

  2013-04-25 14:00:04 ERROR [quantum.agent.dhcp_agent] Unable to sync network state.
  Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 155, in sync_state
  self.refresh_dhcp_helper(network_id)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 209, in refresh_dhcp_helper
  return self.enable_dhcp_helper(network_id)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 188, in enable_dhcp_helper
  self.enable_isolated_metadata_proxy(network)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 329, in enable_isolated_metadata_proxy
  pm.enable(callback)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py", line 55, in enable
  ip_wrapper.netns.execute(cmd)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line 407, in execute
  check_exit_code=check_exit_code)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line 61, in execute
  raise RuntimeError(m)
  RuntimeError:
  Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-dc8af719-e6a0-4cc7-92d0-b2bf309e4245', 'quantum-ns-metadata-proxy', '--pid_file=/var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid', '--network_id=dc8af719-e6a0-4cc7-92d0-b2bf309e4245', '--state_path=/var/lib/quantum', '--metadata_port=80', '--debug', '--verbose', '--log-file=quantum-ns-metadata-proxydc8af719-e6a0-4cc7-92d0-b2bf309e4245.log', '--log-dir=/var/log/quantum']

  Looking further in dhcp-agent.log, I found the actual problem:

  2013-04-25 14:00:04 DEBUG [quantum.agent.linux.utils] Running command:
  ['sudo', 'cat', '/proc/31857/cmdline']\n2013-04-25 14:00:04 DEBUG
  [quantum.agent.linux.utils] \nCommand: ['sudo', 'cat',
  '/proc/31857/cmdline']\nExit code: 0\nStdout: 'python\\x00/usr/bin
  /quantum-ns-metadata-
  proxy\\x00-pid_file=/var/lib/quantum/external/pids/91e99f72-6fb0-49e5
  -9fbc-0c11d013d66e.pid\\x00network_id=91e99f72-6fb0-49e5-9fbc-
  0c11d013d66e\\x00state_path=/var/lib/quantum\\x00metadata_port=80\\x00debug\\x00verbose
  \\x00log-file=quantum-ns-metadata-proxy91e99f72-6fb0-49e5-9fbc-
  0c11d013d66e.log\\x00-log-dir=/var/log/quantum\\x00'\nStderr:
  ''\n2013-04-25 14:00:04 ERROR [quantum.agent.linux.daemon] Pidfile
  /var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid
  already exist. Daemon already running?\n"

  That's the right pid file, but the cmdline belongs to the other proxy
  process.

  The Daemon class needs to use the uuid when looking at cmdline so that
  it doesn't match the wrong process. I have a patch that fixes the
  problem by passing an additional argument at init time. I'll assign
  this bug to myself and send the change out.
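
  To illustrate the idea of matching on the uuid, here is a minimal
  sketch. It is not the actual patch: the names read_cmdline and
  cmdline_matches and their parameters are hypothetical, and the sketch
  assumes a Linux /proc filesystem where cmdline arguments are separated
  by NUL bytes.

```python
def read_cmdline(pid):
    """Return /proc/<pid>/cmdline as text, or '' if the pid is gone."""
    try:
        with open('/proc/%d/cmdline' % pid, 'rb') as f:
            return f.read().decode('utf-8', 'replace')
    except IOError:
        return ''


def cmdline_matches(cmdline, procname, uuid=None):
    """Decide whether a NUL-separated cmdline belongs to our daemon.

    Hypothetical helper: matching on procname alone accepts any proxy
    process, so when a uuid is given it must appear in the cmdline too.
    """
    args = [arg for arg in cmdline.split('\x00') if arg]
    joined = ' '.join(args)
    if procname not in joined:
        return False
    return uuid is None or uuid in joined
```

  With the cmdline of pid 31857 shown above, a check for the uuid ending
  in 4245 would return False, so the stale pid file would not be treated
  as a running daemon for that network.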

To manage notifications about this bug go to:
https://bugs.launchpad.net/quantum/+bug/1177416/+subscriptions