yahoo-eng-team team mailing list archive
Message #03133
[Bug 1177416] Re: Metadata proxy startup can fail when Daemon class doesn't properly match running processes
** Changed in: quantum/grizzly
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to quantum.
https://bugs.launchpad.net/bugs/1177416
Title:
Metadata proxy startup can fail when Daemon class doesn't properly
match running processes
Status in OpenStack Quantum (virtual network service):
Fix Released
Status in quantum grizzly series:
Fix Released
Bug description:
I came across a case where two metadata namespace proxy pid files had
the same PID in them, possibly due to a reboot of the network
controller:
root@qu-network-controller:# for f in `ls -1 /var/lib/quantum/external/pids/*.pid`; do cat $f | grep 31857 && echo $f; done
31857
/var/lib/quantum/external/pids/91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid
31857
/var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid
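The shell loop above only searches for one PID that is already known. The following is a minimal Python sketch that reports every PID shared by more than one pidfile. It assumes each *.pid file holds a single integer, as the files above appear to, and `find_duplicate_pids` is a hypothetical helper, not part of Quantum:

```python
import os
from collections import defaultdict


def find_duplicate_pids(pid_dir):
    """Map each PID found in more than one *.pid file to those file paths.

    Assumes every pidfile contains a single integer PID (hypothetical helper).
    """
    files_by_pid = defaultdict(list)
    for name in sorted(os.listdir(pid_dir)):
        if not name.endswith(".pid"):
            continue
        path = os.path.join(pid_dir, name)
        try:
            with open(path) as f:
                pid = int(f.read().strip())
        except (OSError, ValueError):
            # Unreadable or malformed pidfile: skip it rather than guess.
            continue
        files_by_pid[pid].append(path)
    # Keep only PIDs that more than one pidfile claims.
    return {pid: paths for pid, paths in files_by_pid.items() if len(paths) > 1}
```

Run against /var/lib/quantum/external/pids on this controller, it would have listed 31857 together with the two pidfiles shown above.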
The PID in question belonged to the proxy for the namespace ending in d66e:
root@qu-network-controller:# cat /proc/31857/cmdline
python/usr/bin/quantum-ns-metadata-proxy--pid_file=/var/lib/quantum/external/pids/91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid--network_id=91e99f72-6fb0-49e5-9fbc-0c11d013d66e--state_path=/var/lib/quantum--metadata_port=80--debug--verbose--log-file=quantum-ns-metadata-proxy91e99f72-6fb0-49e5-9fbc-0c11d013d66e.log--log-dir=
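The arguments appear run together because /proc/<pid>/cmdline separates them with NUL bytes, which the terminal does not display. Below is a small Python sketch for splitting that buffer back into an argument list. `parse_cmdline` and `read_cmdline` are hypothetical helpers. Reading /proc directly also assumes the caller has permission; the agent itself went through `sudo cat`.

```python
def parse_cmdline(raw):
    """Split the raw bytes of /proc/<pid>/cmdline into a list of arguments.

    The kernel separates arguments with NUL bytes and normally terminates
    the buffer with one, so drop that final NUL before splitting.
    """
    if not raw:
        return []  # e.g. kernel threads have an empty cmdline
    if raw.endswith(b"\x00"):
        raw = raw[:-1]
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\x00")]


def read_cmdline(pid):
    """Return the argument list of process `pid`, or None if it can't be read."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            return parse_cmdline(f.read())
    except OSError:
        return None  # no such process, no /proc, or permission denied
```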
Unfortunately, when the DHCP agent went to enable DHCP for the 4245
namespace, which also spawns a metadata namespace proxy, the code in the
Daemon class incorrectly matched this existing proxy and threw an exception.
From dhcp-agent.log:
2013-04-25 14:00:04 ERROR [quantum.agent.dhcp_agent] Unable to sync network state.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 155, in sync_state
    self.refresh_dhcp_helper(network_id)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 209, in refresh_dhcp_helper
    return self.enable_dhcp_helper(network_id)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 188, in enable_dhcp_helper
    self.enable_isolated_metadata_proxy(network)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/dhcp_agent.py", line 329, in enable_isolated_metadata_proxy
    pm.enable(callback)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py", line 55, in enable
    ip_wrapper.netns.execute(cmd)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line 407, in execute
    check_exit_code=check_exit_code)
  File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line 61, in execute
    raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-dc8af719-e6a0-4cc7-92d0-b2bf309e4245', 'quantum-ns-metadata-proxy', '--pid_file=/var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid', '--network_id=dc8af719-e6a0-4cc7-92d0-b2bf309e4245', '--state_path=/var/lib/quantum', '--metadata_port=80', '--debug', '--verbose', '--log-file=quantum-ns-metadata-proxydc8af719-e6a0-4cc7-92d0-b2bf309e4245.log', '--log-dir=/var/log/quantum']
Looking further in dhcp-agent.log, I found the actual problem:
2013-04-25 14:00:04 DEBUG [quantum.agent.linux.utils] Running command: ['sudo', 'cat', '/proc/31857/cmdline']\n
2013-04-25 14:00:04 DEBUG [quantum.agent.linux.utils] \n
Command: ['sudo', 'cat', '/proc/31857/cmdline']\n
Exit code: 0\n
Stdout: 'python\\x00/usr/bin/quantum-ns-metadata-proxy\\x00-pid_file=/var/lib/quantum/external/pids/91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid\\x00network_id=91e99f72-6fb0-49e5-9fbc-0c11d013d66e\\x00state_path=/var/lib/quantum\\x00metadata_port=80\\x00debug\\x00verbose\\x00log-file=quantum-ns-metadata-proxy91e99f72-6fb0-49e5-9fbc-0c11d013d66e.log\\x00-log-dir=/var/log/quantum\\x00'\n
Stderr: ''\n
2013-04-25 14:00:04 ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/dc8af719-e6a0-4cc7-92d0-b2bf309e4245.pid already exist. Daemon already running?\n"
That's the right pid file, but the wrong process: the cmdline belongs to the proxy for the d66e namespace.
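The report does not show the Daemon class's matching code. Assume, purely for illustration, that the check only looks for a generic process name such as `python` in the target's cmdline. Under that assumption, this Python sketch shows why PID 31857 passes the check for the wrong network. `loose_is_running` is a hypothetical stand-in, not Quantum's code.

```python
# Abbreviated cmdline of PID 31857 (NUL-separated), based on the output above.
CMDLINE_31857 = (
    "python\x00/usr/bin/quantum-ns-metadata-proxy\x00"
    "--pid_file=/var/lib/quantum/external/pids/"
    "91e99f72-6fb0-49e5-9fbc-0c11d013d66e.pid\x00"
    "--network_id=91e99f72-6fb0-49e5-9fbc-0c11d013d66e\x00"
)

# The network whose proxy was being started when the check went wrong.
STARTING_NETWORK = "dc8af719-e6a0-4cc7-92d0-b2bf309e4245"


def loose_is_running(cmdline, procname="python"):
    """Hypothetical liveness check that only looks for a generic name."""
    return procname in cmdline


# The dc8af719 pidfile pointed at PID 31857, and the loose check accepts it
# even though that process serves network 91e99f72, not STARTING_NETWORK.
print(loose_is_running(CMDLINE_31857))     # → True
print(STARTING_NETWORK in CMDLINE_31857)   # → False
```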
The Daemon class needs to check for the network UUID when looking at the
cmdline so that it doesn't match the wrong process. I have a patch that
passes an additional argument at init time to do this, and it fixes the
problem. I'll assign this bug to myself and send the change out.
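Here is one way to express the fix described above. It is a minimal Python sketch, not the actual patch. The class name `Pidfile` and the `procname`/`uuid` constructor arguments are assumptions, and the sketch reads /proc directly instead of going through the root helper. A process only counts as the running daemon if its cmdline also contains the expected UUID:

```python
class Pidfile(object):
    """Hypothetical pidfile helper; names are illustrative, not Quantum's API."""

    def __init__(self, path, procname="python", uuid=None):
        self.path = path
        self.procname = procname
        # Identifier (e.g. a network UUID) that must appear in the target
        # process's cmdline for it to count as "our" daemon.
        self.uuid = uuid

    def read(self):
        """Return the PID stored in the pidfile, or None."""
        try:
            with open(self.path) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            return None

    def cmdline_matches(self, cmdline):
        """True if cmdline looks like this daemon, not just any similar one."""
        if self.procname not in cmdline:
            return False
        # Without a uuid, fall back to the looser name-only match.
        return self.uuid is None or self.uuid in cmdline

    def is_running(self):
        pid = self.read()
        if pid is None:
            return False
        try:
            with open("/proc/%d/cmdline" % pid, "rb") as f:
                # Arguments are NUL-separated; decode as text for matching.
                cmdline = f.read().decode("utf-8", "replace")
        except OSError:
            return False  # no such process, or not readable
        return self.cmdline_matches(cmdline)
```

Constructed with `uuid='dc8af719-e6a0-4cc7-92d0-b2bf309e4245'`, `cmdline_matches` rejects the cmdline of PID 31857, which names only 91e99f72-6fb0-49e5-9fbc-0c11d013d66e. Leaving `uuid` as `None` keeps the looser behavior for callers that don't pass one.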
To manage notifications about this bug go to:
https://bugs.launchpad.net/quantum/+bug/1177416/+subscriptions