yahoo-eng-team team mailing list archive
Message #61695
[Bug 1223369] Re: Metadata ns proxy didn't start - pid already exist. Daemon already running?
[Expired for neutron because there has been no activity for 60 days.]
** Changed in: neutron
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1223369
Title:
Metadata ns proxy didn't start - pid already exist. Daemon already
running?
Status in neutron:
Expired
Bug description:
This failure happened just once. Levels are Ubuntu Raring 13.04, Grizzly Quantum packages at 1:2013.1.2-0ubuntu1.
I noticed the metadata namespace proxy hadn't started after the network node was booted. The l3-agent.log (was only at INFO) has:
2013-09-04 15:53:16 INFO [quantum.openstack.common.rpc.common] Connected to AMQP server on 10.0.10.10:5672
2013-09-04 15:53:16 INFO [quantum.agent.l3_agent] L3 agent started
2013-09-04 15:53:28 ERROR [quantum.agent.l3_agent] Failed synchronizing routers
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 638, in _sync_routers_task
self._process_routers(routers, all_routers=True)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 618, in _process_routers
self._router_added(r['id'], r)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 236, in _router_added
self._spawn_metadata_proxy(ri)
File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 270, in _spawn_metadata_proxy
pm.enable(callback)
File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py", line 55, in enable
ip_wrapper.netns.execute(cmd)
File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line 414, in execute
check_exit_code=check_exit_code)
File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line 61, in execute
raise RuntimeError(m)
RuntimeError:
Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7', 'quantum-ns-metadata-proxy', '--pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid', '--router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7', '--state_path=/var/lib/quantum', '--metadata_port=9697', '--verbose', '--log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log', '--log-dir=/var/log/quantum']
Exit code: 1
Stdout: ''
Stderr: '2013-09-04 15:53:28 INFO [quantum.common.config] Logging enabled!\n2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?\n'
And quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log has:
2013-08-29 19:04:04 INFO [quantum.common.config] Logging enabled!
2013-09-04 15:53:28 INFO [quantum.common.config] Logging enabled!
2013-09-04 15:53:28 ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?
It is the same error message as https://bugs.launchpad.net/neutron/+bug/1177416 - but the patch from that bug was applied.
The file
/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid
had 2045 in it, but no process with pid 2045 was running when I
checked: /proc/2045/ did not exist. The pid file was stale; its date
was that of the previous launch.
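The manual check described above can be sketched in Python. This is only an illustration, not code from quantum: the helper name pidfile_is_stale and its proc_root parameter are hypothetical, and it assumes the pid file holds a single integer.

```python
import os


def pidfile_is_stale(pid_file, proc_root='/proc'):
    """Return True if pid_file names a pid that has no /proc entry.

    Hypothetical helper, not part of quantum. proc_root is only a
    parameter so the check can be exercised against a fake directory.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    # A live process always has a /proc/<pid>/ directory on Linux.
    return not os.path.isdir(os.path.join(proc_root, str(pid)))
```

Note that the reverse does not hold: a /proc/<pid>/ directory only shows that some process has that pid, not that it is the daemon that wrote the file, because pids are reused.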
The process call chain in short-hand is like this:
l3-agent --> sudo rootwrap... --> python rootwrap ip netns exec qrouter-uuid quantum-ns-metadata-proxy router_id=uuid... --> python quantum-ns-metadata-proxy router_id=uuid...
Now the code in external_process.py either didn't find a
/proc/2045/cmdline, or if it did then that file did not have the
strings 'python' and 'fa2ec96d-d1f9-4af2-a022-cac171646aa7'. But the
code in daemon.py must have found a /proc/2045/cmdline and it must
have had those strings. The only explanation I can give for this is
that the python rootwrap process started by sudo just happened to get
pid 2045 that time, and this is what daemon.py is_running() found. Its
full command line would have looked like:
/usr/bin/python /usr/bin/quantum-rootwrap /etc/quantum/rootwrap.conf ip netns exec qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7 quantum-ns-metadata-proxy --pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid --router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7 --state_path=/var/lib/quantum --metadata_port=9697 --verbose --log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log --log-dir=/var/log/quantum
It has the strings 'python' and the router's uuid, so it would have
matched. If my theory is right, then a possible fix would be to change
the checks to not report cmdlines with 'ip\x00netns\x00exec' as a
running daemon.
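A rough sketch of what such a check might look like follows. It is not the actual daemon.py or external_process.py code: the names cmdline_is_daemon, is_running and proc_root are made up for illustration, and the 'python' plus uuid test is taken only from the description above.

```python
import os

# Arguments in /proc/<pid>/cmdline are separated by NUL bytes, so the
# rootwrap wrapper shows up as this exact sequence.
WRAPPER_MARKER = 'ip\x00netns\x00exec'


def cmdline_is_daemon(cmdline, uuid):
    """Decide whether a raw cmdline string is the proxy daemon itself."""
    if WRAPPER_MARKER in cmdline:
        # This is the rootwrap/ip wrapper, not the daemon it launches.
        return False
    return 'python' in cmdline and uuid in cmdline


def is_running(pid, uuid, proc_root='/proc'):
    """Hypothetical replacement check; proc_root exists for testing."""
    path = os.path.join(proc_root, str(pid), 'cmdline')
    try:
        with open(path, 'rb') as f:
            cmdline = f.read().decode('utf-8', 'replace')
    except (IOError, OSError):
        # No such process (or unreadable entry): not running.
        return False
    return cmdline_is_daemon(cmdline, uuid)
```

With this, the rootwrap command line shown above would be rejected even though it contains both 'python' and the router's uuid, while a cmdline for quantum-ns-metadata-proxy itself would still match.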
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1223369/+subscriptions