
yahoo-eng-team team mailing list archive

[Bug 1223369] Re: Metadata ns proxy didn't start - pid already exist. Daemon already running?

 

[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1223369

Title:
  Metadata ns proxy didn't start - pid already exist. Daemon already
  running?

Status in neutron:
  Expired

Bug description:
  This failure occurred only once. Versions: Ubuntu Raring 13.04, Grizzly Quantum packages at 1:2013.1.2-0ubuntu1.
  After the network node was booted, I noticed that the metadata namespace proxy had not started. The l3-agent.log (logging at INFO level only) contains:

  2013-09-04 15:53:16     INFO [quantum.openstack.common.rpc.common] Connected to AMQP server on 10.0.10.10:5672
  2013-09-04 15:53:16     INFO [quantum.agent.l3_agent] L3 agent started
  2013-09-04 15:53:28    ERROR [quantum.agent.l3_agent] Failed synchronizing routers
  Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 638, in _sync_routers_task
      self._process_routers(routers, all_routers=True)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 618, in _process_routers
      self._router_added(r['id'], r)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 236, in _router_added
      self._spawn_metadata_proxy(ri)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/l3_agent.py", line 270, in _spawn_metadata_proxy
      pm.enable(callback)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/external_process.py", line 55, in enable
      ip_wrapper.netns.execute(cmd)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/ip_lib.py", line 414, in execute
      check_exit_code=check_exit_code)
    File "/usr/lib/python2.7/dist-packages/quantum/agent/linux/utils.py", line 61, in execute
      raise RuntimeError(m)
  RuntimeError: 
  Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7', 'quantum-ns-metadata-proxy', '--pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid', '--router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7', '--state_path=/var/lib/quantum', '--metadata_port=9697', '--verbose', '--log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log', '--log-dir=/var/log/quantum']
  Exit code: 1
  Stdout: ''
  Stderr: '2013-09-04 15:53:28     INFO [quantum.common.config] Logging enabled!\n2013-09-04 15:53:28    ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?\n'

  
  The proxy log, quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log, contains:

  2013-08-29 19:04:04     INFO [quantum.common.config] Logging enabled!
  2013-09-04 15:53:28     INFO [quantum.common.config] Logging enabled!
  2013-09-04 15:53:28    ERROR [quantum.agent.linux.daemon] Pidfile /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid already exist. Daemon already running?

  
  This is the same error message as in https://bugs.launchpad.net/neutron/+bug/1177416, but the patch from that bug had already been applied.

  The file /var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid
  contained 2045, but no process with PID 2045 was running when I
  checked: /proc/2045/ did not exist. The pid file was stale; its
  timestamp matched the previous launch.
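  The staleness described above can be checked directly: read the PID
  from the pidfile and see whether /proc/<pid> exists. Below is a minimal
  Python sketch assuming a Linux /proc. `read_pid` and `pidfile_is_stale`
  are hypothetical helpers, not quantum's API, and the `proc_root`
  parameter exists only so the check can be exercised against a fake
  directory tree.

```python
import os


def read_pid(pidfile_path):
    """Return the integer PID stored in a pidfile, or None if it is unreadable."""
    try:
        with open(pidfile_path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pidfile_is_stale(pidfile_path, proc_root="/proc"):
    """Return True if the pidfile is unreadable or names a PID with no /proc entry.

    proc_root defaults to the real /proc on Linux; it is a parameter only so
    the check can be exercised against a fake directory tree.
    """
    pid = read_pid(pidfile_path)
    if pid is None:
        # Design choice (assumption): an unreadable pidfile counts as stale.
        return True
    return not os.path.isdir(os.path.join(proc_root, str(pid)))
```

  In the case reported here, a pidfile containing 2045 with no /proc/2045/
  directory would be reported as stale. An existence check alone cannot
  tell whether a PID has since been reused by an unrelated process.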

  In shorthand, the process call chain is:
  l3-agent --> sudo rootwrap... --> python rootwrap ip netns exec qrouter-uuid quantum-ns-metadata-proxy router_id=uuid... --> python quantum-ns-metadata-proxy router_id=uuid... 

  The code in external_process.py either did not find /proc/2045/cmdline,
  or found it without both of the strings 'python' and
  'fa2ec96d-d1f9-4af2-a022-cac171646aa7'. The code in daemon.py, however,
  must have found a /proc/2045/cmdline containing both strings. The only
  explanation I can give is that the python rootwrap process started by
  sudo happened to get PID 2045 this time, and that this is the process
  that is_running() in daemon.py found. Its full command line would have
  looked like:

    /usr/bin/python /usr/bin/quantum-rootwrap /etc/quantum/rootwrap.conf
    ip netns exec qrouter-fa2ec96d-d1f9-4af2-a022-cac171646aa7
    quantum-ns-metadata-proxy
    --pid_file=/var/lib/quantum/external/pids/fa2ec96d-d1f9-4af2-a022-cac171646aa7.pid
    --router_id=fa2ec96d-d1f9-4af2-a022-cac171646aa7
    --state_path=/var/lib/quantum --metadata_port=9697 --verbose
    --log-file=quantum-ns-metadata-proxyfa2ec96d-d1f9-4af2-a022-cac171646aa7.log
    --log-dir=/var/log/quantum

  This command line contains both 'python' and the router's UUID, so it
  would have matched. If my theory is right, a possible fix would be to
  change the checks so that they do not treat a cmdline containing
  'ip\x00netns\x00exec' as a running daemon (arguments in
  /proc/<pid>/cmdline are separated by NUL bytes).
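  To illustrate the theory and the proposed fix, here is a minimal Python
  sketch that models the checks as this report describes them (matching
  'python' and the router UUID in /proc/<pid>/cmdline). It is not the
  actual quantum daemon.py or external_process.py code; the helper names
  and the trimmed argument lists are hypothetical. It shows that the
  rootwrap wrapper's cmdline passes the original check, while the proposed
  'ip\x00netns\x00exec' exclusion rejects the wrapper but still accepts the
  proxy itself.

```python
ROUTER_ID = "fa2ec96d-d1f9-4af2-a022-cac171646aa7"

# Trimmed argv of the rootwrap wrapper from the report (hypothetical subset:
# the remaining proxy flags are omitted).
ROOTWRAP_ARGV = [
    "/usr/bin/python", "/usr/bin/quantum-rootwrap", "/etc/quantum/rootwrap.conf",
    "ip", "netns", "exec", "qrouter-" + ROUTER_ID,
    "quantum-ns-metadata-proxy", "--router_id=" + ROUTER_ID,
]

# Shorthand argv of the proxy itself, mirroring the call chain above.
PROXY_ARGV = ["python", "quantum-ns-metadata-proxy", "--router_id=" + ROUTER_ID]


def as_proc_cmdline(argv):
    """Encode argv the way /proc/<pid>/cmdline stores it: NUL-terminated args."""
    return b"".join(arg.encode() + b"\x00" for arg in argv)


def looks_like_proxy(cmdline, uuid):
    """Check as the report describes it: 'python' and the UUID both appear."""
    return b"python" in cmdline and uuid.encode() in cmdline


def looks_like_proxy_fixed(cmdline, uuid):
    """Proposed fix: additionally reject a wrapper still running 'ip netns exec'."""
    return looks_like_proxy(cmdline, uuid) and b"ip\x00netns\x00exec" not in cmdline
```

  With these definitions, looks_like_proxy() returns True for the encoded
  ROOTWRAP_ARGV, reproducing the suspected false positive, while
  looks_like_proxy_fixed() returns False for it and True for PROXY_ARGV.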
