← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1339045] [NEW] Cannot open network namespace

 

Public bug reported:

While booting tripleo undercloud have occassionally (about 8 in 10
attempts) seen errors in neutron-dhcp-agent.log of:

2014-07-08 10:12:07.374 3905 ERROR neutron.agent.linux.utils [-] 
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', 'link', 'set', 'tap4ac4d6ab-d7', 'up']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
2014-07-08 10:12:07.419 3905 ERROR neutron.agent.linux.utils [-] 
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', '-o', 'link', 'show', 'tap4ac4d6ab-d7']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'


In all cases these ERRORs are preceeded by the ERROR:

2014-07-08 10:12:07.151 3905 ERROR neutron.agent.dhcp_agent [req-523b2309-18c9-4a2d-ad65-7586db10f956 None] Failed reporting state!
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/dhcp_agent.py", line 576, in _report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/rpc.py", line 70, in report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return self.call(context, msg, topic=self.topic)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 161, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     context, msg, rpc_method='call', **kwargs)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in __call_rpc_method
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return func(context, msg['method'], **msg['args'])
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     wait_for_reply=True, timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     result = self._waiter.wait(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     reply, ending = self._poll_connection(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     % msg_id)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent MessagingTimeout: Timed out waiting for a reply to message ID 21fd392ebb3a43228447f380334d05b0
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent 
2014-07-08 10:12:07.157 3905 WARNING neutron.openstack.common.loopingcall [-] task run outlasted interval by 30.119771 sec

If I then wait for the undercloud to fail and login to the seed host, I can manually run the command that failed previously:
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-e967d587-5999-4333-8405-76b32398b121' ip link set tap4ac4d6ab-d7 up

Replicate the ERROR message by attempting to exec a command for a non existent namespace..
------------------------------------------------------------------------------------------------------------------------------------
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' ip link set tap4ac4d6ab-d7 up
Cannot open network namespace "qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx": No such file or directory


The code section that returns the error (neutron/agent/linux/ip_lib.py - see below) has a comment regarding the return of a RuntimeError if the device does not exist - so this could be seen as expected behaviour
def ensure_device_is_ready(device_name, root_helper=None, namespace=None):
    dev = IPDevice(device_name, root_helper, namespace)
    try:
        # Ensure the device is up, even if it is already up. If the device
        # doesn't exist, a RuntimeError will be raised.
        dev.link.set_up()
    except RuntimeError:
        return False
    return True


 - why does the device not exist? (probably a timing issue)

 - why not wait until the device exists before attempting the operation?
    what check could be put in place - prior to calling this function (or as part of this function) that could check on the state of the device
    would a retry be appropriate?

 - Is the name of the function correct? - does not seem to
'ensure_device_is_ready' - more like "set_device_interface_up"?

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1339045

Title:
  Cannot open network namespace

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  While booting tripleo undercloud have occassionally (about 8 in 10
  attempts) seen errors in neutron-dhcp-agent.log of:

  2014-07-08 10:12:07.374 3905 ERROR neutron.agent.linux.utils [-] 
  Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', 'link', 'set', 'tap4ac4d6ab-d7', 'up']
  Exit code: 1
  Stdout: ''
  Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
  2014-07-08 10:12:07.419 3905 ERROR neutron.agent.linux.utils [-] 
  Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', '-o', 'link', 'show', 'tap4ac4d6ab-d7']
  Exit code: 1
  Stdout: ''
  Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'

  
  In all cases these ERRORs are preceeded by the ERROR:

  2014-07-08 10:12:07.151 3905 ERROR neutron.agent.dhcp_agent [req-523b2309-18c9-4a2d-ad65-7586db10f956 None] Failed reporting state!
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/dhcp_agent.py", line 576, in _report_state
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/rpc.py", line 70, in report_state
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return self.call(context, msg, topic=self.topic)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 161, in call
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     context, msg, rpc_method='call', **kwargs)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in __call_rpc_method
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return func(context, msg['method'], **msg['args'])
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     wait_for_reply=True, timeout=timeout)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     timeout=timeout)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     return self._send(target, ctxt, message, wait_for_reply, timeout)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     result = self._waiter.wait(msg_id, timeout)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     reply, ending = self._poll_connection(msg_id, timeout)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent   File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent     % msg_id)
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent MessagingTimeout: Timed out waiting for a reply to message ID 21fd392ebb3a43228447f380334d05b0
  2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent 
  2014-07-08 10:12:07.157 3905 WARNING neutron.openstack.common.loopingcall [-] task run outlasted interval by 30.119771 sec

  If I then wait for the undercloud to fail and login to the seed host, I can manually run the command that failed previously:
  root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-e967d587-5999-4333-8405-76b32398b121' ip link set tap4ac4d6ab-d7 up

  Replicate the ERROR message by attempting to exec a command for a non existent namespace..
  ------------------------------------------------------------------------------------------------------------------------------------
  root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' ip link set tap4ac4d6ab-d7 up
  Cannot open network namespace "qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx": No such file or directory

  
  The code section that returns the error (neutron/agent/linux/ip_lib.py - see below) has a comment regarding the return of a RuntimeError if the device does not exist - so this could be seen as expected behaviour
  def ensure_device_is_ready(device_name, root_helper=None, namespace=None):
      dev = IPDevice(device_name, root_helper, namespace)
      try:
          # Ensure the device is up, even if it is already up. If the device
          # doesn't exist, a RuntimeError will be raised.
          dev.link.set_up()
      except RuntimeError:
          return False
      return True

  
   - why does the device not exist? (probably a timing issue)

   - why not wait until the device exists before attempting the operation?
      what check could be put in place - prior to calling this function (or as part of this function) that could check on the state of the device
      would a retry be appropriate?

   - Is the name of the function correct? - does not seem to
  'ensure_device_is_ready' - more like "set_device_interface_up"?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1339045/+subscriptions


Follow ups

References