yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #50580
[Bug 1339045] Re: Cannot open network namespace
[Expired for neutron because there has been no activity for 60 days.]
** Changed in: neutron
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1339045
Title:
Cannot open network namespace
Status in neutron:
Expired
Bug description:
While booting tripleo undercloud have occassionally (about 8 in 10
attempts) seen errors in neutron-dhcp-agent.log of:
2014-07-08 10:12:07.374 3905 ERROR neutron.agent.linux.utils [-]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', 'link', 'set', 'tap4ac4d6ab-d7', 'up']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
2014-07-08 10:12:07.419 3905 ERROR neutron.agent.linux.utils [-]
Command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-e967d587-5999-4333-8405-76b32398b121', 'ip', '-o', 'link', 'show', 'tap4ac4d6ab-d7']
Exit code: 1
Stdout: ''
Stderr: 'Cannot open network namespace "qdhcp-e967d587-5999-4333-8405-76b32398b121": No such file or directory\n'
In all cases these ERRORs are preceeded by the ERROR:
2014-07-08 10:12:07.151 3905 ERROR neutron.agent.dhcp_agent [req-523b2309-18c9-4a2d-ad65-7586db10f956 None] Failed reporting state!
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/dhcp_agent.py", line 576, in _report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/agent/rpc.py", line 70, in report_state
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return self.call(context, msg, topic=self.topic)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 161, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent context, msg, rpc_method='call', **kwargs)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in __call_rpc_method
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return func(context, msg['method'], **msg['args'])
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent wait_for_reply=True, timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent timeout=timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 403, in _send
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent result = self._waiter.wait(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 267, in wait
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent reply, ending = self._poll_connection(msg_id, timeout)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent File "/opt/stack/venvs/neutron/local/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 217, in _poll_connection
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent % msg_id)
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent MessagingTimeout: Timed out waiting for a reply to message ID 21fd392ebb3a43228447f380334d05b0
2014-07-08 10:12:07.151 3905 TRACE neutron.agent.dhcp_agent
2014-07-08 10:12:07.157 3905 WARNING neutron.openstack.common.loopingcall [-] task run outlasted interval by 30.119771 sec
If I then wait for the undercloud to fail and login to the seed host, I can manually run the command that failed previously:
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-e967d587-5999-4333-8405-76b32398b121' ip link set tap4ac4d6ab-d7 up
Replicate the ERROR message by attempting to exec a command for a non existent namespace..
------------------------------------------------------------------------------------------------------------------------------------
root@ubuntu:/var/log/upstart# sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec 'qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' ip link set tap4ac4d6ab-d7 up
Cannot open network namespace "qdhcp-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx": No such file or directory
The code section that returns the error (neutron/agent/linux/ip_lib.py - see below) has a comment regarding the return of a RuntimeError if the device does not exist - so this could be seen as expected behaviour
def ensure_device_is_ready(device_name, root_helper=None, namespace=None):
dev = IPDevice(device_name, root_helper, namespace)
try:
# Ensure the device is up, even if it is already up. If the device
# doesn't exist, a RuntimeError will be raised.
dev.link.set_up()
except RuntimeError:
return False
return True
- why does the device not exist? (probably a timing issue)
- why not wait until the device exists before attempting the operation?
what check could be put in place - prior to calling this function (or as part of this function) that could check on the state of the device
would a retry be appropriate?
- Is the name of the function correct? - does not seem to
'ensure_device_is_ready' - more like "set_device_interface_up"?
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1339045/+subscriptions
References