yahoo-eng-team team mailing list archive
Message #67346
[Bug 1561695] Re: neutron-dhcp-agent generates thousands of interfaces on a failure
Reviewed: https://review.openstack.org/482427
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=38d058c2cf0746e2452a0c2c704c914c836de9e7
Submitter: Jenkins
Branch: master
commit 38d058c2cf0746e2452a0c2c704c914c836de9e7
Author: Dongcan Ye <hellochosen@xxxxxxxxx>
Date: Tue Jul 11 15:15:23 2017 +0800
Fix generation of thousands of DHCP tap interfaces
As reported in the bug, there may be a case where an empty
namespace file exists in /run/netns but the namespace does not
actually exist. In that case the DHCP agent throws an error
when plugging the interface into the DHCP namespace.
This may also result in many tap interfaces
being generated in the OVS bridge or Linux bridge.
This patch fixes the bug by unplugging the tap device
from the bridge if an exception occurs, which prevents the tap
interfaces from accumulating.
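The cleanup-on-failure idea described in the commit message can be sketched as follows. This is a minimal illustration, not the actual neutron change: `FakeBridge`, `move_to_namespace`, and this `plug_new` are hypothetical names invented for the example, and the bridge is simulated in memory.

```python
class FakeBridge:
    """Hypothetical in-memory stand-in for an OVS or Linux bridge."""

    def __init__(self):
        self.ports = set()

    def add_port(self, name):
        self.ports.add(name)

    def delete_port(self, name):
        self.ports.discard(name)


def move_to_namespace(device_name, namespace, valid_namespaces):
    """Hypothetical helper that fails the way 'ip link set ... netns' does
    when the namespace file exists but the namespace itself does not."""
    if namespace not in valid_namespaces:
        raise RuntimeError("RTNETLINK answers: Invalid argument")


def plug_new(bridge, device_name, namespace, valid_namespaces):
    """Attach a tap device, undoing the attachment if a later step fails."""
    bridge.add_port(device_name)
    try:
        move_to_namespace(device_name, namespace, valid_namespaces)
    except RuntimeError:
        # Remove the port we just added so repeated retries do not leak
        # devices, then re-raise so the caller still sees the failure.
        bridge.delete_port(device_name)
        raise
```

With this shape, each failed attempt leaves the bridge as it found it, so a retry loop in the caller cannot pile up devices.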
Co-Authored-By: Brian Haley <bhaley@xxxxxxxxxx>
Change-Id: I4a197edd180887ad36317ddb2f0c0e7bd2e34e30
Closes-Bug: #1561695
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1561695
Title:
neutron-dhcp-agent generates thousands of interfaces on a failure
Status in neutron:
Fix Released
Bug description:
I ran into slowness on a new deploy of mitaka-rc1 code with neutron. I
had ~13,000 tap devices that were created by dhcp-agent. The neutron
database does not have these ports. As far as I can tell, neutron is
no longer aware of, or cares about, those ports, but they remain on the
node (and in Open vSwitch, so a reboot wouldn't clear them).
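A quick way to see whether a node is accumulating devices like this is to count the interfaces whose names start with `tap`. The sketch below assumes a Linux host where interfaces in the root namespace are listed under `/sys/class/net`; the function name and its parameters are made up for this example. It counts every `tap`-prefixed interface, including ones that are legitimately in use, so the number is only a rough signal.

```python
import os


def count_tap_devices(sys_class_net="/sys/class/net", prefix="tap"):
    """Count network interfaces whose names start with the given prefix."""
    return sum(
        1 for name in os.listdir(sys_class_net) if name.startswith(prefix)
    )
```

Running it periodically and watching for steady growth would hint at the leak described here.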
I do not know how the initial failure happened, but to reproduce this
you can do the following:
1. Stop the DHCP agent (and anything using the network namespace).
2. ip netns del qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
3. touch /run/netns/qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
4. Start the DHCP agent and watch it continually try to create (and then fail to clean up) tap interfaces
Over the course of ~4 hours this issue generated 13,000 interfaces and
4GB of logs (debug was turned on). How the initial issue came about I
do not know, but it did happen in normal usage. I believe the proper
fix here would be to _always_ clean up tap devices, even on failures, but I
am not familiar enough with the neutron code to fix this.
The output of `ip netns` when it has an invalid namespace looks like
this:
# ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
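A stale entry like the one above can be spotted without calling `ip`. The sketch below is a heuristic that rests on one assumption: a live named namespace is a bind mount onto its file in `/run/netns`, so that file reports a different `st_dev` than the directory containing it, while a plain file created with `touch` reports the same one. The function name is invented for this example.

```python
import os
import stat


def find_suspect_netns_files(netns_dir="/run/netns"):
    """Return names in netns_dir that look like plain files, not mounts."""
    suspects = []
    dir_dev = os.stat(netns_dir).st_dev
    for name in sorted(os.listdir(netns_dir)):
        st = os.lstat(os.path.join(netns_dir, name))
        # A regular file on the same device as its directory has nothing
        # mounted on it, so it cannot be a live namespace handle.
        if stat.S_ISREG(st.st_mode) and st.st_dev == dir_dev:
            suspects.append(name)
    return suspects
```

Any name it returns is a candidate for manual inspection before removal; it should not be treated as proof that the namespace is gone.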
The stack trace in neutron-dhcp-agent is:
2016-03-24 18:42:12.165 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--columns=ofport', 'list', 'Interface', 'tap42983a07-e0'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2016-03-24 18:42:12.275 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
2016-03-24 18:42:12.276 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'address', 'fa:16:3e:79:1b:0a'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2016-03-24 18:42:12.384 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
2016-03-24 18:42:12.385 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'mtu', '9000'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2016-03-24 18:42:12.495 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
2016-03-24 18:42:12.496 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', '-o', 'netns', 'list'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2016-03-24 18:42:12.604 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
2016-03-24 18:42:12.605 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'netns', 'qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
2016-03-24 18:42:12.709 1 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp [-] Unable to plug DHCP port for network 8e5d7a66-df5d-4e36-8446-3c2148e53f02. Releasing port.
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp Traceback (most recent call last):
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp mtu=network.get('mtu'))
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp bridge, namespace, prefix, mtu)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp namespace_obj.add_device_to_namespace(ns_dev)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp device.link.set_netns(self.namespace)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp self._as_root([], ('set', self.name, 'netns', namespace))
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp use_root_namespace=use_root_namespace)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp log_fail_as_error=self.log_fail_as_error)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp log_fail_as_error=log_fail_as_error)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp raise RuntimeError(msg)
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp
2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp
2016-03-24 18:42:12.711 1 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 559dc40172904849a6cda4efebd85c38 exchange 'neutron' topic 'q-plugin' _send /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:454
2016-03-24 18:42:12.858 1 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 559dc40172904849a6cda4efebd85c38 __call__ /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 8e5d7a66-df5d-4e36-8446-3c2148e53f02.
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 112, in call_driver
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 208, in enable
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent interface_name = self.device_manager.setup(self.network)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1240, in setup
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent self.plugin.release_dhcp_port(network.id, port.device_id)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent self.force_reraise()
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent six.reraise(self.type_, self.value, self.tb)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent mtu=network.get('mtu'))
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent bridge, namespace, prefix, mtu)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent namespace_obj.add_device_to_namespace(ns_dev)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent device.link.set_netns(self.namespace)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent self._as_root([], ('set', self.name, 'netns', namespace))
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent use_root_namespace=use_root_namespace)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent log_fail_as_error=self.log_fail_as_error)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent log_fail_as_error=log_fail_as_error)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent raise RuntimeError(msg)
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent
2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent
2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Finished network 8e5d7a66-df5d-4e36-8446-3c2148e53f02 dhcp configuration
2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
2016-03-24 18:42:12.859 1 DEBUG oslo_concurrency.lockutils [-] Lock "dhcp-agent" released by "neutron.agent.dhcp.agent.sync_state" :: held 1.626s inner /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1561695/+subscriptions