
yahoo-eng-team team mailing list archive

[Bug 1561695] Re: neutron-dhcp-agent generates thousands of interfaces on a failure

 

Reviewed:  https://review.openstack.org/482427
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=38d058c2cf0746e2452a0c2c704c914c836de9e7
Submitter: Jenkins
Branch:    master

commit 38d058c2cf0746e2452a0c2c704c914c836de9e7
Author: Dongcan Ye <hellochosen@xxxxxxxxx>
Date:   Tue Jul 11 15:15:23 2017 +0800

    Fix generation of thousands of DHCP tap interfaces
    
    As reported in the bug, there may be a case where an empty
    namespace file exists in /run/netns but the namespace does not
    actually exist. In such a case the DHCP agent throws an error
    when plugging the interface into the DHCP namespace.
    This may also result in many tap interfaces
    being generated in the OVS bridge or Linux bridge.
    
    This patch fixes the above bug by unplugging the tap device
    from the bridge if an exception occurs, which prevents the tap
    interfaces from accumulating.
    
    Co-Authored-By: Brian Haley <bhaley@xxxxxxxxxx>
    
    Change-Id: I4a197edd180887ad36317ddb2f0c0e7bd2e34e30
    Closes-Bug: #1561695
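
The cleanup-on-failure idea described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the actual neutron change: `FakeDriver`, its `plug` and `unplug` methods, `setup_dhcp_port`, and the namespace name `qdhcp-example` are all hypothetical, and the "bridge" is simulated with an in-memory set.

```python
class FakeDriver:
    """Hypothetical stand-in for an interface driver (not neutron's API)."""

    def __init__(self, namespace_is_broken):
        self.namespace_is_broken = namespace_is_broken
        self.bridge_ports = set()  # simulates ports attached to the bridge

    def plug(self, device_name, namespace):
        # Step 1: the tap device is added to the bridge.
        self.bridge_ports.add(device_name)
        # Step 2: moving it into the namespace fails when the namespace
        # file exists but the namespace itself does not.
        if self.namespace_is_broken:
            raise RuntimeError("RTNETLINK answers: Invalid argument")

    def unplug(self, device_name):
        # Remove the device from the simulated bridge if it is present.
        self.bridge_ports.discard(device_name)


def setup_dhcp_port(driver, device_name, namespace):
    """Plug a device; on any failure, unplug it again before re-raising."""
    try:
        driver.plug(device_name, namespace)
    except Exception:
        driver.unplug(device_name)
        raise
    return device_name


driver = FakeDriver(namespace_is_broken=True)
try:
    setup_dhcp_port(driver, "tap42983a07-e0", "qdhcp-example")
except RuntimeError:
    pass
print(sorted(driver.bridge_ports))  # no leftover tap device remains
```

Without the `unplug` call in the `except` branch, each retry would leave one more device on the simulated bridge, which mirrors the accumulation reported in the bug.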


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1561695

Title:
  neutron-dhcp-agent generates thousands of interfaces on a failure

Status in neutron:
  Fix Released

Bug description:
  I ran into slowness on a new deploy of mitaka-rc1 code with neutron. I
  had ~13,000 tap devices that were created by dhcp-agent. The neutron
  database does not have these ports. As far as I can tell, neutron is
  no longer aware of, or no longer cares about, those ports, but they
  remain on the node (and in Open vSwitch, so a reboot wouldn't clear them).
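
One way to spot such leftover devices is to compare the tap interfaces on the node with the port IDs the database knows about. The sketch below is only an illustration under stated assumptions: it assumes a tap device is named `tap` plus the first 11 characters of the port UUID (which matches `tap42983a07-e0` in the trace further down), the helper names are hypothetical, and the port ID in the example is a made-up placeholder.

```python
def expected_tap_name(port_id, prefix="tap", id_chars=11):
    # Assumption: device name = prefix + first 11 characters of the port UUID.
    return prefix + port_id[:id_chars]


def find_orphan_taps(host_interfaces, known_port_ids):
    """Return tap interfaces on the host that match no known port ID."""
    expected = {expected_tap_name(pid) for pid in known_port_ids}
    return sorted(
        name for name in host_interfaces
        if name.startswith("tap") and name not in expected
    )


# Example data: interface names as they might appear on a node, and a
# placeholder port ID standing in for what the database would return.
host_interfaces = ["lo", "eth0", "tap42983a07-e0", "tap11111111-22"]
known_port_ids = ["11111111-2222-3333-4444-555555555555"]

orphans = find_orphan_taps(host_interfaces, known_port_ids)
print(orphans)
```

In this example only `tap42983a07-e0` is reported, because no known port ID maps to that name.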

  I do not know how the initial failure happened, but to reproduce this
  you can do the following:

  1. Stop dhcp agent (and anything using the network namespace).
  2. ip netns del qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
  3. touch /run/netns/qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02
  4. Start the dhcp agent and watch it continually try to create (and then fail to clean up) tap interfaces.
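
The broken state produced by steps 2 and 3 (a file in /run/netns with no namespace behind it) can be detected with a small check. The sketch below is a heuristic and rests on an assumption: that a namespace created with `ip netns add` is bind-mounted onto its file, so the file reports a different device number than the directory, while a file created with `touch` shares the directory's device. The function name is hypothetical, and the demonstration runs against a temporary directory rather than the real /run/netns.

```python
import os
import tempfile


def find_suspect_netns_files(netns_dir="/run/netns"):
    """Return entries in netns_dir that do not look like mounted namespaces.

    Heuristic (an assumption, see above): an entry whose device number
    equals that of the directory is a plain file, not a mounted namespace.
    """
    dir_dev = os.stat(netns_dir).st_dev
    suspects = []
    for name in sorted(os.listdir(netns_dir)):
        path = os.path.join(netns_dir, name)
        if os.stat(path).st_dev == dir_dev:
            suspects.append(name)
    return suspects


# Demonstration on a temporary directory instead of the real /run/netns.
with tempfile.TemporaryDirectory() as demo_dir:
    stale = os.path.join(demo_dir, "qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02")
    open(stale, "w").close()  # same effect as `touch`
    demo_result = find_suspect_netns_files(demo_dir)
    print(demo_result)
```

On a real node the function would be called with no argument, typically as root, and any name it returns would be a candidate for manual inspection rather than automatic deletion.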

  Over the course of ~4 hours this issue generated 13,000 interfaces and
  4GB of logs (debug was turned on). How the initial issue came about I
  do not know, but it did happen in normal usage. I believe the proper
  fix here would be to _always_ clean up tap devices, even on failures, but I
  am not familiar enough with the neutron code to fix this.

  The output of `ip netns` when it has an invalid namespace looks like
  this:

  # ip netns
  RTNETLINK answers: Invalid argument
  RTNETLINK answers: Invalid argument
  qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02

  The stack trace in neutron-dhcp-agent is:

  2016-03-24 18:42:12.165 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--oneline', '--format=json', '--', '--columns=ofport', 'list', 'Interface', 'tap42983a07-e0'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.275 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.276 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'address', 'fa:16:3e:79:1b:0a'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.384 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.385 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'mtu', '9000'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.495 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.496 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', '-o', 'netns', 'list'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.604 1 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:142
  2016-03-24 18:42:12.605 1 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'link', 'set', 'tap42983a07-e0', 'netns', 'qdhcp-8e5d7a66-df5d-4e36-8446-3c2148e53f02'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:84
  2016-03-24 18:42:12.709 1 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument

  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp [-] Unable to plug DHCP port for network 8e5d7a66-df5d-4e36-8446-3c2148e53f02. Releasing port.
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp Traceback (most recent call last):
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     mtu=network.get('mtu'))
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     bridge, namespace, prefix, mtu)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     namespace_obj.add_device_to_namespace(ns_dev)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     device.link.set_netns(self.namespace)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     self._as_root([], ('set', self.name, 'netns', namespace))
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     use_root_namespace=use_root_namespace)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     log_fail_as_error=self.log_fail_as_error)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     log_fail_as_error=log_fail_as_error)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp     raise RuntimeError(msg)
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp 
  2016-03-24 18:42:12.710 1 ERROR neutron.agent.linux.dhcp 
  2016-03-24 18:42:12.711 1 DEBUG oslo_messaging._drivers.amqpdriver [-] CALL msg_id: 559dc40172904849a6cda4efebd85c38 exchange 'neutron' topic 'q-plugin' _send /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:454
  2016-03-24 18:42:12.858 1 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 559dc40172904849a6cda4efebd85c38 __call__ /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:302
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 8e5d7a66-df5d-4e36-8446-3c2148e53f02.
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 112, in call_driver
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 208, in enable
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     interface_name = self.device_manager.setup(self.network)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1240, in setup
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self.plugin.release_dhcp_port(network.id, port.device_id)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self.force_reraise()
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     six.reraise(self.type_, self.value, self.tb)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1234, in setup
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     mtu=network.get('mtu'))
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 248, in plug
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     bridge, namespace, prefix, mtu)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/interface.py", line 346, in plug_new
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     namespace_obj.add_device_to_namespace(ns_dev)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 216, in add_device_to_namespace
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     device.link.set_netns(self.namespace)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 514, in set_netns
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     self._as_root([], ('set', self.name, 'netns', namespace))
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 365, in _as_root
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     use_root_namespace=use_root_namespace)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 95, in _as_root
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     log_fail_as_error=self.log_fail_as_error)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 104, in _execute
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     log_fail_as_error=log_fail_as_error)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent     raise RuntimeError(msg)
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent 
  2016-03-24 18:42:12.859 1 ERROR neutron.agent.dhcp.agent 
  2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Finished network 8e5d7a66-df5d-4e36-8446-3c2148e53f02 dhcp configuration
  2016-03-24 18:42:12.859 1 INFO neutron.agent.dhcp.agent [-] Synchronizing state complete
  2016-03-24 18:42:12.859 1 DEBUG oslo_concurrency.lockutils [-] Lock "dhcp-agent" released by "neutron.agent.dhcp.agent.sync_state" :: held 1.626s inner /var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
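
To estimate how many tap devices the agent has tried to create from a debug log like the one above, the distinct tap names in the "Running command" lines can be counted. This is a rough sketch: the function name and regular expression are assumptions, the sample lines are shortened, illustrative lines modeled on the trace above, and `tapaaaaaaaa-bb` is a made-up device name.

```python
import re

# Matches quoted tap device names such as 'tap42983a07-e0'.
TAP_NAME = re.compile(r"'(tap[0-9a-f]{8}-[0-9a-f]{2})'")


def count_tap_devices(log_lines):
    """Count distinct tap device names seen in 'Running command' lines."""
    seen = set()
    for line in log_lines:
        if "Running command" in line:
            seen.update(TAP_NAME.findall(line))
    return len(seen)


# Shortened, illustrative lines; not verbatim log output.
sample_lines = [
    "DEBUG neutron.agent.linux.utils [-] Running command: "
    "['ip', 'link', 'set', 'tap42983a07-e0', 'mtu', '9000']",
    "DEBUG neutron.agent.linux.utils [-] Running command: "
    "['ip', 'link', 'set', 'tap42983a07-e0', 'netns', 'qdhcp-example']",
    "DEBUG neutron.agent.linux.utils [-] Running command: "
    "['ip', 'link', 'set', 'tapaaaaaaaa-bb', 'mtu', '9000']",
    "ERROR neutron.agent.linux.dhcp [-] Unable to plug DHCP port",
]

tap_count = count_tap_devices(sample_lines)
print(tap_count)
```

A count that keeps growing across agent sync cycles for the same network would be consistent with the failure loop described in this bug.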

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1561695/+subscriptions

