
yahoo-eng-team team mailing list archive

[Bug 1795280] [NEW] netns deletion on newer kernels fails with errno 16

 

Public bug reported:

This is probably not Neutron related, but I need some input on it.

On CentOS 7.5 with a 3.10 kernel, creating a network and then deleting
it properly terminates all processes, removes the interfaces and deletes
the network namespace.

[root@controller ~]# uname -r
3.10.0-862.11.6.el7.x86_64

On a later kernel such as 4.18, something has changed that makes the
namespace deletion fail with OSError errno 16 (Device or resource busy).
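
For reference, errno 16 is EBUSY on Linux. A minimal Python sketch of decoding
the code and recognising the condition in an exception handler follows; the
helper name is_busy_error is made up for illustration and is not part of
neutron.

```python
import errno
import os


def is_busy_error(exc):
    """Return True if exc is an OSError carrying EBUSY (errno 16 on Linux)."""
    return isinstance(exc, OSError) and exc.errno == errno.EBUSY


if __name__ == "__main__":
    # Symbolic name and message text for the code seen in the agent log.
    print(errno.errorcode[errno.EBUSY], os.strerror(errno.EBUSY))
```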

Before something like kernel 3.19 the netns files were provided by proc, but they have since been moved
to their own filesystem, nsfs. Maybe this has something to do with it, but I haven't seen this issue on Ubuntu before.

[root@controller ~]# mount | grep qdhcp
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)

[root@controller ~]# uname -r
4.18.8-1.el7.elrepo.x86_64

nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
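
To compare the two hosts, the filesystem type of each netns mount can be
pulled out of the mount output. A rough Python sketch, assuming the
"SOURCE on PATH type FSTYPE (OPTIONS)" line format shown above; the function
name and the regex are illustrative only.

```python
import re

# Matches lines like: "nsfs on /run/netns/qdhcp-... type nsfs (rw,seclabel)"
MOUNT_LINE = re.compile(
    r"^(?P<src>\S+) on (?P<path>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def netns_mount_types(mount_output, prefix="/run/netns/"):
    """Map each mount point under prefix to the set of fs types seen for it."""
    result = {}
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group("path").startswith(prefix):
            result.setdefault(match.group("path"), set()).add(match.group("fstype"))
    return result


if __name__ == "__main__":
    # Sample lines copied from the two hosts in this report.
    sample = (
        "proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 "
        "type proc (rw,nosuid,nodev,noexec,relatime)\n"
        "nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb "
        "type nsfs (rw,seclabel)\n"
    )
    for path, fstypes in sorted(netns_mount_types(sample).items()):
        print(path, sorted(fstypes))
```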

Perhaps some CentOS or Red Hat person can chime in about this.

I can reproduce this every single time:
* Create a network; this spawns dnsmasq, haproxy and the interfaces in a netns
* Delete the network; this terminates all processes and deletes the interface, but the netns cannot be deleted and the error below is thrown

Seen on both Queens and Rocky, fwiw.

2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent [req-28a9e37f-a2ca-4375-a3f0-8384711414dd - - - - -] Unable to disable dhcp for 1fb24615-fd9e-4804-aade-5668bb2cdecb.: OSError: [Errno 16] Device or resource busy
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 241, in disable
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     self._destroy_namespace_and_port()
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 255, in _destroy_namespace_and_port
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     ip_lib.delete_network_namespace(self.network.namespace)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1105, in delete_network_namespace
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     privileged.remove_netns(namespace, **kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     return self.channel.remote_call(name, args, kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     raise exc_type(*result[2])
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent OSError: [Errno 16] Device or resource busy
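
One way to tell a transient race from a persistent condition would be to retry
the removal a few times while it keeps raising EBUSY. The sketch below is only
a diagnostic aid under that assumption, not a fix and not neutron code: the
name remove_with_retry is made up, and the remove argument is a hypothetical
caller-supplied callable standing in for whatever deletes the namespace.

```python
import errno
import time


def remove_with_retry(remove, name, attempts=5, delay=0.5, sleep=time.sleep):
    """Call remove(name), retrying while it raises OSError with EBUSY.

    `remove` is a caller-supplied callable (hypothetical here). Any other
    error, or EBUSY on the final attempt, is re-raised unchanged.
    Returns the number of attempts that were needed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            remove(name)
            return attempt
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == attempts:
                raise
            # Still busy: wait a little before trying again.
            sleep(delay)
```

If the call still fails after several seconds of retries, the busy state is
probably not just a race with process or interface teardown.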

** Affects: neutron
     Importance: Undecided
         Status: New

** Description changed:

- [root@osc-network1-sto1-prod ~]# uname -r
+ [root@controller ~]# uname -r
  4.18.8-1.el7.elrepo.x86_64
  
-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1795280

Title:
  netns deletion on newer kernels fails with errno 16

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1795280/+subscriptions

