yahoo-eng-team team mailing list archive
Message #74996
[Bug 1795280] [NEW] netns deletion on newer kernels fails with errno 16
Public bug reported:
This is probably not Neutron related, but I need some input.
On CentOS 7.5 with a 3.10 kernel, creating a network and then deleting
it properly terminates all processes, removes the interfaces and deletes
the network namespace.
[root@controller ~]# uname -r
3.10.0-862.11.6.el7.x86_64
When running a later kernel such as 4.18, some change causes the
namespace deletion to fail with an OSError, errno 16 (Device or resource busy).
Before roughly kernel 3.19 the netns filesystem was provided by proc, but it has since been moved
to its own nsfs. Maybe this has something to do with it, but I haven't seen this issue on Ubuntu before.
[root@controller ~]# mount | grep qdhcp
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
[root@controller ~]# uname -r
4.18.8-1.el7.elrepo.x86_64
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
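To compare the two kernels more easily, the mount output above can be grouped by mount point and filesystem type. The following is only a minimal sketch: the helper name netns_mounts and the regular expression are illustrative assumptions, and the sample text is simply the four mount lines quoted in this report.

```python
import re
from collections import defaultdict

# Matches one line as printed by `mount`, for example:
#   nsfs on /run/netns/qdhcp-<uuid> type nsfs (rw,seclabel)
MOUNT_LINE = re.compile(
    r"^(?P<src>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def netns_mounts(mount_output, prefix="/run/netns/"):
    """Map each mount point under `prefix` to the fs types mounted on it.

    Hypothetical helper for illustration; a mount point listed twice
    yields two entries in its list.
    """
    found = defaultdict(list)
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group("target").startswith(prefix):
            found[match.group("target")].append(match.group("fstype"))
    return dict(found)


# The mount lines quoted in this report (3.10 kernel first, then 4.18).
SAMPLE = """\
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
"""

if __name__ == "__main__":
    for target, fstypes in netns_mounts(SAMPLE).items():
        print(target, fstypes)
```

On a live host the same function could be fed the text printed by mount instead of SAMPLE.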
Perhaps some CentOS or Red Hat person can chime in about this.
I can reproduce this every single time:
* Create a network; this spawns dnsmasq, haproxy and the interfaces in a netns
* Delete the network; this terminates all processes and deletes the interfaces, but the netns cannot be deleted and the error below is thrown
Seen on both Queens and Rocky, FWIW.
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent [req-28a9e37f-a2ca-4375-a3f0-8384711414dd - - - - -] Unable to disable dhcp for 1fb24615-fd9e-4804-aade-5668bb2cdecb.: OSError: [Errno 16] Device or resource busy
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent getattr(driver, action)(**action_kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 241, in disable
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent self._destroy_namespace_and_port()
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 255, in _destroy_namespace_and_port
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent ip_lib.delete_network_namespace(self.network.namespace)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1105, in delete_network_namespace
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent privileged.remove_netns(namespace, **kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent return self.channel.remote_call(name, args, kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent raise exc_type(*result[2])
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent OSError: [Errno 16] Device or resource busy
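Errno 16 in the traceback above is EBUSY. While debugging, it may help to separate that specific failure from other OSErrors. The sketch below is not Neutron code: try_remove and the remove_fn argument are hypothetical, with remove_fn standing in for whatever call actually deletes the namespace.

```python
import errno


def try_remove(remove_fn, name):
    """Call remove_fn(name) and report whether the removal succeeded.

    Returns True on success and False if the call failed with EBUSY
    (errno 16 on Linux). Any other OSError is re-raised unchanged.
    """
    try:
        remove_fn(name)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return False
        raise
    return True


if __name__ == "__main__":
    def always_busy(name):
        # Stand-in that fails the same way as the traceback in this report.
        raise OSError(errno.EBUSY, "Device or resource busy")

    print(try_remove(always_busy, "qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb"))
```

This only classifies the error; it does not explain or work around the underlying kernel behaviour.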
** Affects: neutron
Importance: Undecided
Status: New
** Description changed:
- [root@osc-network1-sto1-prod ~]# uname -r
+ [root@controller ~]# uname -r
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1795280
Title:
netns deletion on newer kernels fails with errno 16
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1795280/+subscriptions