yahoo-eng-team team mailing list archive
Message #67284
[Bug 1715660] Re: fullstack job failing to create a namespace, hitting kernel deadlock
The "unregister_netdevice: waiting for lo to become free" messages also
appear in the OpenStack CI logs of many kolla jobs, although those jobs
do not show the kernel "task ... blocked for more than 120 seconds"
message. Kolla may be hitting a different issue, or the same issue with
the deadlock resolving before the 120-second limit is reached.
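To tell the two situations apart when reading a job's syslog, one can count both kinds of kernel message separately. The sketch below assumes the syslog is available as plain text lines; `classify_syslog` is a hypothetical helper, not part of neutron, kolla or networking-bagpipe, and it only matches the message texts quoted in this report.

```python
import re
from typing import Iterable

# Message texts taken from the kernel logs quoted in this bug report.
UNREGISTER_MSG = "unregister_netdevice: waiting for lo to become free"
HUNG_TASK_RE = re.compile(r"task \S+:\d+ blocked for more than \d+ seconds")


def classify_syslog(lines: Iterable[str]) -> dict:
    """Count unregister_netdevice waits and hung-task reports in a syslog."""
    unregister = 0
    hung = 0
    for line in lines:
        if UNREGISTER_MSG in line:
            unregister += 1
        if HUNG_TASK_RE.search(line):
            hung += 1
    return {"unregister_waits": unregister, "hung_tasks": hung}
```

A kolla-style log would show `unregister_waits > 0` with `hung_tasks == 0`, while the networking-bagpipe fullstack log shows both.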
** Also affects: networking-bagpipe
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1715660
Title:
fullstack job failing to create a namespace, hitting kernel deadlock
Status in BaGPipe:
New
Status in neutron:
New
Bug description:
The networking-bagpipe fullstack job hits the following kernel issue
when new tests that use more network namespaces (netns) are added:
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: INFO: task ip:1358 blocked for more than 120 seconds.
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: Tainted: G OE 4.4.0-93-generic #116-Ubuntu
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ip D ffff880166acfdc8 0 1358 1356 0x00000000
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ffff880166acfdc8 ffff880166acfd98 ffff880205a88000 ffff8800eb29d940
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ffff880166ad0000 ffffffff81ef78a4 ffff8800eb29d940 00000000ffffffff
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ffffffff81ef78a8 ffff880166acfde0 ffffffff8183f0d5 ffffffff81ef78a0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: Call Trace:
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff8183f0d5>] schedule+0x35/0x80
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff8183f37e>] schedule_preempt_disabled+0xe/0x10
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff81840fb9>] __mutex_lock_slowpath+0xb9/0x130
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff8184104f>] mutex_lock+0x1f/0x30
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff8172da4e>] copy_net_ns+0x6e/0x120
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff810a174b>] create_new_namespaces+0x11b/0x1d0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff810a198a>] unshare_nsproxy_namespaces+0x5a/0xb0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff81080b41>] SyS_unshare+0x1f1/0x3a0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: [<ffffffff818431f2>] entry_SYSCALL_64_fastpath+0x16/0x71
( http://logs.openstack.org/66/500066/1/check/gate-networking-bagpipe-dsvm-fullstack-ubuntu-xenial-nv/99f751d/logs/syslog.txt.gz )
(The command that is blocked is an "ip netns add ..." command.)
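Reading the call trace from the top (innermost frame) down shows the `ip` process sleeping in `mutex_lock`, called from `copy_net_ns`, which is reached through `SyS_unshare`, i.e. while creating a new network namespace. The sketch below extracts the function names from such frames so the caller of a given frame can be checked mechanically; `call_trace_functions` and `caller_of` are hypothetical helpers, and the regex assumes the `[<address>] name+0xoff/0xlen` frame format seen in this log.

```python
import re
from typing import Iterable, List

# Matches frames such as "[<ffffffff8172da4e>] copy_net_ns+0x6e/0x120".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace_functions(lines: Iterable[str]) -> List[str]:
    """Return the function names of a kernel call trace, innermost frame first."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def caller_of(frames: List[str], name: str) -> str:
    """Return the frame listed just below `name` (its caller), or "" if absent."""
    if name in frames:
        idx = frames.index(name)
        if idx + 1 < len(frames):
            return frames[idx + 1]
    return ""
```

Applied to the trace above, `caller_of(frames, "mutex_lock")` gives `copy_net_ns`, consistent with the blocked command being `ip netns add`.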
This happens in the OpenStack CI on Ubuntu kernel 4.4.0-93-generic.
On another machine (outside the OpenStack CI), this issue seems to correlate with many "unregister_netdevice: waiting for lo to become free. Usage count = X" messages (with varying values of X: 1, 3, 6).
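To check that correlation on a given machine, the reported usage counts can be tallied per value. This is a minimal sketch assuming the message wording quoted above; `usage_count_histogram` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "unregister_netdevice: waiting for lo to become free. Usage count = 3"
USAGE_RE = re.compile(
    r"unregister_netdevice: waiting for lo to become free\. Usage count = (\d+)"
)


def usage_count_histogram(lines: Iterable[str]) -> Counter:
    """Tally how often each usage count appears in the unregister_netdevice warnings."""
    counts = Counter()
    for line in lines:
        match = USAGE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```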
To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-bagpipe/+bug/1715660/+subscriptions