yahoo-eng-team team mailing list archive

[Bug 1715660] [NEW] fullstack job failing to create a namespace, hitting kernel deadlock

 

Public bug reported:

The networking-bagpipe fullstack job hits the following kernel issue when
new tests are added that use more network namespaces (netns):

Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: INFO: task ip:1358 blocked for more than 120 seconds.
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:       Tainted: G           OE   4.4.0-93-generic #116-Ubuntu
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ip              D ffff880166acfdc8     0  1358   1356 0x00000000
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffff880166acfdc8 ffff880166acfd98 ffff880205a88000 ffff8800eb29d940
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffff880166ad0000 ffffffff81ef78a4 ffff8800eb29d940 00000000ffffffff
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffffffff81ef78a8 ffff880166acfde0 ffffffff8183f0d5 ffffffff81ef78a0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: Call Trace:
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8183f0d5>] schedule+0x35/0x80
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8183f37e>] schedule_preempt_disabled+0xe/0x10
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff81840fb9>] __mutex_lock_slowpath+0xb9/0x130
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8184104f>] mutex_lock+0x1f/0x30
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8172da4e>] copy_net_ns+0x6e/0x120
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff810a174b>] create_new_namespaces+0x11b/0x1d0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff810a198a>] unshare_nsproxy_namespaces+0x5a/0xb0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff81080b41>] SyS_unshare+0x1f1/0x3a0
Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff818431f2>] entry_SYSCALL_64_fastpath+0x16/0x71

( http://logs.openstack.org/66/500066/1/check/gate-networking-bagpipe-dsvm-fullstack-ubuntu-xenial-nv/99f751d/logs/syslog.txt.gz )

(The command that is blocked is an "ip netns add ..." command.)
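To spot this kind of report in a syslog dump, a small parser can pull out the blocked task and the function names in its call trace. The following is only a minimal sketch: the function name `parse_hung_tasks`, the regular expressions, and the `SAMPLE` excerpt (a shortened copy of the trace above) are illustrative assumptions, not part of any OpenStack or kernel tooling.

```python
import re

# Header printed by the kernel hung-task detector. The optional quote
# tolerates the stray '"' that appears in some pasted copies of the log.
HUNG_RE = re.compile(
    r'INFO: "?task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds'
)
# One call-trace frame, e.g. "[<ffffffff8172da4e>] copy_net_ns+0x6e/0x120".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Shortened excerpt of the trace quoted in this report (illustrative input).
SAMPLE = [
    "kernel: INFO: task ip:1358 blocked for more than 120 seconds.",
    "kernel: Call Trace:",
    "kernel:  [<ffffffff8184104f>] mutex_lock+0x1f/0x30",
    "kernel:  [<ffffffff8172da4e>] copy_net_ns+0x6e/0x120",
    "kernel:  [<ffffffff81080b41>] SyS_unshare+0x1f1/0x3a0",
]


def parse_hung_tasks(lines):
    """Return one dict per hung-task report, with its call-trace symbols."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            # Frames are attached to the most recent hung-task header.
            current["frames"].append(frame["sym"])
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], " <- ".join(report["frames"]))
```

A report whose frames include `copy_net_ns` below `mutex_lock` corresponds to the situation described here: the task is waiting on a mutex while creating a network namespace.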

This happens in the OpenStack CI on Ubuntu kernel 4.4.0-93-generic.

On another box (not OpenStack CI), this issue seems correlated with many "unregister_netdevice: waiting for lo to become free. Usage count = X" messages (with varying values for X: 1, 3, 6).
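To check whether the same correlation shows up on a given machine, the "unregister_netdevice" messages can be tallied per device and usage count. This is a rough sketch under stated assumptions: `count_unregister_waits` is a hypothetical helper, and the `EXAMPLE_LINES` input is made up for illustration (it is not taken from the CI logs).

```python
import re
from collections import Counter

# Matches the message text quoted in this report.
UNREG_RE = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>-?\d+)"
)

# Made-up input lines, for illustration only.
EXAMPLE_LINES = [
    "kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1",
    "kernel: unregister_netdevice: waiting for lo to become free. Usage count = 3",
    "kernel: unregister_netdevice: waiting for lo to become free. Usage count = 1",
    "kernel: some unrelated message",
]


def count_unregister_waits(lines):
    """Count messages per (device, usage count) pair."""
    counts = Counter()
    for line in lines:
        match = UNREG_RE.search(line)
        if match:
            counts[(match["dev"], int(match["count"]))] += 1
    return counts


if __name__ == "__main__":
    for (dev, usage), seen in sorted(count_unregister_waits(EXAMPLE_LINES).items()):
        print(f"{dev}: usage count {usage} seen {seen} time(s)")
```

Feeding it the lines of a real syslog (for instance by iterating over an open file) would show whether these messages cluster around the time the "ip netns add" command blocks.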

** Affects: networking-bagpipe
     Importance: Undecided
         Status: New

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1715660

Title:
  fullstack job failing to create a namespace, hitting kernel deadlock

Status in BaGPipe:
  New
Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-bagpipe/+bug/1715660/+subscriptions

