
yahoo-eng-team team mailing list archive

[Bug 1715660] Re: fullstack job failing to create a namespace, hitting kernel deadlock

 

The "unregister_netdevice: waiting for lo to become free" logs are also
present in many kolla jobs in the OpenStack CI, although these jobs
don't show the kernel "task ... blocked for more than 120 seconds"
message. Kolla may be hitting a different issue, or the same issue with
the deadlock resolving before the 120 s limit.
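To tell these cases apart across many job logs, a syslog can be bucketed by which of the two symptoms it contains. The following is a minimal sketch, assuming the syslog is available as an iterable of text lines; the function name `classify_syslog` and the returned labels are hypothetical, and the patterns follow the standard kernel message text quoted in this report.

```python
import re
from typing import Iterable

# Kernel hung-task warning, e.g. "INFO: task ip:1358 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Device-teardown wait message quoted in this report.
LO_WAIT_RE = re.compile(r"unregister_netdevice: waiting for lo to become free")


def classify_syslog(lines: Iterable[str]) -> str:
    """Roughly bucket a syslog by which symptoms appear in it (hypothetical labels)."""
    has_hung = False
    has_lo_wait = False
    for line in lines:
        if HUNG_TASK_RE.search(line):
            has_hung = True
        if LO_WAIT_RE.search(line):
            has_lo_wait = True
    if has_hung and has_lo_wait:
        return "hung-task+lo-wait"
    if has_hung:
        return "hung-task-only"
    if has_lo_wait:
        return "lo-wait-only"
    return "clean"
```

Under this sketch, a kolla log as described above would come out as "lo-wait-only", while the bagpipe fullstack log would be expected to show "hung-task" (with or without the lo messages).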

** Also affects: networking-bagpipe
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1715660

Title:
  fullstack job failing to create a namespace, hitting kernel deadlock

Status in BaGPipe:
  New
Status in neutron:
  New

Bug description:
  The networking-bagpipe fullstack job hits the following kernel issue
  when new tests that use more network namespaces (netns) are added:

  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: INFO: task ip:1358 blocked for more than 120 seconds.
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:       Tainted: G           OE   4.4.0-93-generic #116-Ubuntu
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: ip              D ffff880166acfdc8     0  1358   1356 0x00000000
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffff880166acfdc8 ffff880166acfd98 ffff880205a88000 ffff8800eb29d940
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffff880166ad0000 ffffffff81ef78a4 ffff8800eb29d940 00000000ffffffff
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  ffffffff81ef78a8 ffff880166acfde0 ffffffff8183f0d5 ffffffff81ef78a0
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel: Call Trace:
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8183f0d5>] schedule+0x35/0x80
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8183f37e>] schedule_preempt_disabled+0xe/0x10
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff81840fb9>] __mutex_lock_slowpath+0xb9/0x130
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8184104f>] mutex_lock+0x1f/0x30
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff8172da4e>] copy_net_ns+0x6e/0x120
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff810a174b>] create_new_namespaces+0x11b/0x1d0
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff810a198a>] unshare_nsproxy_namespaces+0x5a/0xb0
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff81080b41>] SyS_unshare+0x1f1/0x3a0
  Sep 01 14:48:40 ubuntu-xenial-rax-dfw-10739585 kernel:  [<ffffffff818431f2>] entry_SYSCALL_64_fastpath+0x16/0x71

  ( http://logs.openstack.org/66/500066/1/check/gate-networking-bagpipe-dsvm-fullstack-ubuntu-xenial-nv/99f751d/logs/syslog.txt.gz )

  (The blocked command is an "ip netns add ..." command.)
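The call trace shows the "ip" task entering the kernel through SyS_unshare, reaching copy_net_ns and sleeping in mutex_lock. In 4.4-era kernels copy_net_ns() takes the global net_mutex, and namespace teardown (cleanup_net()) holds that same mutex while waiting for devices such as lo to be released; that is one plausible link to the "waiting for lo" messages, but this report does not establish it. A minimal sketch for extracting the function names from such a trace and flagging this pattern follows; the helper names are hypothetical and the input is assumed to be syslog lines in the format quoted above.

```python
import re
from typing import List

# A frame line looks like "... kernel:  [<ffffffff8172da4e>] copy_net_ns+0x6e/0x120".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace_functions(lines: List[str]) -> List[str]:
    """Return function names from the first 'Call Trace:' block, innermost first."""
    frames: List[str] = []
    in_trace = False
    for line in lines:
        if not in_trace:
            in_trace = "Call Trace:" in line
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line ends the trace
        frames.append(match.group(1))
    return frames


def blocked_creating_netns(frames: List[str]) -> bool:
    """True if the trace shows namespace creation waiting on a mutex."""
    return "copy_net_ns" in frames and "mutex_lock" in frames
```

Applied to the trace above, `call_trace_functions` would list schedule first and entry_SYSCALL_64_fastpath last, and `blocked_creating_netns` would return True.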

  This happens in the OpenStack CI on Ubuntu kernel 4.4.0-93-generic.

  
  On another machine (outside the OpenStack CI), this issue appears to be correlated with many "unregister_netdevice: waiting for lo to become free. Usage count = X" messages, with X varying (1, 3, 6).
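To see which "Usage count" values dominate on a given machine, the messages can be tallied per value. This is a minimal sketch, assuming the log is available as text lines; `lo_usage_counts` is a hypothetical helper, and the pattern mirrors the message text quoted above.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the device name and the reported usage count; only "lo" is tallied.
LO_WAIT_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. Usage count = (\d+)"
)


def lo_usage_counts(lines: Iterable[str]) -> Counter:
    """Count how often each 'Usage count = X' value is reported for lo."""
    counts: Counter = Counter()
    for line in lines:
        match = LO_WAIT_RE.search(line)
        if match is not None and match.group(1) == "lo":
            counts[int(match.group(2))] += 1
    return counts
```

For example, passing an open syslog file object to `lo_usage_counts` would return a Counter keyed by the X values (such as 1, 3 and 6) observed in that log.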
