kernel-packages team mailing list archive

[Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Joe Stringer <1403152@xxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Jun 2015 14:06:02 -0000
Reply-to: Bug 1403152 <1403152@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Just chiming in here, I contacted Rodrigo off-list and was verging
towards that same patch. More below.

I suspect there are two issues here with very similar symptoms. See in
particular post #8, which mentions people reporting that 3.14 improves
the situation.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152/comments/8

I've been chasing a bug in 3.13 with docker containers and connection
tracking which is fixed in 3.14, by this patch:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53376bef2cd97d3e3f61fdc677fb8da7d03d0da

Note that the commit message for the above commit describes a different
issue, but I've been able to reproduce issues of the nature in this
thread (hung docker / ip netns add commands like in post #6) before
applying this patch, and cannot reproduce them after.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152/comments/6

In the issue that I face, I can find a kworker thread using up an entire
core, and when I cat /proc/$pid/stack I see this:

[<ffffffffbe01e9b6>] ___preempt_schedule+0x56/0xb0
[<ffffffffc02223e4>] nf_ct_iterate_cleanup+0x134/0x160 [nf_conntrack]
[<ffffffffc0223dae>] nf_conntrack_cleanup_net_list+0x4e/0x170 [nf_conntrack]
[<ffffffffc022436d>] nf_conntrack_pernet_exit+0x4d/0x60 [nf_conntrack]
[<ffffffffbe6040d3>] ops_exit_list.isra.1+0x53/0x60
[<ffffffffbe6048d0>] cleanup_net+0x100/0x1d0
[<ffffffffbe084991>] process_one_work+0x171/0x470
[<ffffffffbe08563b>] worker_thread+0x11b/0x3a0
[<ffffffffbe08bb82>] kthread+0xd2/0xf0
[<ffffffffbe71757c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

The kworker is looping forever and failing to clean up conntrack state.
All the while, it holds the global netns lock. Given that I've bisected
to the commit linked above which is to do with refcounting, I suspect
that borked refcounting on conntrack entries makes them impossible to
properly free/destroy, which prevents this worker from cleaning up the
namespace, which then goes on to prevent anything else from interacting
with namespaces (add/delete/etc).
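
For anyone wanting to check whether they are hitting the same variant,
here is a rough sketch of the check described above: read a kworker's
/proc/$pid/stack and look for the conntrack cleanup frames from my
trace. The function names and the choice of which two frames to match
are my own assumptions for illustration, not an established tool, and
reading /proc/$pid/stack normally requires root.

```python
import os
import re

# Frames taken from the stack trace quoted in this message. Matching on
# just these two is an assumption; adjust for your own trace.
CLEANUP_FRAMES = ("nf_ct_iterate_cleanup", "cleanup_net")

# Matches one frame line of /proc/<pid>/stack, for example:
#   [<ffffffffbe6048d0>] cleanup_net+0x100/0x1d0
FRAME_RE = re.compile(r"^\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def stack_symbols(stack_text):
    """Return the symbol names found in a /proc/<pid>/stack dump, in order."""
    symbols = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            symbols.append(match.group(1))
    return symbols


def looks_like_conntrack_cleanup(stack_text):
    """True if every frame in CLEANUP_FRAMES appears in the stack dump."""
    symbols = stack_symbols(stack_text)
    return all(frame in symbols for frame in CLEANUP_FRAMES)


def kworker_pids():
    """Yield pids of tasks whose comm starts with 'kworker'."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().startswith("kworker"):
                    yield int(entry)
        except OSError:
            continue  # task exited while we were scanning


if __name__ == "__main__":
    for pid in kworker_pids():
        try:
            with open(f"/proc/{pid}/stack") as f:
                stack = f.read()
        except OSError:
            continue  # not root, or the task went away
        if looks_like_conntrack_cleanup(stack):
            print(f"kworker {pid} is in conntrack netns cleanup")
```

A single hit only shows that the worker was in cleanup_net at that
instant, which is normal when a namespace is torn down. It is the same
pid matching repeatedly while pegging a core that points at the loop I
describe.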

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1403152

Title:
  unregister_netdevice: waiting for lo to become free. Usage count

Status in The Linux Kernel:
  Unknown
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Trusty:
  Confirmed
Status in linux source package in Utopic:
  Confirmed

Bug description:
  I am currently running trusty with the latest patches, and I get this
  on the following hardware and software:

  Ubuntu 3.13.0-43.72-generic 3.13.11.11

  processor	: 7
  vendor_id	: GenuineIntel
  cpu family	: 6
  model		: 77
  model name	: Intel(R) Atom(TM) CPU  C2758  @ 2.40GHz
  stepping	: 8
  microcode	: 0x11d
  cpu MHz		: 2400.000
  cache size	: 1024 KB
  physical id	: 0
  siblings	: 8
  core id		: 7
  cpu cores	: 8
  apicid		: 14
  initial apicid	: 14
  fpu		: yes
  fpu_exception	: yes
  cpuid level	: 11
  wp		: yes
  flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
  bogomips	: 4799.48
  clflush size	: 64
  cache_alignment	: 64
  address sizes	: 36 bits physical, 48 bits virtual
  power management:

  The error in the subject is somewhat reproducible. lxc keeps working
  but is no longer manageable until a reboot.

  "Not manageable" means every command hangs.

  I saw there are a lot of bugs, but they seem to relate to older
  versions and are closed, so I decided to file a new one.

  I run a lot of machines with trusty and lxc containers, but only this
  kind of machine produces these errors; all the others don't show this
  odd behavior.

  thx in advance

  meno

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1403152/+subscriptions

References

[Bug 1403152] [NEW] unregister_netdevice: waiting for lo to become free. Usage count
From: menoabels, 2014-12-16