kernel-packages team mailing list archive
Message #151518
[Bug 1403152] Re: unregister_netdevice: waiting for lo to become free. Usage count
We also backported [1] to 4.2 (linux-lts-wily) and deployed it to our
production OpenStack cloud. We only installed it yesterday, and our MTBF
is between two and twenty days, so we won't know for a while whether it
has made any difference.
Some details about our configuration / failure mode:
Three OpenStack "Layer 3" hosts (running 3.19.0-30-generic
#34~14.04.1-Ubuntu) providing virtual routers/VPNs/Metadata via network
namespaces.
Our most recent failures occurred on hosts B and C (within 30 minutes of
each other, after having been fine for weeks) while removing routers
from A and re-creating them on B and C.
Our stack traces are slightly different from the ones posted above:
Dec 14 15:37:05 hostname kernel: [961050.119727] INFO: task ip:9865 blocked for more than 120 seconds.
Dec 14 15:37:05 hostname kernel: [961050.126707] Tainted: G C 3.19.0-30-generic #34~14.04.1-Ubuntu
Dec 14 15:37:05 hostname kernel: [961050.135073] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 14 15:37:05 hostname kernel: [961050.144094] ip D ffff88097e3e3de8 0 9865 9864 0x00000000
Dec 14 15:37:05 hostname kernel: [961050.144098] ffff88097e3e3de8 ffff880e982693a0 0000000000013e80 ffff88097e3e3fd8
Dec 14 15:37:05 hostname kernel: [961050.144100] 0000000000013e80 ffff88101a8993a0 ffff880e982693a0 0000000000000000
Dec 14 15:37:05 hostname kernel: [961050.144102] ffffffff81cdb2a0 ffffffff81cdb2a4 ffff880e982693a0 00000000ffffffff
Dec 14 15:37:05 hostname kernel: [961050.144104] Call Trace:
Dec 14 15:37:05 hostname kernel: [961050.144109] [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
Dec 14 15:37:05 hostname kernel: [961050.144111] [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
Dec 14 15:37:05 hostname kernel: [961050.144115] [<ffffffff811cfd66>] ? __kmalloc+0x226/0x280
Dec 14 15:37:05 hostname kernel: [961050.144117] [<ffffffff816a14a1>] ? net_alloc_generic+0x21/0x30
Dec 14 15:37:05 hostname kernel: [961050.144120] [<ffffffff817b4d23>] mutex_lock+0x23/0x37
Dec 14 15:37:05 hostname kernel: [961050.144122] [<ffffffff816a1c75>] copy_net_ns+0x75/0x150
Dec 14 15:37:05 hostname kernel: [961050.144125] [<ffffffff810943ad>] create_new_namespaces+0xfd/0x180
Dec 14 15:37:05 hostname kernel: [961050.144127] [<ffffffff810945ba>] unshare_nsproxy_namespaces+0x5a/0xc0
Dec 14 15:37:05 hostname kernel: [961050.144130] [<ffffffff8107439b>] SyS_unshare+0x15b/0x2e0
Dec 14 15:37:05 hostname kernel: [961050.144133] [<ffffffff817b6e4d>] system_call_fastpath+0x16/0x1b
Dec 14 15:37:05 hostname kernel: [961050.144135] INFO: task ip:9896 blocked for more than 120 seconds.
Dec 14 15:37:05 hostname kernel: [961050.151109] Tainted: G C 3.19.0-30-generic #34~14.04.1-Ubuntu
Dec 14 15:37:05 hostname kernel: [961050.159558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 14 15:37:05 hostname kernel: [961050.168551] ip D ffff8804591cfde8 0 9896 9895 0x00000000
Dec 14 15:37:05 hostname kernel: [961050.168556] ffff8804591cfde8 ffff880814031d70 0000000000013e80 ffff8804591cffd8
Dec 14 15:37:05 hostname kernel: [961050.168558] 0000000000013e80 ffffffff81c1d4e0 ffff880814031d70 0000000000000000
Dec 14 15:37:05 hostname kernel: [961050.168560] ffffffff81cdb2a0 ffffffff81cdb2a4 ffff880814031d70 00000000ffffffff
Dec 14 15:37:05 hostname kernel: [961050.168562] Call Trace:
Dec 14 15:37:05 hostname kernel: [961050.168568] [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
Dec 14 15:37:05 hostname kernel: [961050.168571] [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
Dec 14 15:37:05 hostname kernel: [961050.168573] [<ffffffff817b4d23>] mutex_lock+0x23/0x37
Dec 14 15:37:05 hostname kernel: [961050.168577] [<ffffffff816a1c75>] copy_net_ns+0x75/0x150
Dec 14 15:37:05 hostname kernel: [961050.168581] [<ffffffff810943ad>] create_new_namespaces+0xfd/0x180
Dec 14 15:37:05 hostname kernel: [961050.168584] [<ffffffff810945ba>] unshare_nsproxy_namespaces+0x5a/0xc0
Dec 14 15:37:05 hostname kernel: [961050.168587] [<ffffffff8107439b>] SyS_unshare+0x15b/0x2e0
Dec 14 15:37:05 hostname kernel: [961050.168589] [<ffffffff817b6e4d>] system_call_fastpath+0x16/0x1b
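Both traces have the same shape. An ip task in state D entered the kernel via SyS_unshare and is waiting in mutex_lock, called from copy_net_ns, i.e. while creating a new network namespace. When collecting traces from several hosts, a small parser makes that comparison quicker. The following is a minimal Python sketch. The regular expressions assume exactly the line format quoted above, and WAIT_HELPERS is a hypothetical, hand-picked list of frames to skip.

```python
import re
from dataclasses import dataclass, field

# Written against the hung-task lines quoted above (3.19-era format).
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Hypothetical list: frames that only say *how* the task sleeps
# (scheduler / mutex internals), not *where* it got stuck.
WAIT_HELPERS = {"schedule", "schedule_preempt_disabled",
                "__mutex_lock_slowpath", "mutex_lock"}


@dataclass
class HungTask:
    comm: str
    pid: int
    seconds: int
    frames: list = field(default_factory=list)

    def blocked_in(self):
        """Innermost frame that is not a generic wait helper, or None."""
        for func in self.frames:
            if func not in WAIT_HELPERS:
                return func
        return None


def parse_hung_tasks(lines):
    """Group 'INFO: task ... blocked' reports with their call-trace frames."""
    tasks = []
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            tasks.append(HungTask(m.group(1), int(m.group(2)), int(m.group(3))))
            continue
        m = FRAME_RE.search(line)
        # Frames printed with "?" are only guesses by the unwinder; skip them.
        if m and tasks and m.group(1) is None:
            tasks[-1].frames.append(m.group(2))
    return tasks
```

For example, parse_hung_tasks(open("/var/log/kern.log")) on a kernel log containing the two traces above should yield two HungTask entries, each with blocked_in() returning "copy_net_ns".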
[1] http://www.spinics.net/lists/netdev/msg351337.html
Cheers,
James Dempsey
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-lts-utopic in Ubuntu.
https://bugs.launchpad.net/bugs/1403152
Title:
unregister_netdevice: waiting for lo to become free. Usage count
Status in Linux:
Unknown
Status in linux package in Ubuntu:
Fix Released
Status in linux-lts-utopic package in Ubuntu:
Confirmed
Status in linux source package in Trusty:
Fix Released
Status in linux-lts-utopic source package in Trusty:
Fix Released
Status in linux source package in Vivid:
Fix Released
Bug description:
SRU Justification:
[Impact]
Users of kernels that utilize NFS may see the following messages when
shutting down and starting containers:
unregister_netdevice: waiting for lo to become free. Usage count = 1
This can cause problems when trying to create a new network namespace,
and thus block a user from creating new containers.
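Because the hang shows up when a new network namespace is created, one way to notice an affected host early is to periodically try to create one and treat a timeout as a symptom. A minimal Python sketch, assuming util-linux unshare is installed and the check runs as root. The probe command and the 10-second default timeout are assumptions, not values taken from this bug.

```python
import subprocess

# Assumed probe: util-linux `unshare --net true` creates a new network
# namespace, runs `true` inside it and exits. Needs root (CAP_SYS_ADMIN).
DEFAULT_PROBE = ("unshare", "--net", "true")


def netns_creation_ok(cmd=DEFAULT_PROBE, timeout=10.0):
    """Return True if cmd exits with status 0 within `timeout` seconds.

    False covers both a hang and an ordinary failure such as missing
    privileges; check stderr manually if you need to tell them apart.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # A task blocked in an uninterruptible sleep (state D) only acts on
        # SIGKILL once it wakes up, so we deliberately do not wait() again
        # after kill(): that could hang the probe itself.
        proc.kill()
        return False
```

subprocess.run(..., timeout=...) is avoided on purpose: on timeout it kills the child and then waits for it, which would block for as long as the child stays stuck.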
[Test Case]
Set up multiple containers in parallel that mount an NFS share, create
some traffic, and shut them down. Eventually you will see the kernel
message.
Dave's script here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152/comments/24
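The test case ends when the kernel message appears. When a reproducer runs unattended, a check like the following can decide whether an iteration hit the bug. A minimal Python sketch; the regex assumes the exact wording quoted under [Impact], and obtaining the log lines (for example from dmesg output) is left to the caller.

```python
import re

# Matches the message quoted under [Impact]; captures device and count.
WAIT_FREE_RE = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. "
    r"Usage count = (\d+)")


def find_refcount_waits(lines):
    """Yield (device, usage_count) for every matching kernel log line."""
    for line in lines:
        m = WAIT_FREE_RE.search(line)
        if m:
            yield m.group(1), int(m.group(2))
```

A reproducer loop could stop as soon as any(find_refcount_waits(lines)) is true for the latest kernel log lines.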
[Fix]
commit de84d89030fa4efa44c02c96c8b4a8176042c4ff upstream
--
I am currently running Trusty with the latest patches on the following
hardware and software:
Ubuntu 3.13.0-43.72-generic 3.13.11.11
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 77
model name : Intel(R) Atom(TM) CPU C2758 @ 2.40GHz
stepping : 8
microcode : 0x11d
cpu MHz : 2400.000
cache size : 1024 KB
physical id : 0
siblings : 8
core id : 7
cpu cores : 8
apicid : 14
initial apicid : 14
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
bogomips : 4799.48
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
On these machines I can somewhat reproducibly trigger the error from the
subject line. LXC keeps working, but it can no longer be managed until a
reboot: every command hangs.
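When every command hangs, it can help to list the processes stuck in uninterruptible sleep, the same D state in which the hung-task traces earlier in this thread report ip. A minimal Python sketch, assuming a Linux /proc; it reads only /proc/<pid>/stat, and the function names are hypothetical.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def tasks_in_state(state="D", proc_root="/proc"):
    """Return [(pid, comm)] for processes currently in the given state.

    Only thread-group leaders are listed directly in /proc; individual
    threads live under /proc/<pid>/task and are not scanned here.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, st = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unparsable entry
        if st == state:
            found.append((pid, comm))
    return found
```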
I saw that there are a lot of similar bugs, but they seem to relate to
older versions and are closed, so I decided to file a new one.
I run a lot of machines with Trusty and LXC containers, but only this
kind of machine produces these errors; all the others don't show this
odd behavior.
thx in advance
meno