kernel-packages team mailing list archive
Message #64310
[Bug 1322407] Re: NFS kernel server creates a kworker with 100% CPU usage, then hangs randomly
Deeper inspection of the logs suggests the problem is a connection
attempt made while the xprt is not connected. Part of that procedure is to
re-use the connection, which forces the xprt to disconnect (so the socket
can be re-used). This triggers a state change (TCP_CLOSE) and wakes up the
task waiting for the connection. But the connection state is then INPROGRESS,
which somehow gets translated into EAGAIN, and that triggers call_bind,
which repeats the socket re-use process.
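To make the suspected cycle easier to follow, here is a toy model in Python. It is only a sketch of the sequence described above, not the kernel code: the helper `connect_attempt` and the parameter `succeeds_on` are hypothetical, the step names are borrowed from the SUNRPC function names mentioned in this thread, and it assumes that "INPROGRESS" refers to the errno EINPROGRESS.

```python
import errno


def connect_attempt(attempt, succeeds_on):
    """Hypothetical stand-in for one connect attempt.

    Reports -EINPROGRESS until attempt number `succeeds_on` is reached
    (never succeeds when `succeeds_on` is None).
    """
    if succeeds_on is not None and attempt >= succeeds_on:
        return 0
    return -errno.EINPROGRESS


def simulate(max_attempts, succeeds_on=None):
    """Walk the bind -> connect -> status cycle at most `max_attempts` times.

    Returns (steps, connected), where `steps` lists the step names visited.
    """
    steps = []
    for attempt in range(1, max_attempts + 1):
        steps.append("call_bind")
        steps.append("call_connect")
        status = connect_attempt(attempt, succeeds_on)
        if status == -errno.EINPROGRESS:
            # The translation described in the message above.
            status = -errno.EAGAIN
        steps.append("call_connect_status")
        if status == 0:
            return steps, True
        # On EAGAIN the model goes straight back to call_bind, with no
        # delay and no timeout adjustment between iterations.
    return steps, False


if __name__ == "__main__":
    steps, connected = simulate(max_attempts=3)
    print(connected, " -> ".join(steps))
```

If the connection never leaves the in-progress state, the model keeps cycling until the artificial `max_attempts` bound stops it, which is the kind of tight loop a busy kworker would be consistent with.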
With that lead, I found two upstream commits that reference the commit
which introduced this behaviour:
* 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)
The two fixes related to that are:
* 1fa3e2e SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
* 485f225 SUNRPC: Ensure that call_connect times out correctly
The latter would at least cause timeouts to be re-adjusted before looping back into call_bind, so it might be worth trying those. I built a trusty kernel with those two patches added. The debs are at http://people.canonical.com/~smb/lp1322407/
Could you install those on the server side and see whether this helps with the problem?
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1322407
Title:
NFS kernel server creates a kworker with 100% CPU usage, then hangs
randomly
Status in “linux” package in Ubuntu:
Confirmed
Bug description:
This concerns the server edition of 14.04. I have set up an NFS server.
Once I attach at least one client, one kworker process starts to use
100% CPU. The runaway kworker returns to idle when I reset the NFS
server daemon with "service nfs-kernel-server stop" followed by
"service nfs-kernel-server start". With the nfs kernel server stopped,
the CPU remains idle. With the nfs kernel server running, as soon as
one client requests a connection, the kworker process jumps to 100%
CPU again.
After some random time, the nfs kernel server no longer accepts
requests from clients. Restarting the service allows clients to
reconnect. The syslog shows no relevant information.
This problem has never appeared on a very similar server setup with
12.04.
Configuration: The svcgssd is not running. I played with various
configurations (enabling/disabling NFSv3 and NFSv4), but it makes no
difference.
I tried to enable event debugging:
#> echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
#> cat /sys/kernel/debug/tracing/trace_pipe > /var/tmp/kerntrace.txt
and found the following kernel trace in a tight loop:
[...]
kworker/2:1-86 [002] d... 161940.910668: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d.s. 161940.910674: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d... 161940.910675: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d.s. 161940.910681: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d... 161940.910682: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d.s. 161940.910688: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d... 161940.910689: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d.s. 161940.910695: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d... 161940.910696: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
kworker/2:1-86 [002] d.s. 161940.910702: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
[...]
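A captured trace like the one above can be summarized with a short script. The following is a minimal Python sketch that assumes the trace lines have exactly the layout shown here; the function name `summarize` and the regular expression are illustrative and not part of any tracing tool.

```python
import re
from collections import Counter

# Matches workqueue_queue_work lines in the layout shown in the trace above:
#   <task> [<cpu>] <flags> <timestamp>: workqueue_queue_work: ... function=<name> ...
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"workqueue_queue_work:.*?\bfunction=(?P<func>\S+)"
)


def summarize(lines):
    """Count queued work items per function and measure the time span.

    Returns (counts, span): a Counter keyed by function name, and the
    difference in seconds between the last and first matching timestamps.
    Lines that do not match (such as "[...]") are skipped.
    """
    counts = Counter()
    first = last = None
    for line in lines:
        match = TRACE_RE.match(line)
        if match is None:
            continue
        counts[match.group("func")] += 1
        timestamp = float(match.group("ts"))
        if first is None:
            first = timestamp
        last = timestamp
    span = (last - first) if first is not None else 0.0
    return counts, span


if __name__ == "__main__":
    import sys

    # Pass the capture file as the first argument, e.g. /var/tmp/kerntrace.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            counts, span = summarize(handle)
        for func, count in counts.most_common():
            print(f"{func}: {count}")
        print(f"time span: {span:.6f} s")
```

Two functions alternating at a very high rate over a short time span, as in the excerpt above, would point to a requeue loop rather than normal periodic work.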
At present, I have to consider NFS fubar.
Thanks,
Mark
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-24-generic 3.13.0-24.47
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-24-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Thu May 22 22:48:48 2014
HibernationDevice: RESUME=UUID=adcdeef5-9b46-4ea4-b9f4-b6e642ea91e8
InstallationDate: Installed on 2014-05-12 (10 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
IwConfig:
eth0 no wireless extensions.
lo no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. GA-990XA-UD3
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=6cf411b3-3e8e-41e5-8670-db14ad259f58 ro IOMMU=soft nomdmonddf nomdmonisw
RelatedPackageVersions:
linux-restricted-modules-3.13.0-24-generic N/A
linux-backports-modules-3.13.0-24-generic N/A
linux-firmware 1.127.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:
dmi.bios.date: 10/13/2011
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F9
dmi.board.name: GA-990XA-UD3
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF9:bd10/13/2011:svnGigabyteTechnologyCo.,Ltd.:pnGA-990XA-UD3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-990XA-UD3:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-990XA-UD3
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1322407/+subscriptions