
kernel-packages team mailing list archive

[Bug 1322407] Re: NFS kernel server creates a kworker with 100% CPU usage, then hangs randomly

 

What I probably missed at first is that this seems to involve connections that work for a while and then fail at some point, and it is unclear how many connections were active when that happened.
The function tracing you did shows that some kind of loop is going on, but it does not reveal any real detail. Could you try enabling some of the NFS debugging via /proc/sys/sunrpc/*_debug? Each of those files takes a bitmask that turns on various pieces of internal debug output:

#define RPCDBG_XPRT             0x0001
#define RPCDBG_CALL             0x0002
#define RPCDBG_DEBUG            0x0004
#define RPCDBG_NFS              0x0008
#define RPCDBG_AUTH             0x0010
#define RPCDBG_BIND             0x0020
#define RPCDBG_SCHED            0x0040
#define RPCDBG_TRANS            0x0080
#define RPCDBG_SVCXPRT          0x0100
#define RPCDBG_SVCDSP           0x0200
#define RPCDBG_MISC             0x0400
#define RPCDBG_CACHE            0x0800
#define RPCDBG_ALL              0x7fff

So echoing 32767 (0x7fff, i.e. RPCDBG_ALL) into the various /proc
interfaces should enable all debugging. I am not sure which of them is
most useful; maybe start with nfsd_debug and/or rpc_debug. That may
help narrow down what goes wrong.
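
As a rough sketch of the above (not a tested recipe for this bug), the
mask can be built from the flag names in the list and written to the
chosen files. The helper names rpcdbg_mask and set_sunrpc_debug are made
up for this example; the file names assume the usual rpc_debug,
nfs_debug, nfsd_debug and nlm_debug entries under /proc/sys/sunrpc, and
writing to them needs root.

```sh
#!/bin/sh
# Hypothetical helpers, written only to illustrate the bitmask handling.

# Print the decimal OR of the named RPCDBG_* flags (names without prefix).
rpcdbg_mask() {
    _mask=0
    for _flag in "$@"; do
        case "$_flag" in
            XPRT)    _bit=0x0001 ;;
            CALL)    _bit=0x0002 ;;
            DEBUG)   _bit=0x0004 ;;
            NFS)     _bit=0x0008 ;;
            AUTH)    _bit=0x0010 ;;
            BIND)    _bit=0x0020 ;;
            SCHED)   _bit=0x0040 ;;
            TRANS)   _bit=0x0080 ;;
            SVCXPRT) _bit=0x0100 ;;
            SVCDSP)  _bit=0x0200 ;;
            MISC)    _bit=0x0400 ;;
            CACHE)   _bit=0x0800 ;;
            ALL)     _bit=0x7fff ;;
            *) echo "unknown flag: $_flag" >&2; return 1 ;;
        esac
        _mask=$(( _mask | $_bit ))
    done
    echo "$_mask"
}

# Write a mask to /proc/sys/sunrpc/<name>_debug for each given name.
# Usage: set_sunrpc_debug MASK NAME...   (e.g. NAME = rpc, nfsd)
set_sunrpc_debug() {
    _value=$1
    shift
    for _name in "$@"; do
        _file="/proc/sys/sunrpc/${_name}_debug"
        if [ -w "$_file" ]; then
            echo "$_value" > "$_file"
        else
            echo "skipping $_file (not writable)" >&2
        fi
    done
}
```

For example, as root, set_sunrpc_debug "$(rpcdbg_mask ALL)" nfsd rpc
would turn everything on for those two files, and
set_sunrpc_debug 0 nfsd rpc would turn it off again. The debug messages
should then show up in the kernel log (dmesg or syslog), which can get
very noisy with all bits set.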

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1322407

Title:
  NFS kernel server creates a kworker with 100% CPU usage, then hangs
  randomly

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  This concerns the server edition of 14.04. I have set up an NFS server.
  Once I attach at least one client, one kworker process starts to use
  100% CPU. The runaway kworker returns to idle when I reset the NFS
  server daemon with "service nfs-kernel-server stop" followed by
  "service nfs-kernel-server start". With the nfs kernel server stopped,
  the CPU remains idle. With the nfs kernel server running, as soon as
  one client requests a connection, the kworker process jumps to 100%
  CPU again.

  After some random time, the nfs kernel server no longer accepts
  requests from clients. Restarting the service allows clients to
  reconnect. The syslog shows no relevant information.

  This problem has never appeared on a very similar server setup with
  12.04.

  Configuration: The svcgssd is not running. I played with various
  configurations (enabling/disabling NFSv3 and NFSv4), but it makes no
  difference.

  I tried to enable event debugging:

  #> echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
  #> cat /sys/kernel/debug/tracing/trace_pipe > /var/tmp/kerntrace.txt

  and found the following kernel trace in a tight loop:

  [...]
       kworker/2:1-86    [002] d... 161940.910668: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910674: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910675: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910681: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910682: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910688: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910689: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910695: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910696: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910702: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
  [...]

  At present, I have to consider NFS fubar.

  Thanks,
  Mark

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-24-generic 3.13.0-24.47
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-24-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3.1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
  Date: Thu May 22 22:48:48 2014
  HibernationDevice: RESUME=UUID=adcdeef5-9b46-4ea4-b9f4-b6e642ea91e8
  InstallationDate: Installed on 2014-05-12 (10 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.
  MachineType: Gigabyte Technology Co., Ltd. GA-990XA-UD3
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 nouveaufb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=6cf411b3-3e8e-41e5-8670-db14ad259f58 ro IOMMU=soft nomdmonddf nomdmonisw
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             1.127.2
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  WifiSyslog:

  dmi.bios.date: 10/13/2011
  dmi.bios.vendor: Award Software International, Inc.
  dmi.bios.version: F9
  dmi.board.name: GA-990XA-UD3
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.type: 3
  dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
  dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF9:bd10/13/2011:svnGigabyteTechnologyCo.,Ltd.:pnGA-990XA-UD3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-990XA-UD3:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
  dmi.product.name: GA-990XA-UD3
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1322407/+subscriptions

