← Back to team overview

kernel-packages team mailing list archive

[Bug 1322407] Re: NFS kernel server creates a kworker with 100% CPU usage, then hangs randomly

 

OK -- bug can be reproduced. Here is an interesting observation: When a
*second* nfs server is connected to the network , the bug seems to be
suppressed, for what it's worth. This one really puzzles me. The runaway
kworker appeared only after I shut down the old server.

As for debugging, the component that seems to create an insane amount of
debug messages is rpcd. I am pasting snippets from nfsd's debugging-
enabled and rpcd's debugging-enabled here, please let me know if you
need more of the syslog file.

######### NFSD ##############
(...)
May 28 13:06:41 marcato kernel: [184158.621702] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619455)
May 28 13:06:41 marcato kernel: [184158.621714] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10618649)
May 28 13:06:41 marcato kernel: [184158.621731] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619242)
May 28 13:06:41 marcato kernel: [184158.621742] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619422)
May 28 13:06:41 marcato kernel: [184158.621754] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619296)
May 28 13:06:41 marcato kernel: [184158.621761] nfsv4 compound op ffff880417a01080 opcnt 2 #2: 26: status 0
May 28 13:06:41 marcato kernel: [184158.621764] nfsv4 compound returned 0
May 28 13:06:41 marcato kernel: [184158.622472] nfsd_dispatch: vers 4 proc 1
May 28 13:06:41 marcato kernel: [184158.622484] nfsv4 compound op #1/2: 22 (OP_PUTFH)
May 28 13:06:41 marcato kernel: [184158.622491] nfsd: fh_verify(28: 01060001 57c17980 8845d969 d9acb495 eab15095 00a205da)
May 28 13:06:41 marcato kernel: [184158.622507] nfsv4 compound op ffff880417a01080 opcnt 2 #1: 22: status 0
May 28 13:06:41 marcato kernel: [184158.622511] nfsv4 compound op #2/2: 26 (OP_READDIR)
May 28 13:06:41 marcato kernel: [184158.622549] nfsd: fh_verify(28: 01060001 57c17980 8845d969 d9acb495 eab15095 00a205da)

######### RPCD ##############
(...)
May 28 13:07:29 marcato kernel: [184207.248774] RPC: 22256 call_connect_status (status -11)
May 28 13:07:29 marcato kernel: [184207.248777] svc: socket ffff880417284d00 TCP data ready (svsk ffff8803e3ff2000)
May 28 13:07:29 marcato kernel: [184207.248781] svc: transport ffff8803e3ff2000 served by daemon ffff8800ce6da000
May 28 13:07:29 marcato kernel: [184207.248791] RPC: 22256 call_bind (status 0)
May 28 13:07:29 marcato kernel: [184207.248793] svc: server ffff8800ce6da000, pool 0, transport ffff8803e3ff2000, inuse=2
May 28 13:07:29 marcato kernel: [184207.248795] svc: tcp_recv ffff8803e3ff2000 data 1 conn 0 close 0
May 28 13:07:29 marcato kernel: [184207.248800] svc: socket ffff8803e3ff2000 recvfrom(ffff8803e3ff22bc, 0) = 4
May 28 13:07:29 marcato kernel: [184207.248802] svc: TCP record, 188 bytes
May 28 13:07:29 marcato kernel: [184207.248806] svc: socket ffff8803e3ff2000 recvfrom(ffff880415a7c0bc, 3908) = 188
May 28 13:07:29 marcato kernel: [184207.248809] svc: TCP final record (188 bytes)
May 28 13:07:29 marcato kernel: [184207.248812] svc: got len=188
May 28 13:07:29 marcato kernel: [184207.248815] svc: svc_authenticate (1)
May 28 13:07:29 marcato kernel: [184207.248819] svc: calling dispatcher
May 28 13:07:29 marcato kernel: [184207.248826] RPC: 22256 call_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248835] RPC: 22256 xprt_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248844] RPC: 22256 sleep_on(queue "xprt_pending" time 4340888821)
May 28 13:07:29 marcato kernel: [184207.248851] RPC: 22256 added to queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.248862] RPC: 22256 setting alarm for 9000 ms
May 28 13:07:29 marcato kernel: [184207.248871] RPC:       xs_connect scheduled xprt ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.248885] RPC:       disconnecting xprt ffff8802fd03c000 to reuse port
May 28 13:07:29 marcato kernel: [184207.248893] RPC:       AF_UNSPEC connect return code 0
May 28 13:07:29 marcato kernel: [184207.248902] RPC:       worker connecting xprt ffff8802fd03c000 via tcp to 0.0.0.0 (port 58162)
May 28 13:07:29 marcato kernel: [184207.248921] RPC:       xs_tcp_state_change client ffff8802fd03c000...
May 28 13:07:29 marcato kernel: [184207.248922] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
May 28 13:07:29 marcato kernel: [184207.248924] RPC:       disconnected transport ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.248925] RPC: 22256 __rpc_wake_up_task (now 4340888822)
May 28 13:07:29 marcato kernel: [184207.248926] RPC: 22256 disabling timer
May 28 13:07:29 marcato kernel: [184207.248928] RPC: 22256 removed from queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.248929] RPC:       __rpc_wake_up_task done
May 28 13:07:29 marcato kernel: [184207.248931] RPC:       ffff8802fd03c000 connect status 115 connected 0 sock state 7
May 28 13:07:29 marcato kernel: [184207.248933] RPC: 22256 __rpc_execute flags=0x681
May 28 13:07:29 marcato kernel: [184207.248934] RPC: 22256 xprt_connect_status: retrying
May 28 13:07:29 marcato kernel: [184207.248936] RPC: 22256 call_connect_status (status -11)
May 28 13:07:29 marcato kernel: [184207.248937] RPC: 22256 call_bind (status 0)
May 28 13:07:29 marcato kernel: [184207.248939] RPC: 22256 call_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248940] RPC: 22256 xprt_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248942] RPC: 22256 sleep_on(queue "xprt_pending" time 4340888822)
May 28 13:07:29 marcato kernel: [184207.248943] RPC: 22256 added to queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.248944] RPC: 22256 setting alarm for 9000 ms
May 28 13:07:29 marcato kernel: [184207.248946] RPC:       xs_connect scheduled xprt ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.248948] RPC:       disconnecting xprt ffff8802fd03c000 to reuse port
May 28 13:07:29 marcato kernel: [184207.248949] RPC:       AF_UNSPEC connect return code 0
May 28 13:07:29 marcato kernel: [184207.248951] RPC:       worker connecting xprt ffff8802fd03c000 via tcp to 0.0.0.0 (port 58162)
May 28 13:07:29 marcato kernel: [184207.248959] RPC:       xs_tcp_state_change client ffff8802fd03c000...
May 28 13:07:29 marcato kernel: [184207.248961] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
May 28 13:07:29 marcato kernel: [184207.248962] RPC:       disconnected transport ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.248964] RPC: 22256 __rpc_wake_up_task (now 4340888822)
May 28 13:07:29 marcato kernel: [184207.248977] RPC: 22256 call_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248979] RPC: 22256 xprt_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.248980] RPC: 22256 sleep_on(queue "xprt_pending" time 4340888822)
May 28 13:07:29 marcato kernel: [184207.248982] RPC: 22256 added to queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.248984] svc: socket ffff8803e3ff2000 sendto([ffff88011f47d000 64... ], 4092) = 4092 (addr 192.168.1.16, port=839)
May 28 13:07:29 marcato kernel: [184207.248986] svc: server ffff8800ce6da000 waiting for data (to = 900000)
May 28 13:07:29 marcato kernel: [184207.248987] RPC: 22256 setting alarm for 9000 ms
May 28 13:07:29 marcato kernel: [184207.248989] RPC:       xs_connect scheduled xprt ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.248990] RPC:       disconnecting xprt ffff8802fd03c000 to reuse port
May 28 13:07:29 marcato kernel: [184207.248991] RPC:       AF_UNSPEC connect return code 0
May 28 13:07:29 marcato kernel: [184207.248993] RPC:       worker connecting xprt ffff8802fd03c000 via tcp to 0.0.0.0 (port 58162)
May 28 13:07:29 marcato kernel: [184207.249002] RPC:       xs_tcp_state_change client ffff8802fd03c000...
May 28 13:07:29 marcato kernel: [184207.249004] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
May 28 13:07:29 marcato kernel: [184207.249005] RPC:       disconnected transport ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.249006] RPC: 22256 __rpc_wake_up_task (now 4340888822)
May 28 13:07:29 marcato kernel: [184207.249007] RPC: 22256 disabling timer
May 28 13:07:29 marcato kernel: [184207.249008] RPC: 22256 removed from queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.249009] RPC:       __rpc_wake_up_task done
May 28 13:07:29 marcato kernel: [184207.249011] RPC:       ffff8802fd03c000 connect status 115 connected 0 sock state 7
May 28 13:07:29 marcato kernel: [184207.249012] RPC: 22256 __rpc_execute flags=0x681
May 28 13:07:29 marcato kernel: [184207.249013] RPC: 22256 xprt_connect_status: retrying
May 28 13:07:29 marcato kernel: [184207.249014] RPC: 22256 call_connect_status (status -11)
May 28 13:07:29 marcato kernel: [184207.249016] RPC: 22256 call_bind (status 0)
May 28 13:07:29 marcato kernel: [184207.249017] RPC: 22256 call_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.249018] RPC: 22256 xprt_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.249019] RPC: 22256 sleep_on(queue "xprt_pending" time 4340888822)
May 28 13:07:29 marcato kernel: [184207.249020] RPC: 22256 added to queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.249021] RPC: 22256 setting alarm for 9000 ms
May 28 13:07:29 marcato kernel: [184207.249023] RPC:       xs_connect scheduled xprt ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.249024] RPC:       disconnecting xprt ffff8802fd03c000 to reuse port
May 28 13:07:29 marcato kernel: [184207.249026] RPC:       AF_UNSPEC connect return code 0
May 28 13:07:29 marcato kernel: [184207.249027] RPC:       worker connecting xprt ffff8802fd03c000 via tcp to 0.0.0.0 (port 58162)
May 28 13:07:29 marcato kernel: [184207.249034] RPC:       xs_tcp_state_change client ffff8802fd03c000...
May 28 13:07:29 marcato kernel: [184207.249035] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
May 28 13:07:29 marcato kernel: [184207.249036] RPC:       disconnected transport ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.249037] RPC: 22256 __rpc_wake_up_task (now 4340888822)
May 28 13:07:29 marcato kernel: [184207.249038] RPC: 22256 disabling timer
May 28 13:07:29 marcato kernel: [184207.249040] RPC: 22256 removed from queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.249041] RPC:       __rpc_wake_up_task done
May 28 13:07:29 marcato kernel: [184207.249042] RPC:       ffff8802fd03c000 connect status 115 connected 0 sock state 7
May 28 13:07:29 marcato kernel: [184207.249043] RPC: 22256 __rpc_execute flags=0x681
May 28 13:07:29 marcato kernel: [184207.249044] RPC: 22256 xprt_connect_status: retrying
May 28 13:07:29 marcato kernel: [184207.249046] RPC: 22256 call_connect_status (status -11)
May 28 13:07:29 marcato kernel: [184207.249047] RPC: 22256 call_bind (status 0)
May 28 13:07:29 marcato kernel: [184207.249048] RPC: 22256 call_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.249054] RPC: 22256 xprt_connect xprt ffff8802fd03c000 is not connected
May 28 13:07:29 marcato kernel: [184207.249062] RPC: 22256 sleep_on(queue "xprt_pending" time 4340888822)
May 28 13:07:29 marcato kernel: [184207.249070] RPC: 22256 added to queue ffff8802fd03c258 "xprt_pending"
May 28 13:07:29 marcato kernel: [184207.249078] RPC: 22256 setting alarm for 9000 ms
May 28 13:07:29 marcato kernel: [184207.249086] RPC:       xs_connect scheduled xprt ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.249093] RPC:       disconnecting xprt ffff8802fd03c000 to reuse port
May 28 13:07:29 marcato kernel: [184207.249099] RPC:       AF_UNSPEC connect return code 0
May 28 13:07:29 marcato kernel: [184207.249106] RPC:       worker connecting xprt ffff8802fd03c000 via tcp to 0.0.0.0 (port 58162)
May 28 13:07:29 marcato kernel: [184207.249119] RPC:       xs_tcp_state_change client ffff8802fd03c000...
May 28 13:07:29 marcato kernel: [184207.249127] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
May 28 13:07:29 marcato kernel: [184207.249136] RPC:       disconnected transport ffff8802fd03c000
May 28 13:07:29 marcato kernel: [184207.249145] RPC: 22256 __rpc_wake_up_task (now 4340888822)
May 28 13:07:29 marcato kernel: [184207.249153] RPC: 22256 disabling timer
May 28 13:07:29 marcato kernel: [184207.249159] RPC: 22256 removed from queue ffff8802fd03c258 "xprt_pending"

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1322407

Title:
  NFS kernel server creates a kworker with 100% CPU usage, then hangs
  randomly

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  This concerns the server edition of 14.04. I have set up a NFS server.
  Once I attach at least one client, one kworker process starts to use
  100% CPU. The runaway kworker returns to idle when I reset the NFS
  server daemon with "service nfs-kernel-server stop" followed by
  "service nfs-kernel-server start". With the nfs kernel server stopped,
  the CPU remains idle. With the nfs kernel server running, as soon as
  one client requests a connection, the kworker process jumps to 100%
  CPU again.

  After some random time, the nfs kernel server no longer accepts
  requests from clients. Restarting the service allows clients to
  reconnect. The syslog shows no relevant information.

  This problem has never appeared on a very similar server setup with
  12.04.

  Configration: The svcgssd is not running. I played with various
  configurations (enabling/disabling NFSv3 and NFSv4), but it makes no
  difference.

  I tried to enable event debugging:

  #> echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
  #> cat /sys/kernel/debug/tracing/trace_pipe > /var/tmp/kerntrace.txt

  and found the following kernel trace in a tight loop:

  [...]
       kworker/2:1-86    [002] d... 161940.910668: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910674: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910675: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910681: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910682: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910688: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910689: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910695: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d... 161940.910696: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
       kworker/2:1-86    [002] d.s. 161940.910702: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
  [...]

  At present, I have to consider NFS fubar.

  Thanks,
  Mark

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: linux-image-3.13.0-24-generic 3.13.0-24.47
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  Uname: Linux 3.13.0-24-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-24-generic.
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.14.1-0ubuntu3.1
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory: 'iw'
  Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
  Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
  Date: Thu May 22 22:48:48 2014
  HibernationDevice: RESUME=UUID=adcdeef5-9b46-4ea4-b9f4-b6e642ea91e8
  InstallationDate: Installed on 2014-05-12 (10 days ago)
  InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.
  MachineType: Gigabyte Technology Co., Ltd. GA-990XA-UD3
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 nouveaufb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=6cf411b3-3e8e-41e5-8670-db14ad259f58 ro IOMMU=soft nomdmonddf nomdmonisw
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-24-generic N/A
   linux-backports-modules-3.13.0-24-generic  N/A
   linux-firmware                             1.127.2
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  WifiSyslog:

  dmi.bios.date: 10/13/2011
  dmi.bios.vendor: Award Software International, Inc.
  dmi.bios.version: F9
  dmi.board.name: GA-990XA-UD3
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.type: 3
  dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
  dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF9:bd10/13/2011:svnGigabyteTechnologyCo.,Ltd.:pnGA-990XA-UD3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-990XA-UD3:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
  dmi.product.name: GA-990XA-UD3
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1322407/+subscriptions


References