kernel-packages team mailing list archive
Message #30085
[Bug 1046285] Re: NFS client hang with lots of simultaneous operations
Mark Thompson, this bug report is being closed due to your last comment
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1046285/comments/21
regarding this being fixed with an update. For future reference you can
manage the status of your own bugs by clicking on the current status in
the yellow line and then choosing a new status in the revealed drop down
box. You can learn more about bug statuses at
https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time
to report this bug and helping to make Ubuntu better. Please submit any
future bugs you may find.
** Changed in: linux (Ubuntu)
Status: Incomplete => Invalid
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1046285
Title:
NFS client hang with lots of simultaneous operations
Status in “linux” package in Ubuntu:
Invalid
Bug description:
When lots of simultaneous NFS operations from different processes are
happening, sometimes all of the processes get stuck in kernel space
(uninterruptible sleep) and make no forward progress. The network
connection is not the problem (the NFS server is still reachable by
other means: ping, ssh). This has happened to me five times in the
last few weeks (four times randomly and once when trying to reproduce
it), since upgrading to 12.04 (it never happened here on 10.04).
The processes which are stuck can be killed with SIGKILL, but are
totally unresponsive to any normal means. Attempting to access the NFS
mount from a process that is not stuck will immediately get that one stuck as
well. The problem can be "fixed" by sending SIGKILL to all stuck
processes (being careful not to create any more - if one of the stuck
processes was running from a binary on the NFS mount then ps can hang
too as it tries to stat it) and unmounting the filesystem. After
remounting, everything works as expected again. With an NFS home
directory (my random failure case), this basically means that that one
user is totally stuck (any access to their home directory hangs the
process which does it) and has to have all their processes killed by
root to bring the machine back to a working state.
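
A minimal sketch of how the stuck processes could be listed without relying on ps: it reads only /proc/<pid>/stat and reports every process whose state field is D (uninterruptible sleep). This is not part of the original report. The helper names parse_stat_line and find_blocked are hypothetical, and it is an assumption that reading the stat file never touches the hung mount.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on spaces.
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_text), comm, state


def find_blocked(proc_root="/proc"):
    """List (pid, comm) for every process in state D under proc_root."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_blocked():
        print(pid, comm)
```

The resulting PIDs are the candidates for SIGKILL before unmounting, as described above.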
The kernel log doesn't mention anything at all, but magic sysrq 'w'
was able to extract stack traces of all the blocked processes (see
attached) - they are all stuck in NFS-related RPC calls.
In general it has happened while building a large source tree on an
NFS mount, with many forked processes all competing to talk to the
filesystem at the same time. There are no special mount options -
it's just a vanilla v3 NFS mount with 'rw' set. I was able to
reproduce it once by this method - it happened after leaving six
eight-way-forked builds (repeatedly cleaning and building their tree)
going for several hours (none of the random failures had involved anything
like this heavy a load at the time, though, as far as I can tell).
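
The kind of load described above can be approximated with a small sketch like the following: several forked workers repeatedly create, write, stat, read and delete files in one directory. It is a hypothetical stand-in for the actual builds, not the method used in this report, and there is no claim that it triggers the hang. The names churn and run_workers, the file size and the default of eight workers are all assumptions chosen for illustration.

```python
import os
import sys
import tempfile


def churn(directory, worker_id, rounds):
    """Create, write, stat, read back and delete one small file per round."""
    for i in range(rounds):
        path = os.path.join(directory, "w%d-%d.tmp" % (worker_id, i))
        with open(path, "w") as f:
            f.write("x" * 4096)
        os.stat(path)
        with open(path) as f:
            f.read()
        os.unlink(path)


def run_workers(directory, workers=8, rounds=100):
    """Fork `workers` children that churn files concurrently.

    Returns one boolean per worker: True if it exited normally with
    status 0, False otherwise.
    """
    pids = []
    for worker_id in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then leave without running parent cleanup.
            status = 0
            try:
                churn(directory, worker_id, rounds)
            except Exception:
                status = 1
            os._exit(status)
        pids.append(pid)
    results = []
    for pid in pids:
        _, raw = os.waitpid(pid, 0)
        results.append(os.WIFEXITED(raw) and os.WEXITSTATUS(raw) == 0)
    return results


if __name__ == "__main__":
    # Pass a directory on the NFS mount; defaults to a local temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    print(run_workers(target))
```

Running several copies at once against a directory on the mount would be closer to the six parallel builds mentioned above.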
This problem looks similar: http://www.spinics.net/lists/linux-nfs/msg32318.html . However, I don't know enough about the NFS internals to say that it is the same. The script suggested there to reproduce that problem does not fail for me in an hour of running.
---
AcpiTables: Error: command ['sudo', 'LC_MESSAGES=C', 'LANGUAGE=', '/usr/share/apport/dump_acpi_tables.py'] failed with exit code 1: mrt is not in the sudoers file. This incident will be reported.
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu12
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/controlC2', '/dev/snd/pcmC2D0c', '/dev/snd/by-id', '/dev/snd/controlC0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/hwC1D3', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/pcmC1D1p', '/dev/snd/pcmC1D2c', '/dev/snd/pcmC1D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
Card1.Amixer.info: Error: [Errno 2] No such file or directory
Card1.Amixer.values: Error: [Errno 2] No such file or directory
Card2.Amixer.info: Error: [Errno 2] No such file or directory
Card2.Amixer.values: Error: [Errno 2] No such file or directory
CurrentDmesg:
Error: command ['sh', '-c', 'dmesg | comm -13 --nocheck-order /var/log/dmesg -'] failed with exit code 1: comm: /var/log/dmesg: Permission denied
dmesg: write failed: Broken pipe
DistroRelease: Ubuntu 12.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. Studio XPS 8100
Package: linux (not installed)
ProcEnviron:
SHELL=/bin/bash
TERM=xterm
PATH=(custom, no user)
LANG=en_GB.UTF-8
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-29-generic root=UUID=6dbbcb79-2948-4430-8d25-5479f8831106 ro
ProcVersionSignature: Ubuntu 3.2.0-29.46-generic 3.2.24
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.2.0-29-generic x86_64
UpgradeStatus: Upgraded to precise on 2012-06-15 (82 days ago)
UserGroups: dialout video
WifiSyslog:
dmi.bios.date: 12/09/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A03
dmi.board.name: 0T568R
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA03:bd12/09/2009:svnDellInc.:pnStudioXPS8100:pvr:rvnDellInc.:rn0T568R:rvrA00:cvnDellInc.:ct3:cvr:
dmi.product.name: Studio XPS 8100
dmi.sys.vendor: Dell Inc.