kernel-packages team mailing list archive
Message #08290
[Bug 872398] Re: NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never returns
** Changed in: linux (Ubuntu)
Status: Incomplete => Invalid
** Package changed: linux (CentOS) => linux
** Changed in: linux
Status: New => Invalid
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/872398
Title:
NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never
returns
Status in The Linux Kernel:
Invalid
Status in “linux” package in Ubuntu:
Invalid
Bug description:
This is Ubuntu 2.6.38-10.46~lucid1-generic 2.6.38.7 x86_64.
When attempting to read a file over NFSv4, the application sometimes
blocks in the first read() after the open(). sysrq-t shows something
like this:
[ 6252.910004] cat D ffff8800454383b0 0 2886 8323 0x00000000
[ 6252.910004] ffff88004e343b98 0000000000000086 ffff88004e343ae8 ffffffffa034b4e8
[ 6252.910004] 0000000000013cc0 ffff880045438000 ffff8800454383b0 ffff88004e343fd8
[ 6252.910004] ffff8800454383b8 0000000000013cc0 ffff88004e342010 0000000000013cc0
[ 6252.910004] Call Trace:
[ 6252.910004] [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
[ 6252.910004] [<ffffffff815be4b0>] io_schedule+0x70/0xc0
[ 6252.910004] [<ffffffff8110b315>] sync_page+0x45/0x60
[ 6252.910004] [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
[ 6252.910004] [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0
[ 6252.910004] [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
[ 6252.910004] [<ffffffff81087230>] ? wake_bit_function+0x0/0x40
[ 6252.910004] [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
[ 6252.910004] [<ffffffff8110ce2e>] T.900+0x2be/0x450
[ 6252.910004] [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
[ 6252.910004] [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004] [<ffffffff8116363a>] do_sync_read+0xda/0x120
[ 6252.910004] [<ffffffff81131568>] ? handle_mm_fault+0x148/0x270
[ 6252.910004] [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
[ 6252.910004] [<ffffffff81163d75>] vfs_read+0xc5/0x190
[ 6252.910004] [<ffffffff81163f41>] sys_read+0x51/0x90
[ 6252.910004] [<ffffffff8100c082>] system_call_fastpath+0x16/0x1b
When the system is in that state, any access to the file hangs in a
similar way. Here is an attempt to execve() the file:
[ 6252.910004] destroy-libvirt D ffff88004e0dc850 0 4523 4517 0x00000000
[ 6252.910004] ffff88005c1a3ad8 0000000000000082 ffff88005c1a3a38 ffffffff810871df
[ 6252.910004] 0000000000013cc0 ffff88004e0dc4a0 ffff88004e0dc850 ffff88005c1a3fd8
[ 6252.910004] ffff88004e0dc858 0000000000013cc0 ffff88005c1a2010 0000000000013cc0
[ 6252.910004] Call Trace:
[ 6252.910004] [<ffffffff810871df>] ? wake_up_bit+0x2f/0x40
[ 6252.910004] [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
[ 6252.910004] [<ffffffff815be4b0>] io_schedule+0x70/0xc0
[ 6252.910004] [<ffffffff8110b315>] sync_page+0x45/0x60
[ 6252.910004] [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
[ 6252.910004] [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0
[ 6252.910004] [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
[ 6252.910004] [<ffffffff81087230>] ? wake_bit_function+0x0/0x40
[ 6252.910004] [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
[ 6252.910004] [<ffffffff8110ce2e>] T.900+0x2be/0x450
[ 6252.910004] [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
[ 6252.910004] [<ffffffff81181d50>] ? mntput_no_expire+0x60/0x190
[ 6252.910004] [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004] [<ffffffff8116363a>] do_sync_read+0xda/0x120
[ 6252.910004] [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
[ 6252.910004] [<ffffffff81163d75>] vfs_read+0xc5/0x190
[ 6252.910004] [<ffffffff8116a446>] kernel_read+0x46/0x60
[ 6252.910004] [<ffffffff8116a538>] prepare_binprm+0xd8/0x100
[ 6252.910004] [<ffffffff8116b926>] do_execve+0x1b6/0x2d0
[ 6252.910004] [<ffffffff810147ea>] sys_execve+0x4a/0x80
[ 6252.910004] [<ffffffff8100c4dc>] stub_execve+0x6c/0xc0
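Both traces above show a task in state D with nfs_file_read on its
stack. As a rough sketch (not part of the original report), a saved
sysrq-t dump could be scanned for such tasks as follows. The function
name blocked_tasks, the regular expressions and the trimmed sample are
illustrative assumptions based only on the line layout shown above;
other kernels may format the dump differently.

```python
import re

# Task header in the dump: "[ ts] name state <16 hex digits> ...".
# State "D" means uninterruptible sleep.
TASK_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\S+)\s+([RSDTZX])\s+[0-9a-f]{16}\s")
# Stack frame: "[ ts] [<addr>] ? symbol+0x.../0x... [module]"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x")


def blocked_tasks(dump, symbol):
    """Return names of D-state tasks whose stack contains `symbol`."""
    hits, current, state = [], None, None
    for line in dump.splitlines():
        line = line.strip()
        header = TASK_RE.match(line)
        if header:
            # A new task section starts here.
            current, state = header.group(1), header.group(2)
            continue
        frame = FRAME_RE.search(line)
        if frame and current and state == "D" and frame.group(1) == symbol:
            if current not in hits:
                hits.append(current)
    return hits


# Trimmed sample taken from the two traces in this report.
SAMPLE_DUMP = """\
[ 6252.910004] cat D ffff8800454383b0 0 2886 8323 0x00000000
[ 6252.910004] Call Trace:
[ 6252.910004] [<ffffffff815be4b0>] io_schedule+0x70/0xc0
[ 6252.910004] [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004] [<ffffffff81163f41>] sys_read+0x51/0x90
[ 6252.910004] destroy-libvirt D ffff88004e0dc850 0 4523 4517 0x00000000
[ 6252.910004] Call Trace:
[ 6252.910004] [<ffffffff810871df>] ? wake_up_bit+0x2f/0x40
[ 6252.910004] [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004] [<ffffffff8116b926>] do_execve+0x1b6/0x2d0
"""

if __name__ == "__main__":
    print(blocked_tasks(SAMPLE_DUMP, "nfs_file_read"))
```

Matching on the symbol name rather than the address keeps the check
independent of where the nfs module happens to be loaded.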
A tcpdump capture (which must be started before the problem actually
occurs) shows this:
Frame 14327 (t+0.000000): OPEN; GETFH; GETATTR
Frame 14328 (t+0.000334): OPEN=NFS4_OK; GETFH=NFS4_OK (filehandle); GETATTR=NFS4_OK (attr)
Frame 14329 (t+0.000447): PUTFH; ACCESS; GETATTR
Frame 14330 (t+0.000564): PUTFH; DELEGRETURN; GETATTR
Frame 14331 (t+0.000729): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
Frame 14332 (t+0.000774): PUTFH; ACCESS; GETATTR
Frame 14333 (t+0.000830): PUTFH=NFS4_OK; DELEGRETURN=NFS4_OK; GETATTR=NFS4_OK
Frame 14335 (t+0.000978): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
Frame 14334 (t+0.000866): PUTFH; ACCESS; GETATTR
Frame 14337 (t+0.001080): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
Frame 14336 (t+0.001032): PUTFH; READ
Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14342 (t+0.001557): PUTFH; READ
Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14346 (t+0.002006): PUTFH; READ
Frame 14347 (t+0.002174): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
[...]
Frame 430012 (t+94.401360): PUTFH; READ
Frame 430013 (t+94.401524): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
[...]
The stateid in the failing READs matches the stateid of the read
delegation granted at OPEN time, which also matches the stateid in the
DELEGRETURN.
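The retry loop in the capture above can be spotted mechanically. The
following is a minimal sketch, assuming a summary in the same
"Frame N (t+T): op; op" layout used in this report (that layout is a
hand-written digest, not native tcpdump output). The helper names and
the threshold of three are arbitrary choices for illustration.

```python
import re

# One summary line: "Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=..."
SUMMARY_RE = re.compile(r"^Frame (\d+) \(t\+([\d.]+)\): (.*)$")


def expired_read_replies(summary):
    """Return (frame, time) pairs for replies with READ=NFS4ERR_EXPIRED."""
    found = []
    for line in summary.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if not m:
            continue  # skips "[...]" elision markers and blank lines
        ops = [op.strip() for op in m.group(3).split(";")]
        if "READ=NFS4ERR_EXPIRED" in ops:
            found.append((int(m.group(1)), float(m.group(2))))
    return found


def looks_like_retry_loop(summary, threshold=3):
    """True if at least `threshold` READ replies failed with EXPIRED."""
    return len(expired_read_replies(summary)) >= threshold


# Sample lines copied from the capture summary in this report.
SAMPLE_SUMMARY = """\
Frame 14336 (t+0.001032): PUTFH; READ
Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14342 (t+0.001557): PUTFH; READ
Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14346 (t+0.002006): PUTFH; READ
Frame 14347 (t+0.002174): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
[...]
Frame 430012 (t+94.401360): PUTFH; READ
Frame 430013 (t+94.401524): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
"""

if __name__ == "__main__":
    replies = expired_read_replies(SAMPLE_SUMMARY)
    print(len(replies), "expired READ replies; loop:",
          looks_like_retry_loop(SAMPLE_SUMMARY))
```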
So it seems there are two problems:
* the client should probably not do that DELEGRETURN
* the client should handle NFS4ERR_EXPIRED gracefully (not just retry)
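To illustrate the second point, here is a toy simulation of the
difference between resending the same READ and recovering state before
retrying. It is only a sketch: ToyServer, read_blind_retry and
read_with_recovery are hypothetical names, the "server" is a few lines
of Python rather than NFSv4, and the recovery step (a fresh open) is an
assumption chosen for illustration. It does not reproduce the kernel
client or the change made by the upstream commit.

```python
class ToyServer:
    """Hypothetical stand-in for a server; not a real NFSv4 implementation."""

    def __init__(self):
        self._next_stateid = 1
        self._valid = set()

    def open(self):
        """Hand out a new stateid and remember it as valid."""
        stateid = self._next_stateid
        self._next_stateid += 1
        self._valid.add(stateid)
        return stateid

    def delegreturn(self, stateid):
        """Invalidate a stateid, as the DELEGRETURN in the capture did."""
        self._valid.discard(stateid)
        return "NFS4_OK"

    def read(self, stateid):
        if stateid in self._valid:
            return "NFS4_OK", b"data"
        return "NFS4ERR_EXPIRED", b""


def read_blind_retry(server, stateid, max_attempts=5):
    """Resend the same READ on every error (bounded so the demo ends)."""
    for attempt in range(1, max_attempts + 1):
        status, data = server.read(stateid)
        if status == "NFS4_OK":
            return data, attempt
    return None, max_attempts


def read_with_recovery(server, stateid, max_attempts=5):
    """On NFS4ERR_EXPIRED, obtain a fresh stateid before retrying."""
    for attempt in range(1, max_attempts + 1):
        status, data = server.read(stateid)
        if status == "NFS4_OK":
            return data, attempt
        if status == "NFS4ERR_EXPIRED":
            stateid = server.open()  # assumed recovery step
    return None, max_attempts


if __name__ == "__main__":
    server = ToyServer()
    sid = server.open()
    server.delegreturn(sid)  # the stateid is now unusable
    print(read_blind_retry(server, sid))
    print(read_with_recovery(server, sid))
```

In the toy model the blind retry never succeeds because nothing about
the request changes between attempts, which mirrors the repeated
READ=NFS4ERR_EXPIRED replies in the capture.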
The second problem is addressed by upstream commit
0ced63d1a245ac11241a5d37932e6d04d9c8040d, and we expect it to fully
resolve the original problem (no more hanging in read()).
This is a request to backport
0ced63d1a245ac11241a5d37932e6d04d9c8040d to the lucid kernel.
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/872398/+subscriptions