kernel-packages team mailing list archive

[Bug 872398] Re: NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never returns

 

Updated the kernel-fixed-upstream tag as per git describe --contains
0ced63d1a245ac11241a5d37932e6d04d9c8040d. I am unsubscribing from this
bug as I am no longer running Ubuntu and have little interest in
chatting with bots.

** Tags added: kernel-fixed-upstream-v3.0-rc1

** Tags added: kernel-fixed-upstream

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/872398

Title:
  NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never
  returns

Status in “linux” package in Ubuntu:
  Incomplete
Status in “linux” package in CentOS:
  New

Bug description:
  This is Ubuntu 2.6.38-10.46~lucid1-generic 2.6.38.7 x86_64.

  When attempting to read a file over NFSv4, sometimes the application
  stops in the first read() after the open(). sysrq-t shows something
  like this:

  [ 6252.910004] cat             D ffff8800454383b0     0  2886   8323 0x00000000    
  [ 6252.910004]  ffff88004e343b98 0000000000000086 ffff88004e343ae8 ffffffffa034b4e8
  [ 6252.910004]  0000000000013cc0 ffff880045438000 ffff8800454383b0 ffff88004e343fd8
  [ 6252.910004]  ffff8800454383b8 0000000000013cc0 ffff88004e342010 0000000000013cc0
  [ 6252.910004] Call Trace:
  [ 6252.910004]  [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
  [ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
  [ 6252.910004]  [<ffffffff8110b315>] sync_page+0x45/0x60
  [ 6252.910004]  [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
  [ 6252.910004]  [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0 
  [ 6252.910004]  [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
  [ 6252.910004]  [<ffffffff81087230>] ? wake_bit_function+0x0/0x40 
  [ 6252.910004]  [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
  [ 6252.910004]  [<ffffffff8110ce2e>] T.900+0x2be/0x450
  [ 6252.910004]  [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
  [ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
  [ 6252.910004]  [<ffffffff8116363a>] do_sync_read+0xda/0x120
  [ 6252.910004]  [<ffffffff81131568>] ? handle_mm_fault+0x148/0x270
  [ 6252.910004]  [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
  [ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
  [ 6252.910004]  [<ffffffff81163f41>] sys_read+0x51/0x90
  [ 6252.910004]  [<ffffffff8100c082>] system_call_fastpath+0x16/0x1b

  When the system is in that situation any access to the file hangs in a
  similar way. Here is an attempt to execve() the file:

  [ 6252.910004] destroy-libvirt D ffff88004e0dc850     0  4523   4517 0x00000000    
  [ 6252.910004]  ffff88005c1a3ad8 0000000000000082 ffff88005c1a3a38 ffffffff810871df
  [ 6252.910004]  0000000000013cc0 ffff88004e0dc4a0 ffff88004e0dc850 ffff88005c1a3fd8
  [ 6252.910004]  ffff88004e0dc858 0000000000013cc0 ffff88005c1a2010 0000000000013cc0
  [ 6252.910004] Call Trace:
  [ 6252.910004]  [<ffffffff810871df>] ? wake_up_bit+0x2f/0x40
  [ 6252.910004]  [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
  [ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
  [ 6252.910004]  [<ffffffff8110b315>] sync_page+0x45/0x60
  [ 6252.910004]  [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
  [ 6252.910004]  [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0 
  [ 6252.910004]  [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
  [ 6252.910004]  [<ffffffff81087230>] ? wake_bit_function+0x0/0x40 
  [ 6252.910004]  [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
  [ 6252.910004]  [<ffffffff8110ce2e>] T.900+0x2be/0x450
  [ 6252.910004]  [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
  [ 6252.910004]  [<ffffffff81181d50>] ? mntput_no_expire+0x60/0x190
  [ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
  [ 6252.910004]  [<ffffffff8116363a>] do_sync_read+0xda/0x120
  [ 6252.910004]  [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
  [ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
  [ 6252.910004]  [<ffffffff8116a446>] kernel_read+0x46/0x60
  [ 6252.910004]  [<ffffffff8116a538>] prepare_binprm+0xd8/0x100
  [ 6252.910004]  [<ffffffff8116b926>] do_execve+0x1b6/0x2d0
  [ 6252.910004]  [<ffffffff810147ea>] sys_execve+0x4a/0x80
  [ 6252.910004]  [<ffffffff8100c4dc>] stub_execve+0x6c/0xc0

  A tcpdump capture (which must be started before the problem occurs) shows this:
  Frame 14327 (t+0.000000): OPEN; GETFH; GETATTR
  Frame 14328 (t+0.000334): OPEN=NFS4_OK; GETFH=NFS4_OK (filehandle); GETATTR=NFS4_OK (attr)
  Frame 14329 (t+0.000447): PUTFH; ACCESS; GETATTR
  Frame 14330 (t+0.000564): PUTFH; DELEGRETURN; GETATTR
  Frame 14331 (t+0.000729): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14332 (t+0.000774): PUTFH; ACCESS; GETATTR
  Frame 14333 (t+0.000830): PUTFH=NFS4_OK; DELEGRETURN=NFS4_OK; GETATTR=NFS4_OK
  Frame 14335 (t+0.000978): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14334 (t+0.000866): PUTFH; ACCESS; GETATTR
  Frame 14337 (t+0.001080): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14336 (t+0.001032): PUTFH; READ
  Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  Frame 14342 (t+0.001557): PUTFH; READ
  Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  Frame 14346 (t+0.002006): PUTFH; READ
  Frame 14347 (t+0.002174): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  [...]
  Frame 430012 (t+94.401360): PUTFH; READ
  Frame 430013 (t+94.401524): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  [...]
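
  A minimal sketch for tallying reply statuses in a frame summary like
  the one above. It assumes the hand-written "Frame N (t+T): OP=STATUS"
  layout used in this report, which is not raw tcpdump output; the
  names parse_replies and count_status are hypothetical helpers.

```python
import re
from collections import Counter

# Sample lines copied from the trace summary in this report.
TRACE = """\
Frame 14336 (t+0.001032): PUTFH; READ
Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14342 (t+0.001557): PUTFH; READ
Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14346 (t+0.002006): PUTFH; READ
Frame 14347 (t+0.002174): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
"""

# Assumed layout: "Frame <number> (t+<seconds>): <op>[=<status>]; ..."
LINE_RE = re.compile(r"Frame\s+(\d+)\s+\(t\+([0-9.]+)\):\s*(.*)")


def parse_replies(text):
    """Yield (frame, time, {op: status}) for lines that carry OP=STATUS pairs."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ops = {}
        for part in m.group(3).split(";"):
            part = part.strip()
            if "=" in part:
                op, status = part.split("=", 1)
                words = status.split()
                # Keep only the status code, dropping notes such as "(attr)".
                ops[op] = words[0] if words else ""
        if ops:
            yield int(m.group(1)), float(m.group(2)), ops


def count_status(text, op="READ"):
    """Count how often each status was returned for the given operation."""
    return Counter(ops[op] for _, _, ops in parse_replies(text) if op in ops)


counts = count_status(TRACE)
print(counts)
```

  A count of NFS4ERR_EXPIRED replies that keeps growing while no READ
  ever returns NFS4_OK is the signature of the retry loop shown here.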

  The stateid in the failing READs matches the stateid of the read
  delegation granted at OPEN time, which also matches the stateid in the
  DELEGRETURN.

  So it seems there are two problems:
  * the client should probably not do that DELEGRETURN
  * the client should handle NFS4ERR_EXPIRED gracefully (not just retry) 
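
  The difference between blindly retrying and handling the error can be
  illustrated with a toy model. This is only a sketch under stated
  assumptions: ToyServer, reopen() and both read functions are
  hypothetical, "recovery" is simplified to obtaining a fresh stateid,
  and none of this is the kernel code or the upstream commit. The only
  real-world value is the NFS4ERR_EXPIRED error number (10011) from the
  NFSv4 specification.

```python
NFS4_OK = 0
NFS4ERR_EXPIRED = 10011  # error number from the NFSv4 specification


class ToyServer:
    """Hypothetical server that rejects READs using a stateid it does not know."""

    def __init__(self):
        self.valid = {"open-1"}

    def read(self, stateid):
        return NFS4_OK if stateid in self.valid else NFS4ERR_EXPIRED

    def reopen(self):
        """Stand-in for state recovery: hand out a fresh, valid stateid."""
        new_id = f"open-{len(self.valid) + 1}"
        self.valid.add(new_id)
        return new_id


def read_blind_retry(server, stateid, max_tries=5):
    """Resend the same READ unchanged (capped here so the demo terminates)."""
    for attempt in range(1, max_tries + 1):
        if server.read(stateid) == NFS4_OK:
            return attempt
    return None  # the READ never succeeded


def read_with_recovery(server, stateid, max_tries=5):
    """On NFS4ERR_EXPIRED, obtain a new stateid before retrying."""
    for attempt in range(1, max_tries + 1):
        status = server.read(stateid)
        if status == NFS4_OK:
            return attempt
        if status == NFS4ERR_EXPIRED:
            stateid = server.reopen()
    return None


server = ToyServer()
returned_delegation = "deleg-1"  # already returned, so unknown to the server
print(read_blind_retry(server, returned_delegation))
print(read_with_recovery(server, returned_delegation))
```

  In the model, the blind retry never succeeds because nothing about
  the request changes between attempts, while the recovering variant
  succeeds on its second attempt.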

  The second problem is addressed by upstream commit
  0ced63d1a245ac11241a5d37932e6d04d9c8040d, and we expect it to fully
  resolve the original problem (no more hanging in read()).

  This is a request to backport
  0ced63d1a245ac11241a5d37932e6d04d9c8040d to the lucid kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/872398/+subscriptions