kernel-packages team mailing list archive

[Bug 872398] Re: NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never returns

Adrien Kunysz, this bug was reported a while ago and there hasn't been
any activity in it recently. We were wondering if this is still an
issue? If so, could you please test for this with the latest development
release of Ubuntu? ISO images are available from
http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in
the development release from a Terminal
(Applications->Accessories->Terminal), as it will automatically gather
and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available, following https://wiki.ubuntu.com/KernelMainlineBuilds ? This will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder; use the one listed at the very bottom of the page instead. Once you have tested the upstream kernel, please comment with the specific kernel version you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags, located at the bottom of the bug description. Also, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

Also, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's
Status as Confirmed. Please let us know your results. Thank you for your
understanding.

** Tags added: lucid needs-kernel-logs needs-upstream-testing

** Changed in: linux (Ubuntu)
       Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/872398

Title:
  NFSv4 client doesn't handle NFS4ERR_EXPIRED properly, read() never
  returns

Status in “linux” package in Ubuntu:
  Incomplete
Status in “linux” package in CentOS:
  New

Bug description:
  This is Ubuntu 2.6.38-10.46~lucid1-generic 2.6.38.7 x86_64.

  When attempting to read a file over NFSv4, sometimes the application
  stops in the first read() after the open(). sysrq-t shows something
  like this:

  [ 6252.910004] cat             D ffff8800454383b0     0  2886   8323 0x00000000    
  [ 6252.910004]  ffff88004e343b98 0000000000000086 ffff88004e343ae8 ffffffffa034b4e8
  [ 6252.910004]  0000000000013cc0 ffff880045438000 ffff8800454383b0 ffff88004e343fd8
  [ 6252.910004]  ffff8800454383b8 0000000000013cc0 ffff88004e342010 0000000000013cc0
  [ 6252.910004] Call Trace:
  [ 6252.910004]  [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
  [ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
  [ 6252.910004]  [<ffffffff8110b315>] sync_page+0x45/0x60
  [ 6252.910004]  [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
  [ 6252.910004]  [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0 
  [ 6252.910004]  [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
  [ 6252.910004]  [<ffffffff81087230>] ? wake_bit_function+0x0/0x40 
  [ 6252.910004]  [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
  [ 6252.910004]  [<ffffffff8110ce2e>] T.900+0x2be/0x450
  [ 6252.910004]  [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
  [ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
  [ 6252.910004]  [<ffffffff8116363a>] do_sync_read+0xda/0x120
  [ 6252.910004]  [<ffffffff81131568>] ? handle_mm_fault+0x148/0x270
  [ 6252.910004]  [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
  [ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
  [ 6252.910004]  [<ffffffff81163f41>] sys_read+0x51/0x90
  [ 6252.910004]  [<ffffffff8100c082>] system_call_fastpath+0x16/0x1b

  When the system is in that situation any access to the file hangs in a
  similar way. Here is an attempt to execve() the file:

  [ 6252.910004] destroy-libvirt D ffff88004e0dc850     0  4523   4517 0x00000000    
  [ 6252.910004]  ffff88005c1a3ad8 0000000000000082 ffff88005c1a3a38 ffffffff810871df
  [ 6252.910004]  0000000000013cc0 ffff88004e0dc4a0 ffff88004e0dc850 ffff88005c1a3fd8
  [ 6252.910004]  ffff88004e0dc858 0000000000013cc0 ffff88005c1a2010 0000000000013cc0
  [ 6252.910004] Call Trace:
  [ 6252.910004]  [<ffffffff810871df>] ? wake_up_bit+0x2f/0x40
  [ 6252.910004]  [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
  [ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
  [ 6252.910004]  [<ffffffff8110b315>] sync_page+0x45/0x60
  [ 6252.910004]  [<ffffffff8110b33e>] sync_page_killable+0xe/0x40
  [ 6252.910004]  [<ffffffff815bec2a>] __wait_on_bit_lock+0x5a/0xc0 
  [ 6252.910004]  [<ffffffff8110b237>] __lock_page_killable+0x67/0x70
  [ 6252.910004]  [<ffffffff81087230>] ? wake_bit_function+0x0/0x40 
  [ 6252.910004]  [<ffffffff8110c07e>] ? find_get_page+0x1e/0xa0
  [ 6252.910004]  [<ffffffff8110ce2e>] T.900+0x2be/0x450
  [ 6252.910004]  [<ffffffff8110d0a8>] generic_file_aio_read+0xe8/0x230
  [ 6252.910004]  [<ffffffff81181d50>] ? mntput_no_expire+0x60/0x190
  [ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
  [ 6252.910004]  [<ffffffff8116363a>] do_sync_read+0xda/0x120
  [ 6252.910004]  [<ffffffff8127a78b>] ? security_file_permission+0x8b/0x90
  [ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
  [ 6252.910004]  [<ffffffff8116a446>] kernel_read+0x46/0x60
  [ 6252.910004]  [<ffffffff8116a538>] prepare_binprm+0xd8/0x100
  [ 6252.910004]  [<ffffffff8116b926>] do_execve+0x1b6/0x2d0
  [ 6252.910004]  [<ffffffff810147ea>] sys_execve+0x4a/0x80
  [ 6252.910004]  [<ffffffff8100c4dc>] stub_execve+0x6c/0xc0
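
  To check that the two tasks really are stuck in the same place, the
  call traces can be compared mechanically. The following is a minimal
  sketch, not part of the original report: the helper names
  (reliable_frames, shared_prefix) are made up for illustration, and the
  two inputs are abbreviated excerpts of the traces above. Entries the
  kernel marks with "?" are skipped, since those are addresses found on
  the stack that are not confirmed to be part of the call chain.

```python
import re

# Matches one stack entry as printed in the traces above, e.g.
# "[ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0"
ENTRY_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([^\s+]+)\+0x")


def reliable_frames(trace_text):
    """Return symbol names in order (innermost first), skipping '?' entries."""
    frames = []
    for line in trace_text.splitlines():
        m = ENTRY_RE.search(line)
        if m is None:
            continue  # header or register dump line, not a stack entry
        if m.group(1):
            continue  # "?" entry: not a confirmed caller
        frames.append(m.group(2))
    return frames


def shared_prefix(a, b):
    """Longest run of identical frames, starting from the innermost one."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return common


# Abbreviated excerpts of the two traces in this report (many lines omitted).
CAT_TRACE = """\
[ 6252.910004]  [<ffffffff8110b330>] ? sync_page_killable+0x0/0x40
[ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
[ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
[ 6252.910004]  [<ffffffff81163f41>] sys_read+0x51/0x90
"""

EXECVE_TRACE = """\
[ 6252.910004]  [<ffffffff810871df>] ? wake_up_bit+0x2f/0x40
[ 6252.910004]  [<ffffffff815be4b0>] io_schedule+0x70/0xc0
[ 6252.910004]  [<ffffffffa03be5c9>] nfs_file_read+0xa9/0x110 [nfs]
[ 6252.910004]  [<ffffffff81163d75>] vfs_read+0xc5/0x190
[ 6252.910004]  [<ffffffff8116a446>] kernel_read+0x46/0x60
"""

if __name__ == "__main__":
    print(shared_prefix(reliable_frames(CAT_TRACE),
                        reliable_frames(EXECVE_TRACE)))
```

  On these excerpts the shared frames run from io_schedule up to
  vfs_read; the traces diverge only at the entry point (sys_read versus
  kernel_read called from the execve path).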

  A tcpdump capture (which must be started before the problem occurs) shows this:
  Frame 14327 (t+0.000000): OPEN; GETFH; GETATTR
  Frame 14328 (t+0.000334): OPEN=NFS4_OK; GETFH=NFS4_OK (filehandle); GETATTR=NFS4_OK (attr)
  Frame 14329 (t+0.000447): PUTFH; ACCESS; GETATTR
  Frame 14330 (t+0.000564): PUTFH; DELEGRETURN; GETATTR
  Frame 14331 (t+0.000729): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14332 (t+0.000774): PUTFH; ACCESS; GETATTR
  Frame 14333 (t+0.000830): PUTFH=NFS4_OK; DELEGRETURN=NFS4_OK; GETATTR=NFS4_OK
  Frame 14335 (t+0.000978): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14334 (t+0.000866): PUTFH; ACCESS; GETATTR
  Frame 14337 (t+0.001080): PUTFH=NFS4_OK; ACCESS=NFS4_OK; GETATTR=NFS4_OK
  Frame 14336 (t+0.001032): PUTFH; READ
  Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  Frame 14342 (t+0.001557): PUTFH; READ
  Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  Frame 14346 (t+0.002006): PUTFH; READ
  Frame 14347 (t+0.002174): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  [...]
  Frame 430012 (t+94.401360): PUTFH; READ
  Frame 430013 (t+94.401524): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
  [...]
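
  The retry loop in the summary above can be spotted mechanically. The
  sketch below is not part of the original report; it assumes the
  capture has already been reduced to the one-line-per-frame format
  shown here (this is not raw tcpdump output), and the function names
  are made up for illustration. It counts READ replies by status.

```python
import re
from collections import Counter

# Matches lines in the hand-summarised format used in this report, e.g.
# "Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED"
FRAME_RE = re.compile(r"Frame\s+(\d+)\s+\(t\+([0-9.]+)\):\s*(.*)")


def parse_frame(line):
    """Return (frame_no, seconds, {op: status or None}), or None if no match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    ops = {}
    for part in m.group(3).split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, status = part.partition("=")
        # Calls carry no status; replies may have a trailing note
        # such as "(filehandle)", which is dropped here.
        ops[name.strip()] = status.split()[0] if status else None
    return int(m.group(1)), float(m.group(2)), ops


def count_read_statuses(lines):
    """Count READ replies per status; READ calls (no status) are ignored."""
    counts = Counter()
    for line in lines:
        parsed = parse_frame(line)
        if parsed is None:
            continue  # e.g. the "[...]" elision markers
        status = parsed[2].get("READ")
        if status is not None:
            counts[status] += 1
    return counts


# A few lines copied from the summary above.
SAMPLE = """\
Frame 14336 (t+0.001032): PUTFH; READ
Frame 14338 (t+0.001178): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
Frame 14342 (t+0.001557): PUTFH; READ
Frame 14343 (t+0.001776): PUTFH=NFS4_OK; READ=NFS4ERR_EXPIRED
[...]
""".splitlines()

if __name__ == "__main__":
    for status, n in count_read_statuses(SAMPLE).items():
        print(status, n)
```

  A count of NFS4ERR_EXPIRED replies that keeps growing with no
  NFS4_OK READ in between is the signature of the hang described here.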

  The stateid in the failing READs matches the stateid of the read
  delegation granted at OPEN time, which also matches the stateid in the
  DELEGRETURN.

  So it seems there are two problems:
  * the client should probably not do that DELEGRETURN
  * the client should handle NFS4ERR_EXPIRED gracefully (not just retry) 
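
  The difference between "just retry" and handling the error can be
  illustrated with a toy model. This is only a sketch under loud
  assumptions: ToyServer and both read functions are hypothetical, the
  "recovery" step is reduced to obtaining a fresh stateid, and none of
  it is meant to describe the actual kernel code or any particular
  upstream commit.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_EXPIRED = "NFS4ERR_EXPIRED"


class ToyServer:
    """Hypothetical stand-in for a server: tracks which stateids are valid."""

    def __init__(self):
        self.valid = set()
        self._next = 0

    def open(self):
        """Hand out a fresh stateid (stands in for OPEN / state recovery)."""
        self._next += 1
        stateid = "sid-%d" % self._next
        self.valid.add(stateid)
        return stateid

    def delegreturn(self, stateid):
        """After this, the stateid is no longer usable for READ."""
        self.valid.discard(stateid)
        return NFS4_OK

    def read(self, stateid):
        return NFS4_OK if stateid in self.valid else NFS4ERR_EXPIRED


def read_retry_only(server, stateid, max_attempts):
    """Pattern seen in the trace: resend READ with the same stateid."""
    for attempt in range(1, max_attempts + 1):
        if server.read(stateid) == NFS4_OK:
            return attempt
    return None  # without the cap, this loop would never end


def read_with_recovery(server, stateid, max_attempts):
    """On NFS4ERR_EXPIRED, obtain fresh state before the next READ."""
    for attempt in range(1, max_attempts + 1):
        status = server.read(stateid)
        if status == NFS4_OK:
            return attempt
        if status == NFS4ERR_EXPIRED:
            stateid = server.open()  # greatly simplified recovery step
    return None


if __name__ == "__main__":
    server = ToyServer()
    sid = server.open()
    server.delegreturn(sid)  # stateid returned, then reused for READ
    print("retry only:", read_retry_only(server, sid, 1000))
    print("with recovery:", read_with_recovery(server, sid, 1000))
```

  In this model the retry-only client exhausts its attempts without a
  successful READ, while the recovering client succeeds on its second
  attempt.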

  The second problem is addressed by upstream commit
  0ced63d1a245ac11241a5d37932e6d04d9c8040d and we expect it should fully
  resolve the original problem (no more hanging in read()).

  This is a request to backport
  0ced63d1a245ac11241a5d37932e6d04d9c8040d to the lucid kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/872398/+subscriptions