kernel-packages team mailing list archive

[Bug 585657] Re: Transfering large files to nfs mount causes system freeze

 

Still happening in 14.04 LTS

Client (1 Gbps link):
[ 3038.818986] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818991] nfs: server 10.0.0.200 not responding, timed out
[ 3038.818996] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819001] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819006] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819012] nfs: server 10.0.0.200 not responding, timed out
[ 3038.819017] nfs: server 10.0.0.200 not responding, timed out
[ 3038.958559] nfs: server 10.0.0.200 not responding, timed out

Pings are under 1 ms.
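The eight client messages above arrive within about 0.14 s of each other. On a `soft` mount the kernel logs a major RPC timeout as "not responding, timed out" (a `hard` mount logs "still trying" instead), so a burst like this suggests the client gave up on several outstanding requests at once rather than retrying them. In a long log it helps to collapse repeated messages into a count with first and last timestamps. This is a minimal sketch; the `summarize` name is hypothetical, and it assumes lines carry a dmesg-style `[seconds.micro]` timestamp, as in the excerpts here.

```python
import re

# Finds a dmesg-style timestamp and the message after it, e.g.
# "[ 3038.818986] nfs: server 10.0.0.200 not responding, timed out".
# re.search (not match) so syslog prefixes like "Nov  2 ... kernel:" are skipped.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize(lines):
    """Collapse repeated kernel messages.

    Returns a dict mapping message -> (count, first_ts, last_ts), in order of
    first appearance. Lines without a timestamp are skipped.
    """
    summary = {}
    for line in lines:
        m = LINE_RE.search(line.strip())
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if msg in summary:
            count, first, _ = summary[msg]
            summary[msg] = (count + 1, first, ts)
        else:
            summary[msg] = (1, ts, ts)
    return summary


if __name__ == "__main__":
    sample = [
        "[ 3038.818986] nfs: server 10.0.0.200 not responding, timed out",
        "[ 3038.958559] nfs: server 10.0.0.200 not responding, timed out",
    ]
    for msg, (count, first, last) in summarize(sample).items():
        print(f"{count}x {msg} (span {last - first:.3f} s)")
```

Fed the output of `dmesg` (or a syslog file), it prints one line per distinct message, which makes bursts of timeouts stand out from isolated ones.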

Crash:
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.799988] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.815847] BUG: unable to handle kernel paging request at ffffea00084c2540
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.824363] IP: [<ffffea00084c2540>] 0xffffea00084c2540
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.832785] PGD 82fff5067 PUD 82fff4067 PMD 80000008176001e3 
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.841143] Oops: 0011 [#2] SMP 
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.849239] Modules linked in: mptctl xt_comment iptable_filter xt_multiport ip_tables x_tables rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache nf_conntrack_netlink nf_conntrack nfnetlink intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw ipmi_devintf serio_raw joydev gf128mul glue_helper dcdbas ablk_helper i7core_edac acpi_power_meter gpio_ich lpc_ich ipmi_si edac_core cryptd ipmi_msghandler shpchp mac_hid lp parport tcp_htcp hid_generic mptsas mptscsih usbhid mptbase psmouse hid scsi_transport_sas pata_acpi bnx2
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.919101] CPU: 6 PID: 210 Comm: kswapd0 Tainted: G      D W     3.16.0-51-generic #69~14.04.1-Ubuntu
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.937875] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.957034] task: ffff8808043c1e90 ti: ffff880802204000 task.ti: ffff880802204000
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.976975] RIP: 0010:[<ffffea00084c2540>]  [<ffffea00084c2540>] 0xffffea00084c2540
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888544.997565] RSP: 0018:ffff880802207a40  EFLAGS: 00010282
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.007973] RAX: ffff8807f94c8848 RBX: ffff880802207db0 RCX: 0000000000000000
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.028679] RDX: ffffea00084c2540 RSI: 0000000000000002 RDI: ffffea001623df80
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.049907] RBP: ffff880802207b40 R08: ffff880002d078e8 R09: ffff880005eaf478
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.071886] R10: ffff8808022079c8 R11: ffffea003f7e0980 R12: ffffea002f1b4b60
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.094261] R13: ffff880802207bc8 R14: ffffea002f1b4b40 R15: 0000000000000001
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.116663] FS:  0000000000000000(0000) GS:ffff88102fc60000(0000) knlGS:0000000000000000
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.139733] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.151330] CR2: ffffea00084c2540 CR3: 0000000001c13000 CR4: 00000000000007e0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.173960] Stack:
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.184901]  ffffffff81174751 ffff8808043c1e90 ffff8808043c1e90 ffff8808043c1e90
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.206832]  000000010348c640 ffff880802207bb0 ffff880802207ba0 ffff88102fff9f00
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.228830]  0000000000000000 000000000000001d ffff8808043c1e90 0000000000000000
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.250885] Call Trace:
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.261761]  [<ffffffff81174751>] ? shrink_page_list+0x241/0xaa0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.272587]  [<ffffffff81175645>] shrink_inactive_list+0x1c5/0x560
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.283274]  [<ffffffff81176343>] shrink_lruvec+0x523/0x710
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.293916]  [<ffffffff811765ac>] shrink_zone+0x7c/0x1b0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.304157]  [<ffffffff811776e5>] balance_pgdat+0x3b5/0x620
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.314106]  [<ffffffff81177aab>] kswapd+0x15b/0x3f0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.323960]  [<ffffffff810b50e0>] ? prepare_to_wait_event+0x100/0x100
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.333809]  [<ffffffff81177950>] ? balance_pgdat+0x620/0x620
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.343503]  [<ffffffff810915a2>] kthread+0xd2/0xf0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.352951]  [<ffffffff810914d0>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.362610]  [<ffffffff8176f2d8>] ret_from_fork+0x58/0x90
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.372355]  [<ffffffff810914d0>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.381537] Code: 00 00 00 ff ff ff ff 01 00 00 00 60 d1 ef 0d 00 ea ff ff e0 20 b7 0b 00 ea ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <68> 00 00 00 00 ff ff 06 28 c8 96 71 06 88 ff ff 2e 00 00 00 00 
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.408965] RIP  [<ffffea00084c2540>] 0xffffea00084c2540
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.417611]  RSP <ffff880802207a40>
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.425778] CR2: ffffea00084c2540
Nov  2 06:14:49 rtd-lin-nnrpd04 kernel: [888545.433893] ---[ end trace 0021a14ede94c8d6 ]---
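Two fields in this oops can be checked mechanically. On x86 the value after `Oops:` is the page-fault error code, printed in hex; its low bits mean: bit 0 set = protection violation on a present page (clear = page not present), bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set in a page-table entry, bit 4 = instruction fetch. `0011` therefore reads as a kernel-mode instruction fetch from a present page, which agrees with the "tried to execute NX-protected page" line. The faulting address (RIP and CR2 are both `ffffea00084c2540`) also lies in the x86-64 `vmemmap` region starting at `0xffffea0000000000`, where the kernel keeps its `struct page` array, so kswapd appears to have jumped to the address of a page descriptor rather than to code. The sketch below decodes both; the function names are hypothetical, and the region bounds assume the default 4-level paging layout of 3.x kernels.

```python
# Hypothetical helpers for reading two fields of an x86-64 oops.

# x86 page-fault error code, low five bits: bit -> (meaning if set, if clear).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write", "read"),
    2: ("user mode", "kernel mode"),
    3: ("reserved PTE bit set", None),
    4: ("instruction fetch", None),
}

# Assumed default vmemmap (struct page array) range, 4-level paging, 3.x kernels.
VMEMMAP_START = 0xFFFFEA0000000000
VMEMMAP_END = 0xFFFFEAFFFFFFFFFF


def decode_pf_error(code):
    """Return human-readable flags for a page-fault error code."""
    flags = []
    for bit, (if_set, if_clear) in sorted(PF_BITS.items()):
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def in_vmemmap(addr):
    """True if addr falls inside the assumed x86-64 vmemmap region."""
    return VMEMMAP_START <= addr <= VMEMMAP_END


if __name__ == "__main__":
    print(decode_pf_error(0x0011))            # "Oops: 0011" from the log
    print(in_vmemmap(0xFFFFEA00084C2540))     # RIP / CR2 from the log
```

For `0x0011` the decoder yields protection violation, read, kernel mode, instruction fetch; the return addresses in the call trace (`ffffffff81...`) fall outside the region, as kernel text should.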


Server (10 Gbps NIC):
The server log is full of these errors:
[114310.563718] RPC request reserved 156 but used 212
[114310.565495] RPC request reserved 156 but used 176
[114310.569816] RPC request reserved 156 but used 176
[114310.576001] RPC request reserved 156 but used 176
[114310.580087] RPC request reserved 156 but used 176
[115206.967835] RPC request reserved 156 but used 212
[115206.981548] RPC request reserved 156 but used 176
[115811.134896] RPC request reserved 156 but used 176
[115811.136346] RPC request reserved 156 but used 176

All machines are Dell R410s and Dell R430s with Broadcom NICs.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/585657

Title:
  Transfering large files to nfs mount causes system freeze

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Lucid:
  Fix Released
Status in linux source package in Maverick:
  Fix Released
Status in linux source package in Natty:
  Fix Released
Status in linux source package in Hardy:
  Fix Released

Bug description:
  Binary package hint: nfs-kernel-server

  I have verified this bug on both Karmic and Lucid, on both the server
  and the client:

  -------------------------------------------------------------------------------

  Description:	Ubuntu 9.10
  Release:	9.10

  nfs-common:
    Installed: 1:1.2.0-2ubuntu8

  nfs-kernel-server:
    Installed: 1:1.2.0-2ubuntu8

  portmap:
    Installed: 6.0-10ubuntu2

  -------------------------------------------------------------------------------

  Description:	Ubuntu 10.04 LTS
  Release:	10.04

  nfs-common:
    Installed: 1:1.2.0-4ubuntu4

  nfs-kernel-server:
    Installed: 1:1.2.0-4ubuntu4

  portmap:
    Installed: 6.0.0-1ubuntu2

  -------------------------------------------------------------------------------

  Expected behavior:

  Copying large files from local directories to an NFS-mounted directory
  should complete without error.

  -------------------------------------------------------------------------------

  Actual behavior:

  The system freezes while copying large files from a local directory
  (e.g. /tmp) to an NFS-mounted directory. Various processes stop
  responding, and the machine ultimately needs a hard reboot, with
  potential loss of data. When this happens I can still log into the box
  via ssh, but even with sudo I cannot kill -9 the stuck file copy or
  reboot the machine gracefully.

  -------------------------------------------------------------------------------

  Details:

  The server exports several directories, for example:

  /home/shared
  /home/user1/Documents
  /home/user1/Development

  The client mounts these as follows:

  server1:/home/shared    /home/shared    nfs rw,soft,intr 0 0
  server1:/home/user1/Development /home/server1/user1/Development nfs rw,soft,intr 0 0
  server1:/home/user1/Documents   /home/server1/user1/Documents   nfs rw,soft,intr 0 0
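All three mounts use `soft,intr`. According to the nfs(5) man page, a `soft` mount fails an NFS request with an error once its retransmissions time out (a `hard` mount keeps retrying), and `intr` is ignored on kernels after 2.6.25, where only SIGKILL can interrupt a pending NFS operation. A short check over fstab-style lines can flag such entries. This is a sketch that assumes the whitespace-separated fstab(5) field layout shown here; `nfs_entries` is a hypothetical name.

```python
def nfs_entries(lines):
    """Yield a dict for each nfs/nfs4 entry in fstab-format lines.

    Blank lines and '#' comments are skipped. Fields follow fstab(5):
    spec, mount point, type, options, dump frequency, fsck pass.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        yield {
            "spec": fields[0],
            "mountpoint": fields[1],
            "type": fields[2],
            "options": options,
            # soft: requests can fail with an error after retransmissions time out.
            "soft": "soft" in options,
            # intr: accepted but ignored on kernels after 2.6.25 (per nfs(5)).
            "intr": "intr" in options,
        }


if __name__ == "__main__":
    sample = ["server1:/home/shared    /home/shared    nfs rw,soft,intr 0 0"]
    for e in nfs_entries(sample):
        print(e["mountpoint"], "soft" if e["soft"] else "hard")
```

Run against /etc/fstab, this lists which NFS mounts are `soft` and therefore able to return I/O errors to applications when the server stops answering.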

  I see lots of messages like this in /var/log/syslog:

  May 22 10:44:31 client1 kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.
  May 22 10:44:31 client1 kernel: [ 1680.390488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  May 22 10:44:31 client1 kernel: [ 1680.390492] cp D 00000000ffffffff 0 2791 2503 0x00000000
  May 22 10:44:31 client1 kernel: [ 1680.390501] ffff88012a457c48 0000000000000082 0000000000015bc0 0000000000015bc0
  May 22 10:44:31 client1 kernel: [ 1680.390508] ffff8801291331a0 ffff88012a457fd8 0000000000015bc0 ffff880129132de0
  May 22 10:44:31 client1 kernel: [ 1680.390516] 0000000000015bc0 ffff88012a457fd8 0000000000015bc0 ffff8801291331a0
  May 22 10:44:31 client1 kernel: [ 1680.390523] Call Trace:
  May 22 10:44:31 client1 kernel: [ 1680.390545] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390552] [<ffffffff8153eb87>] io_schedule+0x47/0x70
  May 22 10:44:31 client1 kernel: [ 1680.390573] [<ffffffffa0cff2be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390579] [<ffffffff8153f3df>] __wait_on_bit+0x5f/0x90
  May 22 10:44:31 client1 kernel: [ 1680.390587] [<ffffffff812b6234>] ? __lookup_tag+0x64/0x120
  May 22 10:44:31 client1 kernel: [ 1680.390608] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390615] [<ffffffff8153f488>] out_of_line_wait_on_bit+0x78/0x90
  May 22 10:44:31 client1 kernel: [ 1680.390622] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
  May 22 10:44:31 client1 kernel: [ 1680.390643] [<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390665] [<ffffffffa0d036af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390688] [<ffffffffa0d04aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390711] [<ffffffffa0d04ed9>] nfs_write_mapping+0x79/0xb0 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390733] [<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390751] [<ffffffffa0cf3eba>] nfs_do_fsync+0x2a/0x60 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390770] [<ffffffffa0cf4105>] nfs_file_flush+0x75/0xa0 [nfs]
  May 22 10:44:31 client1 kernel: [ 1680.390777] [<ffffffff8114051c>] filp_close+0x3c/0x90
  May 22 10:44:31 client1 kernel: [ 1680.390783] [<ffffffff81140627>] sys_close+0xb7/0x120
  May 22 10:44:31 client1 kernel: [ 1680.390790] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
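In x86 call traces like the two in this report, each entry has the form `[<address>] function+offset/size`, optionally followed by `[module]`; `offset` is how far into the function the return address lies and `size` is the function's length. An entry prefixed with `?` was found on the stack but could not be confirmed as part of the call chain, so it may be stale. Dropping those leaves the reliable chain, here running from `sys_close` through `nfs_file_flush` and `nfs_wb_all` down to `io_schedule`: `cp` is blocked in `close()` waiting for its dirty pages to be written to the server, in an uninterruptible wait (`nfs_wait_bit_uninterruptible`, state `D`), which is consistent with kill -9 having no effect. A minimal parsing sketch follows; `parse_trace_entry` and `reliable_chain` are hypothetical names, and the regular expression only covers the formats seen in this report.

```python
import re

# Matches entries such as "[<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]"
# or "[<ffffffff81174751>] ? shrink_page_list+0x241/0xaa0".
ENTRY_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[([A-Za-z0-9_]+)\])?"
)


def parse_trace_entry(line):
    """Parse one call-trace line; return a dict, or None if it doesn't match."""
    m = ENTRY_RE.search(line)
    if not m:
        return None
    return {
        "address": int(m.group(1), 16),
        "reliable": m.group(2) is None,  # a leading '?' marks an unconfirmed entry
        "function": m.group(3),
        "offset": int(m.group(4), 16),
        "size": int(m.group(5), 16),
        "module": m.group(6),  # None for code built into the kernel image
    }


def reliable_chain(lines):
    """Function names of confirmed entries, innermost frame first."""
    entries = (parse_trace_entry(line) for line in lines)
    return [e["function"] for e in entries if e and e["reliable"]]
```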

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/585657/+subscriptions