kernel-packages team mailing list archive

Re: [Bug 1499203] Re: memory leak in hv_storvsc (3.13.0-63-generic)

 

On Friday, October 09, 2015 at 06:59, Oskar Liljeblad wrote:
> > > To see if it is the cause of this issue, I built a test kernel with a
> > > revert of commit 97b2591.  The test kernel can be downloaded from:
> > > 
> > > http://kernel.ubuntu.com/~jsalisbury/lp1499203/
[..]
> The 3.13.0-66.107~lp1445195Commit97b2591Reverted kernel seems to work just
> fine. No memory leaks as far as I can see.

By the way, I had to downgrade one server from the kernel above to
3.13.0-65.106 because of some strange IO lockup issues. I'm afraid this
won't be of much help, but I'm reporting it anyway.
It started 1 minute after boot with the new kernel:

Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106544] BUG: unable to handle kernel NULL pointer dereference at           (null)
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106592] IP: [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106624] PGD 1f72db067 PUD 1fa753067 PMD 0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106659] Oops: 0000 [#1] SMP
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106684] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106848] CPU: 1 PID: 1286 Comm: mongod Not tainted 3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106884] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106923] task: ffff8801f722c800 ti: ffff8801f72ce000 task.ti: ffff8801f72ce000
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106950] RIP: 0010:[<ffffffff81206c5b>]  [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.106986] RSP: 0018:ffff8801f72cfe78  EFLAGS: 00010246
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107006] RAX: 0000000000000000 RBX: ffff8801f775e300 RCX: 0000000040000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107032] RDX: 0000000001000000 RSI: 0000000000000000 RDI: ffffffff81c72e80
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107058] RBP: ffff8801f72cfea0 R08: 0000000000000000 R09: 0000000000000001
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107084] R10: ffff8801f775ece1 R11: 0000000000000293 R12: 0000000000000010
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107110] R13: ffff8801f775ece1 R14: ffff8801f775ee40 R15: ffff8801f775e3b0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107137] FS:  00007f23b299f700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107190] CR2: 0000000000000000 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107224] Stack:
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107235]  ffff8801f775e300 0000000000000010 ffff8801f775ece1 ffff8801f775ee40
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107270]  ffff880036927a40 ffff8801f72cfee8 ffffffff811bfb7a ffffffff8133ed81
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107302]  ffff8801fa8bbe30 0000000000000000 ffffffff81ebb680 ffff8801f722ce20
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107336] Call Trace:
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107353]  [<ffffffff811bfb7a>] __fput+0x24a/0x260
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107375]  [<ffffffff8133ed81>] ? blkdev_issue_flush+0x71/0x90
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107400]  [<ffffffff811bfbde>] ____fput+0xe/0x10
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107421]  [<ffffffff81088377>] task_work_run+0xa7/0xe0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107444]  [<ffffffff81013e57>] do_notify_resume+0x97/0xb0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107468]  [<ffffffff8173431a>] int_signal+0x12/0x17
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107491] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 48 c7 c7 80 2e c7 81 49 81 c7 b0 00 00 00 41 56 41 55 41 54 53 e8 b8 30 52 00 49 8b 07 <48> 8b 08 49 39 c7 4c 8d 60 a8 48 8d 59 a8 75 0b eb 3e 0f 1f 00
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107648] RIP  [<ffffffff81206c5b>] eventpoll_release_file+0x2b/0xa0
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107675]  RSP <ffff8801f72cfe78>
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107689] CR2: 0000000000000000
Oct 13 00:06:16 af-mdbdrs2 kernel: [   66.107717] ---[ end trace 87deccc21e1958fa ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210565] ------------[ cut here ]------------
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210612] kernel BUG at /home/jsalisbury/bugs/lp1499203/ubuntu-trusty/mm/rmap.c:1035!
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210642] invalid opcode: 0000 [#2] SMP
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210663] Modules linked in: joydev hid_generic mac_hid serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd nls_iso8859_1 hid_hyperv hyperv_fb hid hyperv_keyboard lp parport hv_netvsc hv_utils hv_storvsc hv_vmbus
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210796] CPU: 1 PID: 1771 Comm: mongod Tainted: G      D       3.13.0-66-generic #107~lp1445195Commit97b2591Reverted
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210834] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210873] task: ffff8801f7713000 ti: ffff8801fafa4000 task.ti: ffff8801fafa4000
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210900] RIP: 0010:[<ffffffff8171ee8a>]  [<ffffffff8171ee8a>] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210939] RSP: 0018:ffff8801fafa59e8  EFLAGS: 00010246
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210960] RAX: 0000000000000000 RBX: ffffea00079a2340 RCX: ffffffffffffffe8
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.210986] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880207ff4f00
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.211021] RBP: ffff8801fafa59e8 R08: 00000000fffffff9 R09: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.212294] R10: 000000000000000c R11: 00000000003e9480 R12: 00007f084a5619e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214126] R13: 0000000000000000 R14: ffff8801f775e300 R15: 0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] FS:  00007f084a561700(0000) GS:ffff8801fee20000(0000) knlGS:0000000000000000
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] CR2: 00007f084a5619e0 CR3: 00000001f7a94000 CR4: 00000000001406e0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] Stack:
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  ffff8801fafa5a18 ffffffff8118464a 00007f084a5619e0 ffff8800f78ea290
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  ffff8801f775e300 ffff8801fa652300 ffff8801fafa5ab0 ffffffff8117a708
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  ffff880035aab300 0000000035aab300 0000000000000000 0000000000001f4a
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] Call Trace:
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8118464a>] do_page_add_anon_rmap+0x10a/0x120
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8117a708>] handle_mm_fault+0xcf8/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8172f624>] __do_page_fault+0x184/0x560
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff810a3281>] ? update_cfs_shares+0xb1/0x100
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8109ee48>] ? __enqueue_entity+0x78/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff810a51dd>] ? enqueue_entity+0x2ad/0xbb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8101bb33>] ? native_sched_clock+0x13/0x80
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff810a5f02>] ? enqueue_task_fair+0x422/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8172fa1a>] do_page_fault+0x1a/0x70
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8172bd68>] page_fault+0x28/0x30
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8137184f>] ? __get_user_8+0x1f/0x29
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff810db202>] ? exit_robust_list+0x32/0x130
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff81064a53>] mm_release+0x123/0x140
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff81069b43>] do_exit+0x153/0xa40
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8106a4af>] do_group_exit+0x3f/0xa0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8107a190>] get_signal_to_deliver+0x1d0/0x6d0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff810133f8>] do_signal+0x48/0xa10
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff81179e92>] ? handle_mm_fault+0x482/0xf00
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff81013e29>] do_notify_resume+0x69/0xb0
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  [<ffffffff8172bb62>] retint_signal+0x48/0x86
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] Code: c4 40 74 03 8b 4f 68 bf 00 10 00 00 48 d3 e7 e8 2d 58 a7 ff 5d c3 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 89 f2 be 00 80 00 00
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539] RIP  [<ffffffff8171ee8a>] __page_set_anon_rmap.part.22+0x9/0xb
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.214539]  RSP <ffff8801fafa59e8>
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.249727] ---[ end trace 87deccc21e1958fb ]---
Oct 13 00:08:51 af-mdbdrs2 kernel: [  221.251013] Fixing recursive fault but reboot is needed!

After that, all IO on that device got stuck.
I rebooted the server and the issue occurred again, with basically the same messages logged.

Regards,

Oskar Liljeblad

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1499203

Title:
  memory leak in hv_storvsc (3.13.0-63-generic)

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  Confirmed

Bug description:
  Slab and SUnreclaim values in /proc/meminfo keep increasing. On one
  server it reached 85% of physical memory after 14 days, but on most
  other servers it increases more slowly. I checked /proc/slabinfo and
  almost all allocations were in kmalloc-512. So I enabled
  "slub_debug=U,kmalloc-512" on one server, and after only 24h of uptime
  11% of the memory was used by kmalloc-512 and unreclaimable. With
  debugging enabled I could see the following in
  /sys/kernel/slab/kmalloc-512/alloc_calls:

  521294 storvsc_queuecommand+0x359/0x790 [hv_storvsc]
  age=161922/955116/20882927 pid=1-41545

  All other counters were below 2000. In
  /sys/kernel/slab/kmalloc-512/free_calls I see the following:

  516823 <not-available> age=4315783846 pid=0
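The alloc_calls and free_calls listings above can be ranked by count to spot the dominant call site. The following is a minimal Python sketch, assuming each entry is a single line of the form "count symbol [module] key=value ..." as in the excerpts above (the e-mail wraps the first entry over two lines). The names `CallSite` and `parse_calls` are hypothetical helpers, not part of any kernel tool.

```python
import re
from typing import List, NamedTuple, Optional


class CallSite(NamedTuple):
    count: int             # number of objects attributed to this call site
    symbol: str            # e.g. "storvsc_queuecommand+0x359/0x790"
    module: Optional[str]  # e.g. "hv_storvsc", or None when no [module] tag is present


# Assumed line layout: "<count> <symbol> [<module>] age=... pid=..."
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)(?:\s+\[([^\]]+)\])?")


def parse_calls(text: str) -> List[CallSite]:
    """Parse alloc_calls/free_calls text and return call sites, largest count first."""
    sites = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            sites.append(CallSite(int(match.group(1)), match.group(2), match.group(3)))
    return sorted(sites, key=lambda site: site.count, reverse=True)
```

To use it on a live system, read /sys/kernel/slab/kmalloc-512/alloc_calls (this usually requires root and the slub_debug=U option mentioned above) and pass the contents to `parse_calls`; the first element of the result is the call site with the most outstanding allocations.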

  The hv_storvsc module is for Hyper-V. We are (unfortunately) running
  Hyper-V 6.3.9600.16384 with Microsoft System Center 2012 R2 Update
  rollup 3 for all the servers with this issue.

  Kernels are stock linux-image-3.13.0-63-generic, 3.13.0-63.103,
  x86_64, from Ubuntu 14.04 LTS. /proc/version_signature contains:

    Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25

  No output from lspci -vnvn. The problem described above happens on
  both single and multicore virtual machines. The CPUs in the hypervisors
  are E5-2630 v2 @ 2.60GHz. Let me know if you need more info or if I can
  do more debugging.
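The growth of SUnreclaim relative to physical memory, as described at the top of this report, can be tracked with a small script. This is a rough Python sketch, assuming the usual "Key:   value kB" layout of /proc/meminfo; `parse_meminfo` and `unreclaimable_slab_percent` are hypothetical helper names introduced only for illustration.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the numeric fields of /proc/meminfo, keyed by name (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def unreclaimable_slab_percent(values: Dict[str, int]) -> float:
    """Unreclaimable slab memory as a percentage of total physical memory."""
    return 100.0 * values["SUnreclaim"] / values["MemTotal"]
```

Reading /proc/meminfo periodically (for example from cron), passing its contents through these two functions, and logging the result gives a simple time series for comparing how fast the leak grows on different servers.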

  Regards,

  Oskar Liljeblad
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Sep 24 00:31 seq
   crw-rw---- 1 root audio 116, 33 Sep 24 00:31 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.13
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  CurrentDmesg:
   [59081.977909] systemd-udevd[26480]: starting version 204
   [59124.051974] init: systemd-logind main process (756) killed by TERM signal
  DistroRelease: Ubuntu 14.04
  InstallationDate: Installed on 2014-09-09 (380 days ago)
  InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
  IwConfig:
   eth0      no wireless extensions.
   
   eth1      no wireless extensions.
   
   lo        no wireless extensions.
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Microsoft Corporation Virtual Machine
  Package: linux (not installed)
  PciMultimedia:
   
  ProcFB: 0 hyperv_fb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-63-generic.efi.signed root=UUID=f4d228d6-2eee-40fc-bf3f-633e46fa8301 ro slub_debug=U,kmalloc-512
  ProcVersionSignature: Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25
  RelatedPackageVersions:
   linux-restricted-modules-3.13.0-63-generic N/A
   linux-backports-modules-3.13.0-63-generic  N/A
   linux-firmware                             1.127.15
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  trusty
  Uname: Linux 3.13.0-63-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  WifiSyslog:
   Sep 24 02:06:19 adm-backup1 dhclient: message repeated 1447 times: [ DHCPREQUEST of 10.40.128.9 on eth0 to 192.0.2.253 port 67 (xid=0x429dad4)]
   Sep 24 02:06:37 adm-backup1 dhclient: DHCPREQUEST of 10.40.128.9 on eth0 to 255.255.255.255 port 67 (xid=0x429dad4)
   Sep 24 02:06:37 adm-backup1 dhclient: DHCPACK of 10.40.128.9 from 192.0.2.253
   Sep 24 02:06:37 adm-backup1 dhclient: bound to 10.40.128.9 -- renewal in 44877 seconds.
  _MarkForUpload: True
  dmi.bios.date: 11/26/2012
  dmi.bios.vendor: Microsoft Corporation
  dmi.bios.version: Hyper-V UEFI Release v1.0
  dmi.board.asset.tag: None
  dmi.board.name: Virtual Machine
  dmi.board.vendor: Microsoft Corporation
  dmi.board.version: Hyper-V UEFI Release v1.0
  dmi.chassis.asset.tag: 6126-4244-1659-0314-3158-3955-44
  dmi.chassis.type: 3
  dmi.chassis.vendor: Microsoft Corporation
  dmi.chassis.version: Hyper-V UEFI Release v1.0
  dmi.modalias: dmi:bvnMicrosoftCorporation:bvrHyper-VUEFIReleasev1.0:bd11/26/2012:svnMicrosoftCorporation:pnVirtualMachine:pvrHyper-VUEFIReleasev1.0:rvnMicrosoftCorporation:rnVirtualMachine:rvrHyper-VUEFIReleasev1.0:cvnMicrosoftCorporation:ct3:cvrHyper-VUEFIReleasev1.0:
  dmi.product.name: Virtual Machine
  dmi.product.version: Hyper-V UEFI Release v1.0
  dmi.sys.vendor: Microsoft Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1499203/+subscriptions

