kernel-packages team mailing list archive

[Bug 1013807] Re: transparent hugepages and thrashing on amd64

 

apport information

** Tags added: apport-collected wily

** Description changed:

  I seem to have found a solution to a severe thrashing/swapping/freezing
  problem that I've been having for months now.  I guess the real question
  is - should I turn it into a bug report and what would be useful data to
  include if so.
  
  This is a quad core AMD Phenom system with 4 GB of RAM, dual monitors and
  a single 1TB WD caviar black HD.  It had been behaving normally until
  something broke sometime late in the 11.x release cycle and continues in
  the current 12.04 LTS.  The symptoms are running a moderate load of apps
  (firefox with ~8 tabs, a terminal or 2, and aisleriot solitaire for
  example) and experiencing system freezes where the entire UI becomes
  totally unresponsive for 20 seconds - 5 minutes with solid disk
  activity.  When I try to figure out what is going on, iotop and top show
  jbd2 and kswapd accounting for the largest load, but since iotop freezes
  like everything else I can't tell what's going on during the worst
  storms.  Googling around shows a fair number of other people with
  similar problems, most of them with multi core amd64 systems.
  
  The other day I spotted this report on opensuse that looked similar but
  not identical:
  
  http://lists.opensuse.org/opensuse/2012-03/msg00657.html
  
  I booted with the grub parameter transparent_hugepage=never yesterday
  and the problem went away and hasn't come back. I've stressed the
  system by running  a bunch of flash/java tabs in firefox, running a
  large java based stock app (ThinkorSwim) in another workspace and
  playing a 1080p 60fps movie in a third workspace.  This certainly causes
  swapping, but not freezing or stumbling.  It actually did a bit of
  swapping a minute ago while I was typing and it managed to make Pandora
  radio stumble for a moment - but that's orders of magnitude better than
  it has been.
  
  I think there may be a fundamental problem with how transparent
  hugepages are handled with some AMD CPUs.  I think this problem started
  when this feature was implemented and enabled by default. The manpage
  for madvise() says this was added in 2.6.38, but I don't know if it was
  enabled by default at that point.
  
  Here's a partial list of things that haven't worked well in the past:
  
  Playing with the swappiness value: setting swappiness to very low values
  makes the problem take longer to surface, but (unsurprisingly) makes it
  even worse once it does.
  
  swapoff -a ; swapon -a: this makes it go away for a while.  A potentially
  interesting thing is that as soon as I can get the system to act on the
  swapoff -a the system becomes responsive again. It pegs one CPU core at
  100% and the HD grinds like crazy but it stops freezing right away.
  
  Moving swap from the HD to a USB thumb drive: Obviously I didn't expect
  that to be faster but wanted to see if segregating swap to a different
  device on a different bus would make it swap more smoothly - it didn't.
  
  Playing with nice and ionice priorities for jbd2, kswapd.  The fact that
  running these processes at a lower priority than anything else on the
  system makes no difference leads me to think they were just symptoms and
  not at the root of the problem.
  
  I think this may be the tip of the iceberg and there may be a lot of
  others having this problem.  Looking around I see a fair number of
  reports, most of them unsolved.  Some may have been fixed by just adding
  enough RAM that dirty hugepages just don't collect.  Some may have been
  fixed by changing filesystems - ext4 seems like something a lot of people
  with this problem have in common.
  
  Workaround:
  hold down the spacebar during boot in order to bring up the grub menu, edit the command line and add
  transparent_hugepage=never
  
  If this fixes the problem you can make it permanent by editing
  /etc/default/grub and adding transparent_hugepage=never to the
  GRUB_CMDLINE_LINUX_DEFAULT line and then running update-grub.
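
  The edit to /etc/default/grub described above could be scripted roughly
  as follows.  This is only a sketch and not part of the original report:
  the function name add_kernel_param is made up for illustration, and it
  assumes the GRUB_CMDLINE_LINUX_DEFAULT line is a simple one-line quoted
  assignment.  It only transforms text; it does not touch the real file.

```python
def add_kernel_param(grub_text: str, param: str = "transparent_hugepage=never") -> str:
    """Return grub_text with param appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: assumes a single-line, quoted assignment such as
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash".
    """
    key = "GRUB_CMDLINE_LINUX_DEFAULT="
    out = []
    for line in grub_text.splitlines():
        if line.startswith(key):
            value = line[len(key):].strip()
            # Keep whichever quote character the file already uses.
            quote = value[0] if value[:1] in ('"', "'") else '"'
            params = value.strip("\"'").split()
            if param not in params:  # do not add the parameter twice
                params.append(param)
            joined = " ".join(params)
            line = f"{key}{quote}{joined}{quote}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample), end="")
```

  Writing the result back to /etc/default/grub needs root, and update-grub
  still has to be run afterwards for the change to take effect.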
  
  Problems with this workaround:
  1) transparent hugepages should work.  This may cause a small performance hit in some situations and a larger hit in others.
  2) If you do this you will probably never know when or if it actually gets fixed.
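
  To confirm that the boot parameter took effect, or to check later
  which mode a given kernel is running with, the active setting can be
  read from sysfs.  A minimal sketch, assuming the usual
  /sys/kernel/mm/transparent_hugepage/enabled file, where the kernel
  marks the active mode with square brackets; the helper name
  active_thp_mode is made up for illustration and is not from the
  original report.

```python
from pathlib import Path

# Standard sysfs location on kernels built with transparent hugepage support.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from e.g. '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print(active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("transparent hugepage sysfs entry not present")
```

  With the workaround in place this should report "never".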
  
  PS: Lars Müller [ˈlaː(r)z ˈmʏlɐ]
  Samba Team
  SUSE Linux, Maxfeldstraße 5, 90409 Nürnberg, Germany
  
  is looking for bugzilla reports on this too.
+ --- 
+ ApportVersion: 2.19.1-0ubuntu5
+ Architecture: amd64
+ AudioDevicesInUse:
+  USER        PID ACCESS COMMAND
+  /dev/snd/controlC0:  garyrich   2339 F.... pulseaudio
+  /dev/snd/controlC1:  garyrich   2339 F.... pulseaudio
+ CurrentDesktop: Unity
+ DistroRelease: Ubuntu 15.10
+ EcryptfsInUse: Yes
+ HibernationDevice: RESUME=UUID=5b22d42d-33c8-435c-bea0-c1fbba7f88bf
+ InstallationDate: Installed on 2010-01-22 (2149 days ago)
+ InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
+ IwConfig:
+  eth0      no wireless extensions.
+  
+  lo        no wireless extensions.
+ MachineType: System manufacturer System Product Name
+ Package: linux (not installed)
+ ProcFB: 0 radeondrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-19-generic root=UUID=ab3cda85-2125-40b3-a1d0-f27222cc9ff6 ro quiet splash elevator=cfq vt.handoff=7
+ ProcVersionSignature: Ubuntu 4.2.0-19.23-generic 4.2.6
+ RelatedPackageVersions:
+  linux-restricted-modules-4.2.0-19-generic N/A
+  linux-backports-modules-4.2.0-19-generic  N/A
+  linux-firmware                            1.149.3
+ RfKill:
+  
+ Tags:  wily
+ Uname: Linux 4.2.0-19-generic x86_64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups: adm admin audio cdrom debian-tor dialout dip fax fuse games lpadmin messagebus netdev plugdev sambashare ssh staff syslog tape users video
+ _MarkForUpload: True
+ dmi.bios.date: 04/13/2011
+ dmi.bios.vendor: American Megatrends Inc.
+ dmi.bios.version: 3503
+ dmi.board.asset.tag: To Be Filled By O.E.M.
+ dmi.board.name: M4A78T-E
+ dmi.board.vendor: ASUSTeK Computer INC.
+ dmi.board.version: Rev 1.xx
+ dmi.chassis.asset.tag: Asset-1234567890
+ dmi.chassis.type: 3
+ dmi.chassis.vendor: Chassis Manufacture
+ dmi.chassis.version: Chassis Version
+ dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3503:bd04/13/2011:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM4A78T-E:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
+ dmi.product.name: System Product Name
+ dmi.product.version: System Version
+ dmi.sys.vendor: System manufacturer

** Attachment added: "AlsaInfo.txt"
   https://bugs.launchpad.net/bugs/1013807/+attachment/4533433/+files/AlsaInfo.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1013807

Title:
  transparent hugepages and thrashing on amd64

Status in linux package in Ubuntu:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1013807/+subscriptions