kernel-packages team mailing list archive

[Bug 1154876] Re: 3.2.0-38 and earlier systems hang with heavy memory usage

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Marc Hasson <mhassonsuspect@xxxxxxxxx>
Date: Wed, 16 Oct 2013 18:46:15 -0000
Reply-to: Bug 1154876 <1154876@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Summary:

Tried 3.11rc7, very happy with how it behaved in our testing.  Tried
this week's 3.12rc5, disappointed that a "step backwards" was taken
on that one for us.  The difference for us was in the "low memory killer"
that was configured in the 3.11rc7 build but not the 3.12rc5 system.
Details below.  As a consequence, I'm tagging this bug with both "upstream
3.11rc7 fixes" as well as "upstream 3.12rc5 doesn't fix"!


Details:

I've now switched to a real hardware (Dell multicore) platform to make
sure no one has any doubts as to this kernel problem being an issue on
real hardware as well as my VM testbed.  I can now reproduce the hang
described in the original bug report on either my 2GB VM or the
actual machine.

I first reproduced the hang with a more recent 3.2.0-45 kernel on this
64-bit Dell hardware and then tried both the mainline 3.11rc7 and this
week's 3.12rc5 kernels from the URL supplied above by Christopher.

The good news is that I was unable to reproduce a problem using the
3.11rc7 kernel and the system was extremely well-behaved!  That is,
despite running a very heavy load it remained responsive to new requests,
appeared to get more overall work accomplished compared to the 3.2 system
in the same time period, and had a minimum of kswapd scan rates in the
"sar" records.  And no direct allocation failure scan rates at all.
Naturally, the system was SIGKILL'ing off selected processes periodically
but this is the price I'd expect for running the memory-overloading
test I have here and in my real-world environment.  We much prefer
this behavior of individual processes being killed off, which can be
subsequently relaunched, rather than hanging or crashing the entire
system.  Especially since it appeared that the SIGKILLs in my tests
were *always* directed at processes that were actively doing the memory
consuming work, so they were good choices.

I note that the processes SIGKILL'ed off in the above 3.11rc7 system
were dispatched to their death by the "low memory killer" logic in the
lowmemorykiller.c code.  The standard kernel OOM killer rarely, if ever,
was invoked.  The 3.11rc7 kernel appears to have been built with the
CONFIG_ANDROID_LOW_MEMORY_KILLER=y setting which caused that low memory
killer code to be statically linked into the kernel and register its low
memory shrinker callback function which issued the appropriate SIGKILLs
under overloaded conditions.
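
As a rough illustration of the kind of decision described above, here is
a toy model in Python.  It is not taken from lowmemorykiller.c: the
threshold tables, the function names, and the victim-selection rule
(highest adj first, then largest resident size) are assumptions chosen
only to show how free-memory thresholds can map to "who is eligible to
be killed".

```python
# Toy model of a threshold-based low memory killer; NOT the kernel code.
# The tables below are illustrative assumptions (thresholds in pages).
MINFREE_PAGES = [1536, 2048, 4096, 16384]
ADJ_LEVELS = [0, 1, 6, 12]


def min_adj_to_kill(free_pages, file_pages,
                    minfree=MINFREE_PAGES, adj=ADJ_LEVELS):
    """Return the lowest adj level eligible for killing, or None.

    A level applies when both free memory and file-backed memory are
    below that level's threshold; the first matching level wins.
    """
    for threshold, level in zip(minfree, adj):
        if free_pages < threshold and file_pages < threshold:
            return level
    return None


def pick_victim(processes, min_adj):
    """Pick a victim from (name, adj, rss_pages) tuples.

    Only processes with adj >= min_adj are candidates.  Among those,
    prefer the highest adj, breaking ties by the largest rss.
    """
    candidates = [p for p in processes if p[1] >= min_adj]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p[1], p[2]))
```

In this model an idle sshd with a low adj value is never selected while a
large memory-consuming worker with a higher adj value exists, which is the
behavior that made the SIGKILL choices look sensible in the tests above.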

The bad news is that the more recent 3.12rc5 kernel I tried did NOT
have the above CONFIG_ANDROID_LOW_MEMORY_KILLER=y setting and instead
relied upon just the kernel OOM killer.  This 3.12rc5 system is behaving
similarly to when I turned off the 3.11rc7's "low memory killer" via
a /sys/module low memory minfree parameter.  That is, the 3.12rc5 (or
3.11rc7 with "low memory killer" disabled) system experienced:

 1) Much longer user response times, with wide variance
    External wget queries went from 1-5 seconds with the "low memory
    killer" enabled during the overloading tests to 2 *minutes* without
    that facility!

 2) High kswapd scans of .5M-1M/second in the "sar" reports
    With the low memory killer, kswapd scan rates never exceeded a few K/sec.

 3) Fairly high direct allocation failure scans as well (K/sec)

 4) Multiple processes critical to system functions were OOM'ed
    Management shell/terminal sessions that were idle, sshd, cron, etc.

 5) Even a panic in one test sequence
    "Kernel panic - not syncing: Out of memory and no killable processes..."

The behavior of our test systems without the low memory killer
functionality is poor, with the system either crashing or providing
a poor (simulated) customer response.  Either is better than the 3.2
"hang" I've reported, but not by much for our production/response needs!

I understand that there are concerns about the "low memory killer"
killing off processes before even getting to use the allocated
swap space on a system.  I observed that as well, which for us was
fine.   But I appreciate that it may not be desirable to have the
"CONFIG_ANDROID_LOW_MEMORY_KILLER=y" option for all folks' usage cases
as was done for the 3.11rc7 build.  But what about supplying that "low
memory killer" as an optionally loadable module by simply building with
"CONFIG_ANDROID_LOW_MEMORY_KILLER=m" in the kernel/distribution package?
That way, those of us who desire to not use any swap area and prefer a
more responsive system overall will have a simple way to load that module
distributed with the then-current Ubuntu kernel.  There are usage cases
where it's better to shed load by killing off processes earlier rather than
degrade response time by using the swap area to preserve those processes.
The default would be to retain the current 3.12rc5 behavior: do NOT load
the low memory killer and in so doing experience the standard kernel OOM
handling.  The latter could be improved over time as a separate effort,
if needed.

We would consider the above minor loadable module configuration change as
a simple way to resolve this memory overloading issue to our satisfaction.
I look forward to hearing whether this can be done for some supported
version of an LTS precise kernel, such as via a backport of an LTS 3.12
kernel perhaps.


** Tags added: kernel-bug-exists-upstream kernel-bug-exists-upstream-v3.12-rc5 kernel-fixed-upstream kernel-fixed-upstream-v3.11-rc7

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1154876

Title:
  3.2.0-38 and earlier systems hang with heavy memory usage

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  Background

  We've been experiencing mysterious hangs on our 2.6.38-16 Ubuntu 10.04
  systems in the field.  The systems have large amounts of memory and disk,
  along with up to a couple dozen CPU threads.  Our operations folks have
  to power-cycle the machines to recover them; they do not panic.  Our use
  of "hang" means the system will no longer respond to any current shell
  prompts, will not accept new logins, and may not even respond to pings.
  It appears totally dead.

  Using log files and the "sar" utility from the "sysstat" package we
  gradually put together the following clues to the hangs:

    Numerous "INFO: task <task-name>:<pid> blocked for more than 120 seconds"
    High CPU usage suddenly on all CPUs heading into the hang, 92% or higher
    Very high kswapd page scan rates (pgscank/s) - up to 7 million per second
    Very high direct page scan rates (pgscand/s) - up to 3 million per second
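
The first clue in the list above can be pulled out of a saved log
mechanically.  The following is a minimal sketch, assuming the log lines
contain the message in exactly the form quoted above; the function name
find_hung_tasks is hypothetical.

```python
import re

# Matches the hung-task message quoted above.  The task name may itself
# contain colons (for example "kworker/0:1"), so the name group is greedy
# and the pid is taken from the last ":<digits>" before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_lines):
    """Return a list of (task_name, pid, seconds) for each hung-task line."""
    hits = []
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append((match.group("name"),
                         int(match.group("pid")),
                         int(match.group("secs"))))
    return hits
```

Counting the hits per time window gives a quick view of whether blocked
tasks are piling up ahead of a hang.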

  In addition to noting the above events just before the hangs, we have
  some evidence that the high kswapd scans occur at other times for no
  obvious reason, such as when a significant amount (25%) of memory is
  still free (kbmemfree).  Also, we've seen cases where there are application errors
  related to a system's responsiveness and that has sometimes correlated
  with either high pgscank/s or pgscand/s that lasts for some number of
  sar records before the system returns to normal running.  The peaks of
  these transients aren't usually as high as those we see leading to a
  solid system hang/failure.  And sometimes these are not "transients",
  but last for hours with no apparent event related to the starting or
  stopping of this behavior!
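
For readers without "sar" at hand, scan rates of this kind can be
approximated from two snapshots of /proc/vmstat.  The sketch below assumes
that the kswapd and direct scan counters are the fields whose names start
with pgscan_kswapd and pgscan_direct (the exact field names vary between
kernel versions), and it skips pgscan_direct_throttle, which on kernels
that have it counts throttling events rather than pages scanned.  The
function names are hypothetical.

```python
def scan_totals(vmstat_text):
    """Sum the kswapd and direct page-scan counters in /proc/vmstat text."""
    kswapd = direct = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "pgscan_direct_throttle":
            continue  # an event counter, not a count of scanned pages
        if key.startswith("pgscan_kswapd"):
            kswapd += int(value)
        elif key.startswith("pgscan_direct"):
            direct += int(value)
    return kswapd, direct


def scan_rates(before_text, after_text, interval_secs):
    """Return (kswapd pages/s, direct pages/s) between two snapshots."""
    k0, d0 = scan_totals(before_text)
    k1, d1 = scan_totals(after_text)
    return (k1 - k0) / interval_secs, (d1 - d0) / interval_secs
```

Typical use is to read /proc/vmstat, sleep for the interval, read it again,
and pass both texts to scan_rates().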

  So we decided to see if we can reproduce these symptoms on a VMware
  testbed that we could easily examine with kdb and snapshot/dump.
  Through a combination of tar, find, and cat commands launched from
  a shell script we could recreate a system hang on both our 2.6.38-16
  systems as well as the various flavors of the 3.2 kernels, with the
  one crashdump'ed here being the latest 3.2.0-38 at the time of testing.
  The "sar" utility on our 2.6 testing confirmed similar behavior of the
  CPUs, kswapd scans, and direct scans leading up to the testbed hangs as
  to what we see in the field failures of our servers.

  Details on the shell scripts can be found in the file referenced below.
  It's important to read the information below on how the crash dump was
  taken before investigating it.  Reproduction on a 2-CPU VM took 1.5-4
  days for a 3.2 kernel, usually considerably less for a 2.6 kernel.

  Hang/crashdump details:

  In the crashdump the crash "dmesg" command will also show Call Traces that
  occurred *after* kdb investigations started.  It's important to note the
  kernel timestamp that indicates the start of those kdb actions and only
  examine entries prior to that for clues as to the hang proper:

  [160512.756748] SysRq : DEBUG
  [164936.052464] psmouse serio1: Wheel Mouse at isa0060/serio1/input0 lost synchronization, throwing 2 bytes away.
  [164943.764441] psmouse serio1: resync failed, issuing reconnect request
  [165296.817468] SysRq : DEBUG

  Everything previous to the above "dmesg" output occurs prior to (or during)
  the full system hang.  The kdb session started over 12 hours after the
  hang; the system was totally non-responsive at either its serial console
  or GUI.  We did not try a "ping" in this instance.
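
When working from a saved copy of that "dmesg" output, the cut-off can be
applied mechanically.  This is a minimal sketch, assuming each line starts
with a bracketed kernel timestamp as in the excerpt above and that the
first "SysRq : DEBUG" line marks the start of the kdb actions; the
function names are hypothetical.

```python
import re

# Leading "[seconds.micros]" kernel timestamp, then the message text.
TS_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s?(.*)$")


def kdb_start_timestamp(dmesg_lines, marker="SysRq : DEBUG"):
    """Return the timestamp of the first marker line, or None if absent."""
    for line in dmesg_lines:
        match = TS_RE.match(line)
        if match and marker in match.group(2):
            return float(match.group(1))
    return None


def lines_before_kdb(dmesg_lines, marker="SysRq : DEBUG"):
    """Return only the lines logged before the first marker line."""
    kept = []
    for line in dmesg_lines:
        match = TS_RE.match(line)
        if match and marker in match.group(2):
            break
        kept.append(line)
    return kept
```

Everything returned by lines_before_kdb() predates the kdb session and is
therefore safe to read as evidence about the hang itself.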

  The "kdb actions" taken may be seen in an actual log of that session
  recorded in console_kdb_session.txt.  It shows where these 3.2 kernels
  are spending their time when hung in our testbed ("spinning" in
  __alloc_pages_slowpath by failing an allocation, sleeping, retrying).
  We see the same behavior for the 2.6 kernels/tests as well except for
  one difference described below.  For the 3.2 dump included here all our
  script/load processes, as well as system processes, are constantly failing
  to allocate a page, sleeping briefly, and trying again.  This occurs
  across all CPUs (2 CPUs in this system/dump), which fits with what we
  believe we see in our field machines for the 2.6 kernels.

  For the 2.6 kernels the only difference we see is that there is typically
  a call to the __alloc_pages_may_oom function which in turn selects a
  process to kill, but we see that there is already a "being killed by oom"
  process at the hang so no additional ones are selected.  And we deadlock,
  just as the comment in oom_kill.c's select_bad_process() says.  In the
  3.2 kernels we are now moving our systems to, the testbed hang shows
  that the code does not go down the __alloc_pages_may_oom path.  Yet from
  the logs we include and the "dmesg" within crash one can see that prior
  to the hang OOM killing is invoked frequently.  The key seems to be a
  difference in the "did_some_progress" variable returned when we are very
  low on memory: it's always a "1" in the 3.2 kernels on our testbed.
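
The role of "did_some_progress" in the 3.2 behavior can be pictured with a
toy model.  The Python below is not kernel code and does not reproduce
__alloc_pages_slowpath; it is a simplified sketch, with hypothetical
names, of a retry loop in which the OOM path is only reached when reclaim
reports no progress.

```python
def allocate(reclaim, max_retries):
    """Toy retry loop.

    reclaim() returns (did_some_progress, got_page).  The OOM path is
    taken only when reclaim reports no progress at all; otherwise the
    caller would sleep briefly and try again.
    """
    for _ in range(max_retries):
        did_some_progress, got_page = reclaim()
        if got_page:
            return "allocated"
        if not did_some_progress:
            return "oom-killer invoked"
        # Progress was reported but no page was obtained: retry.
    return "still spinning"
```

If reclaim keeps reporting progress (a constant 1) without ever yielding a
page, the model never reaches the OOM path and simply keeps retrying,
which matches the "failing, sleeping, retrying" pattern seen in the kdb
session.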

  Though the kernel used here is 3.2.0-38-generic we have also caused this
  to occur with earlier 3.2 Ubuntu kernels.  We have also reproduced the
  failures with 2.6.38-8, 2.6.38-16, and 3.0 Ubuntu kernels.

  Quick description of included attachments (assuming this bug tool lets me add them separately):
  console_boot_output.txt - boot up messages until standard running state of OOMs
  dmesg_of_boot.txt - dmesg file from boot, mostly duplicates start of the above
  console_last_output.txt - last messages on serial console when system hung
  console_kdb_session.txt - kdb session demo'ing where system is "spinning"
  dump.201303072055 - sysrq-g dump, system was up around 2 days before hanging
  reproduction_info.txt - Machine environment and script used in our testbed

  ProblemType: Bug
  DistroRelease: Ubuntu 12.04
  Package: linux-image-3.2.0-38-generic 3.2.0-38.61
  ProcVersionSignature: Ubuntu 3.2.0-38.61-generic 3.2.37
  Uname: Linux 3.2.0-38-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
  ApportVersion: 2.0.1-0ubuntu17.1
  Architecture: amd64
  ArecordDevices:
   **** List of CAPTURE Hardware Devices ****
   card 0: AudioPCI [Ensoniq AudioPCI], device 0: ES1371/1 [ES1371 DAC2/ADC]
     Subdevices: 1/1
     Subdevice #0: subdevice #0
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  marc       2591 F.... pulseaudio
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
  Card0.Amixer.info:
   Card hw:0 'AudioPCI'/'Ensoniq AudioPCI ENS1371 at 0x20c0, irq 18'
     Mixer name	: 'Cirrus Logic CS4297A rev 3'
     Components	: 'AC97a:43525913'
     Controls      : 24
     Simple ctrls  : 13
  Date: Wed Mar 13 17:05:30 2013
  HibernationDevice: RESUME=UUID=2342cd45-2970-47d7-bb6d-6801d361cb3e
  InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
  Lsusb:
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
   Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
   Bus 002 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
   Bus 002 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
  MachineType: VMware, Inc. VMware Virtual Platform
  MarkForUpload: True
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 svgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-38-generic root=UUID=2db72c58-0ff6-48f6-87e4-55365ee344df ro crashkernel=384M-2G:64M,2G-:128M rootdelay=60 console=ttyS1,115200n8 kgdboc=kms,kbd,ttyS1,115200n8 splash
  RelatedPackageVersions:
   linux-restricted-modules-3.2.0-38-generic N/A
   linux-backports-modules-3.2.0-38-generic  N/A
   linux-firmware                            1.79.1
  RfKill:

  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 06/02/2011
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: 440BX Desktop Reference Platform
  dmi.board.vendor: Intel Corporation
  dmi.board.version: None
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 1
  dmi.chassis.vendor: No Enclosure
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd06/02/2011:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
  dmi.product.name: VMware Virtual Platform
  dmi.product.version: None
  dmi.sys.vendor: VMware, Inc.
