kernel-packages team mailing list archive

[Bug 1416414] Re: Trusty + Intel E5-26xx + NMI handler (perf_event_nmi_handler) took too long to run

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Rafael David Tinoco <inaddy@xxxxxxxxxx>
Date: Fri, 30 Jan 2015 13:53:32 -0000
Reply-to: Bug 1416414 <1416414@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
After analyzing the incomplete dump we got for this particular case,
reviewing kernel code changes and Intel processor errata, and talking
with HP ROM engineers (and sharing the errata with them), we believe*
that, for this stack trace, we triggered the following microcode
problem:

###

http://www.intel.com.br/content/dam/www/public/us/en/documents/specification-updates/xeon-e7-v2-spec-update.pdf
(Intel® Xeon® Processor E7 v2 Product Family Specification Update, January 2015)

CF140 Performance Monitoring IA32_PERF_GLOBAL_STATUS.CondChgd Bit Not
Cleared by Reset

Problem: The IA32_PERF_GLOBAL_STATUS MSR (38EH) should be cleared by
reset. Due to this erratum, CondChgd (bit 63) of the
IA32_PERF_GLOBAL_STATUS MSR may not be cleared.

Implication: When this erratum occurs, performance monitoring software may behave unexpectedly.
Workaround: It is possible for the BIOS to contain a workaround for this erratum. --> HP is probably working on this.

###

*We say "believe" because we can't check the PMU registers from the
core dump we got, but everything points in that direction.

This matters because, on x86 Linux, the NMI (Non-Maskable Interrupt)
watchdog (the hard lockup detector) uses PMU (performance counter)
registers, and those registers also signal who was responsible for
generating the NMI.
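As a rough illustration of how the watchdog ties into the PMU: the hard lockup detector programs a cycle counter so that it overflows, and raises an NMI, roughly once every `watchdog_thresh` seconds. The sketch below mirrors, as we understand it, the arithmetic of the x86 kernel helper `hw_nmi_get_sample_period()` (CPU frequency in kHz × 1000 × threshold in seconds). The function name `watchdog_sample_period` and the example frequency are our own assumptions, not values taken from this system.

```c
#include <stdint.h>

/* Hypothetical helper: number of CPU cycles between watchdog NMIs.
 * Assumes the cycle counter ticks at the nominal frequency cpu_khz,
 * so cycles = kHz * 1000 (i.e. Hz) * seconds. */
static uint64_t watchdog_sample_period(uint64_t cpu_khz, unsigned int watchdog_thresh)
{
    return cpu_khz * 1000u * (uint64_t)watchdog_thresh;
}
```

For example, with a 2.6 GHz part (`cpu_khz = 2600000`) and the default 10-second threshold, the counter would be set to overflow after 26,000,000,000 cycles.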

Note: our intention when talking to HP was to make sure their power
management firmware was not touching those registers. They said the
firmware only reads them, and there is no such thing as a "clear on
read" when those reads are made by the firmware.

The NMI handler (the kernel function responsible for handling NMIs)
identifies who was responsible for the NMI by looking at PMU registers.
Due to this microcode erratum, bit 63 (CondChgd) is not cleared when the
CPU is reset, and that makes the NMI handler misbehave (trying to handle
NMIs that should not be handled by this particular kernel code).
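To make the failure mode concrete, the sketch below models a handler that inspects a snapshot of IA32_PERF_GLOBAL_STATUS (MSR 0x38E). The low bits are per-counter overflow flags; bit 63 is CondChgd, which does not indicate an overflow. A handler that treats any nonzero status as "this NMI is mine" will claim NMIs whenever a stale CondChgd bit is left set, while masking bit 63 first avoids that. This is a hypothetical user-space model with made-up function names, not the kernel's actual handler; the real change is in the upstream kernel commit mentioned in this report.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_PERF_GLOBAL_STATUS 0x38Eu        /* MSR address, per the erratum text */
#define GLOBAL_STATUS_COND_CHGD (1ULL << 63)  /* CondChgd: not an overflow indication */

/* Naive model: any nonzero status is taken as "a counter overflowed,
 * so this NMI is ours". A stale CondChgd bit makes this always true. */
static bool naive_claims_nmi(uint64_t status)
{
    return status != 0;
}

/* Fixed model: ignore CondChgd and only claim the NMI if some other
 * status bit (e.g. a real counter overflow) remains set. */
static bool fixed_claims_nmi(uint64_t status)
{
    status &= ~GLOBAL_STATUS_COND_CHGD;
    return status != 0;
}
```

With a status of only bit 63 (0x8000000000000000), the naive model claims the NMI and the fixed model does not; once a real overflow bit is also set, both claim it.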

This problem was recently identified by a kernel developer and addressed
in the following upstream commit:

commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f

It is also described in the Intel errata document above.

The fix is included in the Trusty kernel from version 3.13.0-35 up to
the latest one:

inaddy@workstation:/kernel/ubuntu-trusty$ git tag --contains=ffb4bbaa2bf1ad9d79cf4d62d625499a7271f88e
Ubuntu-3.13.0-35.61
...
Ubuntu-3.13.0-45.74

The user was running kernel 3.13.0-34, which does not contain this fix.
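Given the tag list above, one quick way to tell whether a Trusty kernel already carries the fix is to compare the ABI number in its release string (the "35" in 3.13.0-35-generic) against 35. The sketch assumes release strings of the form `3.13.0-<abi>-<flavour>`, as printed by `uname -r` on Trusty; the helper name is hypothetical, and it only checks the version number, so a custom or non-Ubuntu build could differ.

```c
#include <stdio.h>

/* Hypothetical helper. Returns 1 if a Trusty 3.13.0 release string
 * (e.g. "3.13.0-34-generic") has ABI >= 35, i.e. is at or after the
 * first tag listed above; 0 if older; -1 if the string is not a
 * 3.13.0-<abi> release. */
static int trusty_kernel_has_condchgd_fix(const char *release)
{
    int major, minor, patch, abi;

    if (sscanf(release, "%d.%d.%d-%d", &major, &minor, &patch, &abi) != 4)
        return -1;
    if (major != 3 || minor != 13 || patch != 0)
        return -1;
    return abi >= 35 ? 1 : 0;
}
```

Passing the user's "3.13.0-34-generic" returns 0, while "3.13.0-45-generic" returns 1.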

STEP 1) Upgrade all HP ProLiant servers to the latest Ubuntu Trusty
kernel version.

STEP 2)

Together with HP, we concluded that, for now, the best option for HP
ProLiant servers is the following kernel command line:

" ... intremap=no_x2apic_optout intel_idle.max_cstate=0 nmi_watchdog=0 ..."
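After rebooting, one way to confirm these parameters took effect is to look for each of them as a whitespace-separated token in /proc/cmdline. The sketch below does that token match on a string passed in by the caller (reading /proc/cmdline into a buffer is left out); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `param` appears as a whole
 * whitespace-separated token in the kernel command line `cmdline`.
 * `param` must be non-empty. */
static bool cmdline_has_token(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    while ((p = strstr(p, param)) != NULL) {
        bool starts = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += 1;  /* partial match (e.g. "nmi_watchdog=01"): keep searching */
    }
    return false;
}

/* True if all three parameters recommended in STEP 2 are present. */
static bool cmdline_has_step2_params(const char *cmdline)
{
    static const char *const params[] = {
        "intremap=no_x2apic_optout",
        "intel_idle.max_cstate=0",
        "nmi_watchdog=0",
    };
    for (size_t i = 0; i < sizeof params / sizeof params[0]; i++) {
        if (!cmdline_has_token(cmdline, params[i]))
            return false;
    }
    return true;
}
```

Matching whole tokens rather than plain substrings avoids treating a value such as `nmi_watchdog=01` as if it were `nmi_watchdog=0`.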

intremap=no_x2apic_optout -> tells the kernel that it may use x2APIC
even though the firmware asks it to opt out of x2APIC. Gen8 and later
servers support this feature and benefit from x2APIC's advantages over
xAPIC, such as support for more CPUs and IRQ remapping.

intel_idle.max_cstate=0 -> disables the intel_idle driver so that the
kernel uses the acpi_idle driver instead. HP relies heavily on ACPI for
its firmware power management features, and intel_idle might put CPUs
into a deeper C-state than the firmware expects, causing higher
latencies and NMIs.

nmi_watchdog=0 -> disables the kernel's PMU-based NMI watchdog, leaving
the HP watchdog driver (hpwdt) in charge. Because this problem is
intermittent, HP believes your systems should be more stable with this
option. They don't recommend it for all setups, only for those with a
workload similar to this one that have suffered from NMIs.
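At runtime, the effect of the last two parameters can also be observed: /sys/devices/system/cpu/cpuidle/current_driver would be expected to name acpi_idle rather than intel_idle, and /proc/sys/kernel/nmi_watchdog to read 0. The sketch below reads the first line of such a file and compares it with an expected value. The helper names are hypothetical, and it assumes those files exist on the running kernel (a missing file simply yields false).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of `path` into `buf`,
 * stripping the trailing newline. Returns false if the file cannot
 * be opened or is empty. */
static bool read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fgets(buf, (int)size, f) != NULL;
    fclose(f);
    if (ok)
        buf[strcspn(buf, "\n")] = '\0';
    return ok;
}

/* True if the first line of `path` equals `expected` exactly. */
static bool sysfile_equals(const char *path, const char *expected)
{
    char buf[128];
    return read_first_line(path, buf, sizeof buf) && strcmp(buf, expected) == 0;
}
```

On a host configured as in STEP 2, `sysfile_equals("/sys/devices/system/cpu/cpuidle/current_driver", "acpi_idle")` and `sysfile_equals("/proc/sys/kernel/nmi_watchdog", "0")` would both be expected to return true.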

This should solve the NMI problems we've seen so far on these servers.

PS: We are still working together with HP on providing feedback
regarding NMIs and their firmware behavior.


** Tags added: cts

** Changed in: linux (Ubuntu)
       Status: Fix Released => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1416414

Title:
  Trusty + Intel E5-26xx + NMI handler (perf_event_nmi_handler) took too
  long to run

Status in linux package in Ubuntu:
  In Progress

Bug description:
  The following case was brought to my attention:

  Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013 
  Kernel: 3.13.0-34 

  Stack trace:

  [2189823.168958] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 882.406 msecs 
  [2189823.168974] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details. 

  [2189823.184283] CPU: 0 PID: 60396 Comm: ceph-osd Not tainted 3.13.0-34-generic #60-Ubuntu 
  [2189823.194371] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 12/20/2013 
  [2189823.202794] 0007c7a1f01b74a3 ffff88081fa06dd0 ffffffff8171bd94 ffffffffa01672d8 
  [2189823.212421] ffff88081fa06e48 ffffffff81714f95 0000000000000008 ffff88081fa06e58 
  [2189823.221889] ffff88081fa06df8 ffffffff81c1c4c0 ffffc90006278072 0000000000000001 
  [2189823.231361] Call Trace: 
  [2189823.234597] <NMI> [<ffffffff8171bd94>] dump_stack+0x45/0x56 
  [2189823.241996] [<ffffffff81714f95>] panic+0xc8/0x1d7 
  [2189823.248152] [<ffffffffa01668fd>] hpwdt_pretimeout+0xdd/0xdd [hpwdt] 
  [2189823.256251] [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10 
  [2189823.263054] [<ffffffff81725448>] nmi_handle.isra.3+0x88/0x180 
  [2189823.270500] [<ffffffff817256fd>] do_nmi+0x1bd/0x340 
  [2189823.276867] [<ffffffff817248b1>] end_repeat_nmi+0x1e/0x2e 
  [2189823.283888] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140 
  [2189823.291874] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140 
  [2189823.299966] [<ffffffff810d7bf0>] ? futex_wait_queue_me+0x140/0x140

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1416414/+subscriptions
References

[Bug 1416414] [NEW] Trusty + Intel E5-26xx + NMI handler (perf_event_nmi_handler) took too long to run
From: Rafael David Tinoco, 2015-01-30