kernel-packages team mailing list archive
Message #109508
[Bug 1432837] Re: HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT module loaded
This bug was fixed in the package linux - 3.19.0-10.10
---------------
linux (3.19.0-10.10) vivid; urgency=low
[ Andy Whitcroft ]
* [Packaging] control -- make element ordering deterministic
* [Config] allow dracut to support initramfs as well
- LP: #1109029
* [Packaging] generate live watchdog blacklists
- LP: #1432837
[ Leann Ogasawara ]
* [Config] CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
- LP: #1397860
* rebase to v3.19.2
[ Upstream Kernel Changes ]
* thinkpad_acpi: support new BIOS version string pattern
- LP: #1417915
* arm64: Invalidate the TLB corresponding to intermediate page table
levels
- LP: #1432546
* perf tools: Support parsing parameterized events
- LP: #1430341
* perf tools: Extend format_alias() to include event parameters
- LP: #1430341
* perf Documentation: Add event parameters
- LP: #1430341
* perf tools: Document parameterized and symbolic events
- LP: #1430341
* perf: provide sysfs_show for struct perf_pmu_events_attr
- LP: #1430341
* perf: add PMU_EVENT_ATTR_STRING() helper
- LP: #1430341
* perf: define EVENT_DEFINE_RANGE_FORMAT_LITE helper
- LP: #1430341
* powerpc/perf/hv-24x7: parse catalog and populate sysfs with events
- LP: #1430341
* powerpc/perf/{hv-gpci, hv-common}: generate requests with counters
annotated
- LP: #1430341
* powerpc/perf/hv-gpci: add the remaining gpci requests
- LP: #1430341
* powerpc/perf/hv-24x7: Document sysfs event description entries
- LP: #1430341
* powerpc/iommu: Remove IOMMU device references via bus notifier
- LP: #1425202
* powerpc/pseries: Fix endian problems with LE migration
- LP: #1428351
* intel_idle: support additional Broadwell model
- LP: #1400970
* tools/power turbostat: support additional Broadwell model
- LP: #1400970
* KVM: x86: flush TLB when D bit is manually changed.
- LP: #1397860
* Optimize TLB flush in kvm_mmu_slot_remove_write_access.
- LP: #1397860
* KVM: Add generic support for dirty page logging
- LP: #1397860
* KVM: x86: switch to kvm_get_dirty_log_protect
- LP: #1397860
* KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for
log dirty
- LP: #1397860
* KVM: MMU: Add mmu help functions to support PML
- LP: #1397860
* KVM: MMU: Explicitly set D-bit for writable spte.
- LP: #1397860
* KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
- LP: #1397860
* KVM: x86: Add new dirty logging kvm_x86_ops for PML
- LP: #1397860
* KVM: VMX: Add PML support in VMX
- LP: #1397860
* HID: multitouch: add support of clickpads
* HID: multitouch: Add support for button type usage
[ Upstream Kernel Changes ]
* rebase to v3.19.2
- LP: #1428947
-- Andy Whitcroft <apw@xxxxxxxxxxxxx> Mon, 23 Mar 2015 15:28:16 +0000
** Changed in: linux (Ubuntu)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1432837
Title:
HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT
module loaded
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Precise:
Fix Committed
Status in linux source package in Trusty:
Fix Committed
Status in linux source package in Utopic:
Fix Committed
Bug description:
Several cases were brought to my attention in which users were facing
kernel panics while the machine was apparently idle (on some HP ProLiant
servers such as the DL360 and DL380).
ILO:
"76  Critical  System Error  03/12/2015 12:42  03/12/2015 12:07  2
An Unrecoverable System Error (NMI) has occurred (System error code
0x0000002B, 0x00000000)"
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all of the idle-time panics and a variety of kernel
dump files, in which most of the CPUs were either in MWAIT or
"relaxing", we discovered that HPWDT was loaded and that corosync was
opening the /dev/watchdog file. Opening the device started the ILO
watchdog timer, and corosync was not resetting it as frequently as ILO
expected.
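For readers unfamiliar with the mechanism, the sketch below illustrates
the keepalive contract of a Linux watchdog device: once /dev/watchdog is
opened the timer is running, and the opener has to keep writing to the
device before the timeout expires. This is only an illustration, not the
actual code of corosync or hpwdt; the function name keep_alive, its
parameters, and the 10-second interval in the usage comment are
assumptions made for the example.

```python
import time


def keep_alive(dev, interval_s, beats, sleep=time.sleep):
    """Write one keepalive byte to an open watchdog device per interval.

    dev        -- a writable binary file object (hypothetical parameter)
    interval_s -- seconds to wait between keepalive writes
    beats      -- number of keepalive writes before a clean shutdown
    sleep      -- injectable sleep function, so the loop can be tested
    """
    for _ in range(beats):
        dev.write(b"\0")  # a write to the device resets the timer
        dev.flush()
        sleep(interval_s)
    # Writing 'V' just before close requests a "magic close" on drivers
    # that support it; it does not disarm the timer when the driver is
    # running with nowayout set.
    dev.write(b"V")
    dev.flush()


# Usage on real hardware (not executed here; needs root and a loaded
# watchdog driver, and will arm the timer as soon as the file is opened):
#
#     with open("/dev/watchdog", "wb", buffering=0) as dev:
#         keep_alive(dev, interval_s=10, beats=6)
```

If the writes arrive less often than the configured timeout, the timer
expires; on the affected servers that expiry is what shows up as the NMI
in the backtraces above.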
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions.
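To check whether a given system already carries such an entry, one could
scan the modprobe.d text for a blacklist line. The following is a
minimal sketch; the function name is_blacklisted and the sample text are
hypothetical, and it only considers plain "blacklist <module>" lines (it
does not account for "install" overrides or kernel command-line options).

```python
def is_blacklisted(conf_text, module):
    """Return True if conf_text contains a 'blacklist <module>' line."""
    # modprobe treats '-' and '_' in module names as equivalent.
    wanted = module.replace("-", "_")
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            if fields[1].replace("-", "_") == wanted:
                return True
    return False


# Hypothetical file content: the comment quoted above plus the proposed entry.
SAMPLE = """\
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
blacklist hpwdt
"""

print(is_blacklisted(SAMPLE, "hpwdt"))    # True
print(is_blacklisted(SAMPLE, "softdog"))  # False
```

On a real system the text would come from the files under
/etc/modprobe.d/, for example blacklist-watchdog.conf.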
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837/+subscriptions