kernel-packages team mailing list archive
Message #111797
[Bug 1432837] Re: HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT module loaded
This bug was fixed in the package linux - 3.13.0-49.81
---------------
linux (3.13.0-49.81) trusty; urgency=low
[ Kamal Mostafa ]
* Release Tracking Bug
- LP: #1436016
[ Alex Hung ]
* SAUCE: ACPI / blacklist: blacklist Win8 OSI for HP Pavilion dv6
- LP: #1416940
[ Andy Whitcroft ]
* [Packaging] generate live watchdog blacklists
- LP: #1432837
[ Ben Widawsky ]
* SAUCE: i915_bdw: drm/i915/bdw: enable eDRAM.
- LP: #1430855
[ Chris J Arges ]
* [Config] Add ibmvfc to d-i
- LP: #1416001
[ Seth Forshee ]
* [Config] updateconfigs - enable X86_UP_APIC_MSI
[ Upstream Kernel Changes ]
* net: add sysfs helpers for netdev_adjacent logic
- LP: #1410852
* net: Mark functions as static in core/dev.c
- LP: #1410852
* net: rename sysfs symlinks on device name change
- LP: #1410852
* btrfs: fix null pointer dereference in clone_fs_devices when name is
null
- LP: #1429804
* cdc-acm: add sanity checks
- LP: #1413992
* x86: thinkpad_acpi.c: fixed spacing coding style issue
- LP: #1417915
* thinkpad_acpi: support new BIOS version string pattern
- LP: #1417915
* net: sctp: fix slab corruption from use after free on INIT collisions
- LP: #1416506
- CVE-2015-1421
* ipv4: try to cache dst_entries which would cause a redirect
- LP: #1420027
- CVE-2015-1465
* x86, mm/ASLR: Fix stack randomization on 64-bit systems
- LP: #1423757
- CVE-2015-1593
* net: llc: use correct size for sysctl timeout entries
- LP: #1425271
- CVE-2015-2041
* net: rds: use correct size for max unacked packets and bytes
- LP: #1425274
- CVE-2015-2042
* Btrfs: clear compress-force when remounting with compress option
- LP: #1434183
* ext4: merge uninitialized extents
- LP: #1430184
* btrfs: filter invalid arg for btrfs resize
- LP: #1435441
* Bluetooth: Add firmware update for Atheros 0cf3:311f
* Bluetooth: btusb: Add IMC Networks (Broadcom based)
* Bluetooth: sort the list of IDs in the source code
* Bluetooth: append new supported device to the list [0b05:17d0]
* Bluetooth: Add support for Intel bootloader devices
* Bluetooth: Ignore isochronous endpoints for Intel USB bootloader
* Bluetooth: Add support for Acer [13D3:3432]
* Bluetooth: Add support for Broadcom device of Asus Z97-DELUXE
motherboard
* Add a new PID/VID 0227/0930 for AR3012.
* Bluetooth: Add support for Acer [0489:e078]
* Bluetooth: Add USB device 04ca:3010 as Atheros AR3012
* x86: mm: move mmap_sem unlock from mm_fault_error() to caller
* vm: add VM_FAULT_SIGSEGV handling support
* vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than
SIGBUS
* spi/pxa2xx: Clear cur_chip pointer before starting next message
* spi: dw: Fix detecting FIFO depth
* spi: dw-mid: fix FIFO size
* ASoC: wm8960: Fix capture sample rate from 11250 to 11025
* regulator: core: fix race condition in regulator_put()
* ASoC: omap-mcbsp: Correct CBM_CFS dai format configuration
* can: c_can: end pending transmission on network stop (ifdown)
* nfs: fix dio deadlock when O_DIRECT flag is flipped
* NFSv4.1: Fix an Oops in nfs41_walk_client_list
* Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)
* mac80211: properly set CCK flag in radiotap
* nl80211: fix per-station group key get/del and memory leak
* i2c: s3c2410: fix ABBA deadlock by keeping clock prepared
* usb-storage/SCSI: blacklist FUA on JMicron 152d:2566 USB-SATA
controller
* drm/i915: Only fence tiled region of object.
* drm/i915: Fix and clean BDW PCH identification
* drm/i915: BDW Fix Halo PCI IDs marked as ULT.
* ALSA: seq-dummy: remove deadlock-causing events on close
* drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element
* drivers: net: cpsw: discard dual emac default vlan configuration
* can: kvaser_usb: Do not sleep in atomic context
* can: kvaser_usb: Send correct context to URB completion
* can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
* can: kvaser_usb: Fix state handling upon BUS_ERROR events
* quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space
units
* rbd: fix rbd_dev_parent_get() when parent_overlap == 0
* rbd: drop parent_ref in rbd_dev_unprobe() unconditionally
* dm cache: fix missing ERR_PTR returns and handling
* dm thin: don't allow messages to be sent to a pool target in READ_ONLY
or FAIL mode
* net: cls_bpf: fix size mismatch on filter preparation
* net: cls_bpf: fix auto generation of per list handles
* ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos
too
* perf: Tighten (and fix) the grouping condition
* arc: mm: Fix build failure
* MIPS: IRQ: Fix disable_irq on CPU IRQs
* Complete oplock break jobs before closing file handle
* smpboot: Add missing get_online_cpus() in
smpboot_register_percpu_thread()
* ASoC: atmel_ssc_dai: fix start event for I2S mode
* spi: fsl-dspi: Fix memory leak
* spi: spi-fsl-dspi: Remove usage of devm_kzalloc
* ALSA: ak411x: Fix stall in work callback
* lib/checksum.c: fix carry in csum_tcpudp_nofold
* MIPS: Fix kernel lockup or crash after CPU offline/online
* gpio: sysfs: fix memory leak in gpiod_export_link
* gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low
* PCI: Add NEC variants to Stratus ftServer PCIe DMI check
* ASoC: sgtl5000: add delay before first I2C access
* PCI: Handle read-only BARs on AMD CS553x devices
* mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range
* nilfs2: fix deadlock of segment constructor over I_SYNC flag
* tcp: ipv4: initialize unicast_sock sk_pacing_rate
* caif: remove wrong dev_net_set() call
* qlge: Fix qlge_update_hw_vlan_features to handle if interface is down
* ip6_gre: fix endianness errors in ip6gre_err
* spi: dw: revisit FIFO size detection again
* Linux 3.13.11-ckt17
-- Kamal Mostafa <kamal@xxxxxxxxxxxxx> Tue, 24 Mar 2015 11:58:44 -0700
** Changed in: linux (Ubuntu Trusty)
Status: Fix Committed => Fix Released
** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1421
** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1465
** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1593
** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2041
** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2042
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1432837
Title:
HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT
module loaded
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Precise:
Fix Committed
Status in linux source package in Trusty:
Fix Released
Status in linux source package in Utopic:
Fix Committed
Bug description:
Several situations were brought to my attention in which users were
facing kernel panics while the machine was apparently idle (on some HP
Proliant servers such as the DL360 and DL380).
ILO:
"76  Critical  System Error  03/12/2015 12:42  03/12/2015 12:07  2  An
Unrecoverable System Error (NMI) has occurred (System error code
0x0000002B, 0x00000000)"
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
#1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
#2 [ffff88085fc05da0] panic at ffffffff8175b3f2
#3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
#4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
#5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
#6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
#7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
#8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
[exception RIP: native_safe_halt+6]
RIP: ffffffff81055186 RSP: ffffffff81c03e90 RFLAGS: 00000246
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000246
RDX: ffffffff81c03e90 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff81055186 R8: ffffffff81055186 R9: 0000000000000018
R10: ffffffff81c03e90 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
#10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
#11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
#12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
#13 [ffffffff81c03f40] rest_init at ffffffff81751a37
#14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
#15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
#16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733
OR
PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
#1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
#2 [ffff880fffa07d80] panic at ffffffff81730335
#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
#4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
#5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
#6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
#7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
[exception RIP: intel_idle+204]
RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- <NMI exception stack> ---
#8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
#9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
After investigating all of the idle-time panics and a variety of kernel
dump files, in which most of the CPUs were either in MWAIT or
"relaxing", we discovered that HPWDT was loaded and corosync was opening
the /dev/watchdog file. Opening the file started the ILO watchdog timer,
and corosync was not updating it as frequently as ILO expected.
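The failure mode described above can be illustrated with a small
simulation. This is only a sketch, not the hpwdt or corosync code: the
function name, the 30-second timeout and the ping times are all
hypothetical values chosen for illustration. It models a timer that is
armed when the device is opened and must be refreshed before each
deadline.

```python
from typing import Iterable, Optional


def first_expiry(open_time: float,
                 ping_times: Iterable[float],
                 timeout: float,
                 end_time: float) -> Optional[float]:
    """Return the time at which a simulated watchdog fires, or None.

    The timer is armed at open_time. Each ping that arrives before the
    current deadline pushes the deadline to ping + timeout. A ping that
    arrives at or after the deadline is too late: the timer already fired.
    All names and numbers here are illustrative assumptions.
    """
    deadline = open_time + timeout
    for t in sorted(ping_times):
        if t < open_time:
            continue  # pings before the device was opened do not count
        if t >= deadline:
            return deadline  # the watchdog fired before this ping arrived
        deadline = t + timeout
    # No late ping was seen; the timer still fires if the last deadline
    # falls inside the observed window.
    return deadline if deadline <= end_time else None


# Hypothetical 30-second timeout, device opened at t=0, observed until t=100.
on_time = first_expiry(0, [10, 30, 50, 70, 90], timeout=30, end_time=100)
too_slow = first_expiry(0, [10, 20, 65], timeout=30, end_time=100)
never_pinged = first_expiry(0, [], timeout=30, end_time=100)
```

With regular pings the timer never fires inside the window. With a gap
longer than the timeout it fires at the missed deadline, which is the
situation the dumps suggest: the system was idle, not hung.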
As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist module "hpwdt" by default for all Ubuntu versions.
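As a rough way to check whether a given module is already covered, the
following sketch scans modprobe.d-style text for "blacklist" lines. It
is not the packaging change itself. The function name and the sample
text are assumptions for illustration, and only the simple
"blacklist <module>" form is handled.

```python
def blacklisted_modules(conf_text: str) -> set:
    """Collect module names from 'blacklist <module>' lines.

    Lines beginning with '#' are treated as comments. Dashes are folded
    to underscores because modprobe treats the two as equivalent in
    module names.
    """
    modules = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            modules.add(parts[1].replace("-", "_"))
    return modules


# Hypothetical file content: the comment is quoted from the bug report,
# the blacklist line is what the report proposes.
SAMPLE = """\
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
blacklist hpwdt
"""

found = blacklisted_modules(SAMPLE)
```

On a real system the text would come from the files under
/etc/modprobe.d/, for example blacklist-watchdog.conf as mentioned above.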
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837/+subscriptions