
kernel-packages team mailing list archive

[Bug 1432837] Re: HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT module loaded

 

This bug was fixed in the package linux - 3.13.0-49.81

---------------
linux (3.13.0-49.81) trusty; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1436016

  [ Alex Hung ]

  * SAUCE: ACPI / blacklist: blacklist Win8 OSI for HP Pavilion dv6
    - LP: #1416940

  [ Andy Whitcroft ]

  * [Packaging] generate live watchdog blacklists
    - LP: #1432837

  [ Ben Widawsky ]

  * SAUCE: i915_bdw: drm/i915/bdw: enable eDRAM.
    - LP: #1430855

  [ Chris J Arges ]

  * [Config] Add ibmvfc to d-i
    - LP: #1416001

  [ Seth Forshee ]

  * [Config] updateconfigs - enable X86_UP_APIC_MSI

  [ Upstream Kernel Changes ]

  * net: add sysfs helpers for netdev_adjacent logic
    - LP: #1410852
  * net: Mark functions as static in core/dev.c
    - LP: #1410852
  * net: rename sysfs symlinks on device name change
    - LP: #1410852
  * btrfs: fix null pointer dereference in clone_fs_devices when name is
    null
    - LP: #1429804
  * cdc-acm: add sanity checks
    - LP: #1413992
  * x86: thinkpad_acpi.c: fixed spacing coding style issue
    - LP: #1417915
  * thinkpad_acpi: support new BIOS version string pattern
    - LP: #1417915
  * net: sctp: fix slab corruption from use after free on INIT collisions
    - LP: #1416506
    - CVE-2015-1421
  * ipv4: try to cache dst_entries which would cause a redirect
    - LP: #1420027
    - CVE-2015-1465
  * x86, mm/ASLR: Fix stack randomization on 64-bit systems
    - LP: #1423757
    - CVE-2015-1593
  * net: llc: use correct size for sysctl timeout entries
    - LP: #1425271
    - CVE-2015-2041
  * net: rds: use correct size for max unacked packets and bytes
    - LP: #1425274
    - CVE-2015-2042
  * Btrfs: clear compress-force when remounting with compress option
    - LP: #1434183
  * ext4: merge uninitialized extents
    - LP: #1430184
  * btrfs: filter invalid arg for btrfs resize
    - LP: #1435441
  * Bluetooth: Add firmware update for Atheros 0cf3:311f
  * Bluetooth: btusb: Add IMC Networks (Broadcom based)
  * Bluetooth: sort the list of IDs in the source code
  * Bluetooth: append new supported device to the list [0b05:17d0]
  * Bluetooth: Add support for Intel bootloader devices
  * Bluetooth: Ignore isochronous endpoints for Intel USB bootloader
  * Bluetooth: Add support for Acer [13D3:3432]
  * Bluetooth: Add support for Broadcom device of Asus Z97-DELUXE
    motherboard
  * Add a new PID/VID 0227/0930 for AR3012.
  * Bluetooth: Add support for Acer [0489:e078]
  * Bluetooth: Add USB device 04ca:3010 as Atheros AR3012
  * x86: mm: move mmap_sem unlock from mm_fault_error() to caller
  * vm: add VM_FAULT_SIGSEGV handling support
  * vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than
    SIGBUS
  * spi/pxa2xx: Clear cur_chip pointer before starting next message
  * spi: dw: Fix detecting FIFO depth
  * spi: dw-mid: fix FIFO size
  * ASoC: wm8960: Fix capture sample rate from 11250 to 11025
  * regulator: core: fix race condition in regulator_put()
  * ASoC: omap-mcbsp: Correct CBM_CFS dai format configuration
  * can: c_can: end pending transmission on network stop (ifdown)
  * nfs: fix dio deadlock when O_DIRECT flag is flipped
  * NFSv4.1: Fix an Oops in nfs41_walk_client_list
  * Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)
  * mac80211: properly set CCK flag in radiotap
  * nl80211: fix per-station group key get/del and memory leak
  * i2c: s3c2410: fix ABBA deadlock by keeping clock prepared
  * usb-storage/SCSI: blacklist FUA on JMicron 152d:2566 USB-SATA
    controller
  * drm/i915: Only fence tiled region of object.
  * drm/i915: Fix and clean BDW PCH identification
  * drm/i915: BDW Fix Halo PCI IDs marked as ULT.
  * ALSA: seq-dummy: remove deadlock-causing events on close
  * drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element
  * drivers: net: cpsw: discard dual emac default vlan configuration
  * can: kvaser_usb: Do not sleep in atomic context
  * can: kvaser_usb: Send correct context to URB completion
  * can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
  * can: kvaser_usb: Fix state handling upon BUS_ERROR events
  * quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space
    units
  * rbd: fix rbd_dev_parent_get() when parent_overlap == 0
  * rbd: drop parent_ref in rbd_dev_unprobe() unconditionally
  * dm cache: fix missing ERR_PTR returns and handling
  * dm thin: don't allow messages to be sent to a pool target in READ_ONLY
    or FAIL mode
  * net: cls_bpf: fix size mismatch on filter preparation
  * net: cls_bpf: fix auto generation of per list handles
  * ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos
    too
  * perf: Tighten (and fix) the grouping condition
  * arc: mm: Fix build failure
  * MIPS: IRQ: Fix disable_irq on CPU IRQs
  * Complete oplock break jobs before closing file handle
  * smpboot: Add missing get_online_cpus() in
    smpboot_register_percpu_thread()
  * ASoC: atmel_ssc_dai: fix start event for I2S mode
  * spi: fsl-dspi: Fix memory leak
  * spi: spi-fsl-dspi: Remove usage of devm_kzalloc
  * ALSA: ak411x: Fix stall in work callback
  * lib/checksum.c: fix carry in csum_tcpudp_nofold
  * MIPS: Fix kernel lockup or crash after CPU offline/online
  * gpio: sysfs: fix memory leak in gpiod_export_link
  * gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low
  * PCI: Add NEC variants to Stratus ftServer PCIe DMI check
  * ASoC: sgtl5000: add delay before first I2C access
  * PCI: Handle read-only BARs on AMD CS553x devices
  * mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range
  * nilfs2: fix deadlock of segment constructor over I_SYNC flag
  * tcp: ipv4: initialize unicast_sock sk_pacing_rate
  * caif: remove wrong dev_net_set() call
  * qlge: Fix qlge_update_hw_vlan_features to handle if interface is down
  * ip6_gre: fix endianness errors in ip6gre_err
  * spi: dw: revisit FIFO size detection again
  * Linux 3.13.11-ckt17
 -- Kamal Mostafa <kamal@xxxxxxxxxxxxx>   Tue, 24 Mar 2015 11:58:44 -0700

** Changed in: linux (Ubuntu Trusty)
       Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1421

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1465

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1593

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2041

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2042

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1432837

Title:
  HP Proliant Servers - Kernel Panic - NMI - DL360 & DL380 - HPWDT
  module loaded

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  Fix Committed
Status in linux source package in Trusty:
  Fix Released
Status in linux source package in Utopic:
  Fix Committed

Bug description:
  Several cases were reported to me in which users were facing kernel
  panics while the machine was apparently idle, on HP ProLiant servers
  such as the DL360 and DL380.

  ILO:

  "76  Critical  System Error  03/12/2015 12:42  03/12/2015 12:07  2
  An Unrecoverable System Error (NMI) has occurred (System error code
  0x0000002B, 0x00000000)"

  Examples:

  PID: 0      TASK: ffffffff81c1a480  CPU: 0   COMMAND: "swapper/0"
   #0 [ffff88085fc05c88] machine_kexec at ffffffff8104eac2
   #1 [ffff88085fc05cd8] crash_kexec at ffffffff810f26a3
   #2 [ffff88085fc05da0] panic at ffffffff8175b3f2
   #3 [ffff88085fc05e20] sched_clock at ffffffff8101c3b9
   #4 [ffff88085fc05e30] nmi_handle at ffffffff810170e8
   #5 [ffff88085fc05e90] io_check_error at ffffffff8101758e
   #6 [ffff88085fc05eb0] default_do_nmi at ffffffff810176a9
   #7 [ffff88085fc05ed8] do_nmi at ffffffff810177d8
   #8 [ffff88085fc05ef0] end_repeat_nmi at ffffffff8176da21
      [exception RIP: native_safe_halt+6]
      RIP: ffffffff81055186  RSP: ffffffff81c03e90  RFLAGS: 00000246
      RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246
      RDX: ffffffff81c03e90  RSI: 0000000000000018  RDI: 0000000000000001
      RBP: ffffffff81055186   R8: ffffffff81055186   R9: 0000000000000018
      R10: ffffffff81c03e90  R11: 0000000000000246  R12: ffffffffffffffff
      R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: 0000000000000000  CS: 0010  SS: 0018
  --- <DOUBLEFAULT exception stack> ---
   #9 [ffffffff81c03e90] native_safe_halt at ffffffff81055186
  #10 [ffffffff81c03e98] default_idle at ffffffff8101d37f
  #11 [ffffffff81c03eb8] arch_cpu_idle at ffffffff8101dcaf
  #12 [ffffffff81c03ec8] cpu_startup_entry at ffffffff810b5325
  #13 [ffffffff81c03f40] rest_init at ffffffff81751a37
  #14 [ffffffff81c03f50] start_kernel at ffffffff81d320b7
  #15 [ffffffff81c03f90] x86_64_start_reservations at ffffffff81d315ee
  #16 [ffffffff81c03fa0] x86_64_start_kernel at ffffffff81d31733

  OR

  PID: 0 TASK: ffffffff81c14440 CPU: 0 COMMAND: "swapper/0"
  #0 [ffff880fffa07c40] machine_kexec at ffffffff8104b391
  #1 [ffff880fffa07cb0] crash_kexec at ffffffff810d5fb8
  #2 [ffff880fffa07d80] panic at ffffffff81730335
  #3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]
  #4 [ffff880fffa07e20] nmi_handle at ffffffff8174a76a
  #5 [ffff880fffa07ea0] default_do_nmi at ffffffff8174aacd
  #6 [ffff880fffa07ed0] do_nmi at ffffffff8174abe0
  #7 [ffff880fffa07ef0] end_repeat_nmi at ffffffff81749c81
  [exception RIP: intel_idle+204]
  RIP: ffffffff813f07ec RSP: ffffffff81c01d88 RFLAGS: 00000046
  RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
  RDX: ffffffff81c01d88 RSI: 0000000000000018 RDI: 0000000000000001
  RBP: ffffffff813f07ec R8: ffffffff813f07ec R9: 0000000000000018
  R10: ffffffff81c01d88 R11: 0000000000000046 R12: ffffffffffffffff
  R13: 0000000001c0d000 R14: ffffffff81c01fd8 R15: 0000000000000000
  ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
  --- <NMI exception stack> ---
  #8 [ffffffff81c01d88] intel_idle at ffffffff813f07ec
  #9 [ffffffff81c01dc0] cpuidle_enter_state at ffffffff815e76cf
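The two backtraces differ in what sits near `panic` on the NMI path: the first shows `io_check_error`, the second `hpwdt_pretimeout` from the `[hpwdt]` module. A minimal Python sketch for sorting crash(8) `bt` output into those two cases, assuming the frame format shown above; the function name and the category labels are our own, not crash or kernel terminology.

```python
import re

# Matches frame lines such as
#   "#3 [ffff880fffa07e00] hpwdt_pretimeout at ffffffffa02378b5 [hpwdt]"
# [ \t] is used instead of \s so a match never runs onto the next line.
FRAME_RE = re.compile(
    r"#[ \t]*(\d+)[ \t]+\[[0-9a-f]+\][ \t]+(\S+)[ \t]+at[ \t]+[0-9a-f]+"
    r"(?:[ \t]+\[(\S+)\])?"
)


def classify_nmi_panic(backtrace):
    """Roughly label a crash backtrace by the NMI path that led to panic()."""
    matches = list(FRAME_RE.finditer(backtrace))
    funcs = {m.group(2) for m in matches}
    modules = {m.group(3) for m in matches if m.group(3)}
    if "panic" not in funcs or "do_nmi" not in funcs:
        return "not an NMI panic"
    if "hpwdt" in modules or "hpwdt_pretimeout" in funcs:
        return "hpwdt pretimeout NMI"
    if "io_check_error" in funcs:
        return "I/O check NMI (io_check_error)"
    return "other NMI"
```

This is only a triage heuristic over frame names; it does not interpret the stale frames (such as `sched_clock` in the first trace) that crash may print.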

  After investigating all of the idle situations and several kernel dump
  files, in which most CPUs were either in MWAIT or "relaxing", we
  discovered that the hpwdt module was loaded and that corosync was
  opening /dev/watchdog. This armed the ILO watchdog timer, which was
  then not updated as frequently as ILO expected.

  As described in /etc/modprobe.d/blacklist-watchdog.conf:

  """
  # Watchdog drivers should not be loaded automatically, but only if a
  # watchdog daemon is installed.
  """

  We should blacklist module "hpwdt" by default for all Ubuntu versions.
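Blacklisting in modprobe.d means one `blacklist <module>` line per driver. A minimal Python sketch of generating such a fragment from kernel module file names, assuming the watchdog modules are identified by their `.ko` paths; the function names and example paths are hypothetical, and this is not the actual Ubuntu packaging change listed in the changelog.

```python
import os


def module_names(ko_paths):
    """Derive module names from .ko file paths (e.g. '.../hpwdt.ko' -> 'hpwdt')."""
    names = set()
    for path in ko_paths:
        base = os.path.basename(path)
        if base.endswith(".ko"):
            names.add(base[:-3])
    return sorted(names)


def blacklist_conf(modules):
    """Render a modprobe.d fragment that blacklists the given modules."""
    lines = [
        "# Watchdog drivers should not be loaded automatically, but only if a",
        "# watchdog daemon is installed.",
    ]
    lines += ["blacklist " + name for name in sorted(set(modules))]
    return "\n".join(lines) + "\n"
```

A `blacklist` line only stops automatic loading by alias; an administrator who runs a watchdog daemon can still load hpwdt explicitly.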

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837/+subscriptions

