
kernel-packages team mailing list archive

[Bug 1502982] Re: STCOP810:Firestone: frsfp6 EEH on Bluefin does not recover with Ubuntu

 

This bug was fixed in the package linux - 3.19.0-32.37

---------------
linux (3.19.0-32.37) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1508381

  [ Joseph Salisbury ]

  * SAUCE: storvsc: use small sg_tablesize on x86
    - LP: #1495983

  [ Phidias Chiang ]

  * SAUCE: dma: dw_dmac: Workaround for stop probing on HP X360 laptop v2
    - LP: #1501580

  [ Tim Gardner ]

  * [Config] Add MMC modules sufficient for net booting
    - LP: #1502772

  [ Upstream Kernel Changes ]

  * USB: whiteheat: fix potential null-deref at probe
    - LP: #1478826
    - CVE-2015-5257
  * dcache: Handle escaped paths in prepend_path
    - LP: #1441108
    - CVE-2015-2925
  * vfs: Test for and handle paths that are unreachable from their mnt_root
    - LP: #1441108
    - CVE-2015-2925
  * hv_netvsc: Add support to set MTU reservation from guest side
    - LP: #1494431
  * hv_netvsc: Add close of RNDIS filter into change mtu call
    - LP: #1494431
  * powerpc/eeh: Fix missed PE#0 on P7IOC
    - LP: #1502982
  * powerpc/powernv: display reason for Malfunction Alert HMI.
    - LP: #1482343
  * powerpc/powernv: Pull all HMI events before panic.
    - LP: #1482343
  * powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine
    check errors.
    - LP: #1482343
  * powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI.
    - LP: #1482343
  * powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe()
    - LP: #1502982
  * HID: i2c-hid: The interrupt should be level sensitive v2
    - LP: #1501187
  * HID: i2c-hid: Add support for ACPI GPIO interrupts v2
    - LP: #1501187

 -- Luis Henriques <luis.henriques@xxxxxxxxxxxxx>  Wed, 21 Oct 2015 10:30:13 +0100

** Changed in: linux (Ubuntu Vivid)
       Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-2925

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-5257

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1502982

Title:
  STCOP810:Firestone: frsfp6 EEH on Bluefin does not recover with Ubuntu

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Released
Status in linux source package in Wily:
  Fix Released

Bug description:
  Problem:
  ==========
  Test Case Execution Record:
  	
  95613: EEH_Firestone_Ubuntu 14.04.03_Bluefin_Standalone on frsfp6

  Error Injection Method: err_injct_inboundA

  Step 1. Start HTX (I used mdt.hdbuster & only ran htx on bluefin disks)
  Step 2. Inject EEH error

  The Bluefin adapter is in slot P1-C4 (PCI0004)

   echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA; sleep 1; echo 0x0 > /sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA
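
The same injection sequence can be sketched in Python for use in a test
script. This is only an illustration of the command above: the function
name and its parameters are hypothetical, and it assumes debugfs is
mounted at /sys/kernel/debug and that the script runs as root.

```python
import time

# Debugfs node taken from the command in this report (PHB PCI0004).
INJECT_NODE = "/sys/kernel/debug/powerpc/PCI0004/err_injct_inboundA"


def inject_eeh_error(node_path=INJECT_NODE,
                     mask="0x8000000000000000",
                     hold_seconds=1.0):
    """Write the error mask, wait, then write 0x0 to the same node.

    Mirrors: echo <mask> > node; sleep 1; echo 0x0 > node
    (hypothetical helper, not part of any kernel or HTX tooling)
    """
    with open(node_path, "w") as node:
        node.write(mask + "\n")
    time.sleep(hold_seconds)
    with open(node_path, "w") as node:
        node.write("0x0\n")


if __name__ == "__main__":
    inject_eeh_error()
```

Keeping the node path as a parameter lets the helper be pointed at a
different PHB, or at a scratch file when dry-running the script.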

  Expected Result: The adapter and SAN disks recover and HTX keeps
  running

  Actual Result: The adapter did not recover. EEH errors recurred
  continuously until the limit of 6 failures in 1 hour was reached

  Two patches are involved: one for the skiboot firmware, and a Linux
  patch that is already upstream but is missing from the Ubuntu kernel
  (at least in 15.04). The skiboot patch has been merged upstream.

  c7192a4 PHB3: Fix wrong PE number in error injection (skiboot)
  2aa5cf9 powerpc/eeh: Fix missed PE#0 on P7IOC         (linux)

  If I'm correct, this bug needs to be mirrored so that the Linux patch
  (commit 2aa5cf9) can be backported to Ubuntu. With the patch
  backported to Ubuntu 15.04, EEH works fine on a Broadcom adapter (not
  exactly the one on which the bug was initially reported):

  root@fstn2-p1:/# dmesg | grep EEH
  [    0.216919] EEH: PowerNV platform initialized
  [    0.570606] EEH: devices created
  [    1.302482] EEH: PCI Enhanced I/O Error Handling Enabled
  [   90.566761] EEH: PHB location: Slot1
  [   90.567503] EEH: Frozen PHB#4-PE#0 detected
  [   90.567673] EEH: PE location: Slot1, PHB location: Slot1
  [   90.567930] EEH: Detected PCI bus error on PHB#4-PE#0
  [   90.567935] EEH: This PCI device has failed 1 times in the last hour
  [   90.567937] EEH: Notify device drivers to shutdown
  [   90.567985] EEH: Collect temporary log
  [   90.568971] EEH: Reset without hotplug activity
  [   94.585540] EEH: Notify device drivers the completion of reset
  [   94.585934] EEH: Notify device driver to resume
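
A log like the one above can be checked mechanically. The following is
a minimal sketch, not an existing tool: the function name and the
returned keys are hypothetical, the message patterns are taken only
from the dmesg lines quoted in this report, and the default limit of 6
comes from the "Actual Result" text rather than from the kernel source.

```python
import re

# Matches lines such as "[   90.567503] EEH: Frozen PHB#4-PE#0 detected".
EEH_LINE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+EEH:\s+(?P<msg>.*)$")
FAIL_COUNT = re.compile(r"has failed (\d+) times in the last hour")


def summarize_eeh(log_text, max_failures=6):
    """Summarize EEH activity found in dmesg output (hypothetical helper)."""
    frozen_events = 0
    failures = 0
    resumed = False
    for raw in log_text.splitlines():
        match = EEH_LINE.match(raw.strip())
        if not match:
            continue  # not an EEH message
        msg = match.group("msg")
        if msg.startswith("Frozen "):
            frozen_events += 1
        count = FAIL_COUNT.search(msg)
        if count:
            failures = max(failures, int(count.group(1)))
        if "Notify device driver to resume" in msg:
            resumed = True
    return {
        "frozen_events": frozen_events,
        "failures_last_hour": failures,
        "resumed": resumed,
        # Assumption: recovery is abandoned once the count reaches the limit.
        "limit_reached": failures >= max_failures,
    }


if __name__ == "__main__":
    import sys
    print(summarize_eeh(sys.stdin.read()))
```

It could be fed with, for example, `dmesg | python3 summarize_eeh.py`
(the script name is illustrative). A successful recovery shows
"resumed" as true and "limit_reached" as false.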

  ----

  The story behind this bug: without commit 2aa5cf9 ("powerpc/eeh: Fix
  missed PE#0 on P7IOC"), PE#0 is regarded as an invalid PE. When the
  kernel sees the frozen PE#0, it clears the frozen state, dumps the PHB
  diag-data, and then tries to recover the PE. While the PE is being
  reset, the driver, which was not completely stopped by
  error_detected(), accesses the MMIO space and causes another
  (recursive) EEH error. Eventually, EEH recovery fails. During the PE
  reset, the I/O path for the PE should be frozen and MMIO accesses
  during that period should be dropped to avoid recursive EEH errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1502982/+subscriptions