kernel-packages team mailing list archive

[Bug 1538909] Re: OPRASHB:Habanero:EEH: Opal not calling out slot number for failing adapter behind plx switch

 

This bug was fixed in the package linux - 4.4.0-4.19

---------------
linux (4.4.0-4.19) xenial; urgency=low

  * update ZFS and SPL to 0.6.5.4 (LP: #1542296)
    - [Config] update spl/zfs version
    - SAUCE: (noup) Update spl to 0.6.5.4-0ubuntu2, zfs to 0.6.5.4-0ubuntu1
    - [Config] reconstruct -- drop links for zfs userspace components
    - [Config] reconstruct -- drop links for zfs userspace components -- restore spec links

  * recvmsg() fails SCM_CREDENTIALS request with EOPNOTSUPP. (LP: #1540731)
    - Revert "af_unix: Revert 'lock_interruptible' in stream receive code"

  * lxc: ADT exercise test failing with linux-4.4.0-3.17  (LP: #1542049)
    - Revert "UBUNTU: SAUCE: apparmor: fix sleep from invalid context"

  * WARNING: at /build/linux-lts-wily-W0lTWH/linux-lts-wily-4.2.0/net/core/skbuff.c:4174 (Travis IB) (LP: #1541326)
    - SAUCE: IB/IPoIB: Do not set skb truesize since using one linearskb

  * backport Microsoft Precision Touchpad palm rejection patch (LP: #1541671)
    - HID: multitouch: enable palm rejection if device implements confidence usage

  * [Ubuntu 16.04] Update qla2xxx driver for POWER (QLogic) (LP: #1541456)
    - qla2xxx: Remove unavailable firmware files
    - qla2xxx: Enable Extended Logins support
    - qla2xxx: Enable Exchange offload support.
    - qla2xxx: Enable Target counters in DebugFS.
    - qla2xxx: Add FW resource count in DebugFS.
    - qla2xxx: Added interface to send explicit LOGO.
    - qla2xxx: Delete session if initiator is gone from FW
    - qla2xxx: Wait for all conflicts before ack'ing PLOGI
    - qla2xxx: Replace QLA_TGT_STATE_ABORTED with a bit.
    - qla2xxx: Remove dependency on hardware_lock to reduce lock contention.
    - qla2xxx: Add irq affinity notification
    - qla2xxx: Add selective command queuing
    - qla2xxx: Move atioq to a different lock to reduce lock contention
    - qla2xxx: Disable ZIO at start time.
    - qla2xxx: Set all queues to 4k
    - qla2xxx: Check for online flag instead of active reset when transmitting responses
    - scsi: qla2xxxx: avoid type mismatch in comparison

  * [Hyper-V] PCI Passthrough (LP: #1541120)
    - x86/irq: Export functions to allow MSI domains in modules
    - genirq/msi: Export functions to allow MSI domains in modules

  * Update lpfc driver to 11.0.0.10 (LP: #1541592)
    - lpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
    - lpfc: Fix the FLOGI discovery logic to comply with T11 standards
    - lpfc: Fix RegLogin failed error seen on Lancer FC during port bounce
    - lpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
    - lpfc: Fix crash in fcp command completion path.
    - lpfc: Modularize and cleanup FDMI code in driver
    - lpfc: Fix RDP Speed reporting.
    - lpfc: Fix RDP ACC being too long.
    - lpfc: Make write check error processing more resilient
    - lpfc: Use new FDMI speed definitions for 10G, 25G and 40G FCoE.
    - lpfc: Fix mbox reuse in PLOGI completion
    - lpfc: Fix external loopback failure.
    - lpfc: Add logging for misconfigured optics.
    - lpfc: Delete unnecessary checks before the function call "mempool_destroy"
    - lpfc: Use kzalloc instead of kmalloc
    - lpfc: Update version to 11.0.0.10 for upstream patch set

  * Miscellaneous Ubuntu changes
    - [Config] CONFIG_ARM64_VA_BITS=48
    - [Config] Fixed Vcs-Git

  * Miscellaneous upstream changes
    - cxl: Fix possible idr warning when contexts are released
    - cxl: use correct operator when writing pcie config space values
    - cxlflash: drop unlikely before IS_ERR_OR_NULL
    - cxl: Fix DSI misses when the context owning task exits
    - cxlflash: Removed driver date print
    - cxlflash: Fix to resolve cmd leak after host reset
    - cxlflash: Resolve oops in wait_port_offline
    - cxlflash: Enable device id for future IBM CXL adapter
    - cxl: fix build for GCC 4.6.x
    - cxl: use -Werror only with CONFIG_PPC_WERROR
    - cxl: Enable PCI device ID for future IBM CXL adapter

 -- Andy Whitcroft <apw@xxxxxxxxxxxxx>  Fri, 05 Feb 2016 14:58:51 +0000

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1538909

Title:
  OPRASHB:Habanero:EEH: Opal not calling out slot number for failing
  adapter behind plx switch

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  New
Status in linux source package in Wily:
  New
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-14 04:51:41 ==
  ---Problem Description---
  Working with Chad (IO team), we were able to inject an EEH recoverable error into the Broadcom network adapter (PE #2) behind the PLX switch. OPAL appears to call out the Backplane PLX (Planar) instead of the adapter slot.

  With this defect we want to focus primarily on why the adapter slot
  (behind the PLX) was not called out.

  Problem:
  ========
  >> Working with Chad (IO team) we were able to inject an EEH recoverable error
  >> to the broadcom network adapter (PE #2) and noticed that we are not getting
  >> the adapter slot called out, instead we get the location pointing to the
  >> backplane PLX.
  >
  >If what you injected is a PCIe error message, I think those cause the
  >switch leg to freeze, but I will need Gavin to confirm.
  >
  >> Per Chad:
  >> They're logging the right PE (#2--which corresponds to the Broadcom
  >> adapter)--they're just not pointing to its slot explicitly.
  >>
  >>
  >>
  >> Here is a snippet from /var/log/messages:
  >>
  >> Nov 11 12:17:18 habmc8p01 kernel: EEH: Frozen PHB#1-PE#2 detected
  >> Nov 11 12:17:18 habmc8p01 kernel: bnx2x: [bnx2x_timer:5750(net0)]MFW seems hanged: drv_pulse (0x1c1) != mcp_pulse (0x7fff)
  >> Nov 11 12:17:18 habmc8p01 kernel: EEH: PE location: Backplane PLX, PHB location: Backplane PLX
  >>
  >>
  >>
  >>
  >> injection:
  >> setpci -s 0001:0c:00.2 COMMAND
  >> setpci -s 0001:0c:00.2 COMMAND=0540

  It seems you are disabling the memory BAR and then issuing an MMIO load, which
  results in an "unsupported request" response from the adapter. In response,
  PE#2 is put into the frozen state, as shown in the kernel log. Nothing is wrong
  at this point. I think the only question is why the location code does not make sense.
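
To make the "disabling memory BAR" reading concrete, the following is a minimal sketch that decodes the value written by the setpci command quoted above (COMMAND=0540, which setpci interprets as hexadecimal). The bit names follow the standard PCI Command register layout; the helper names decode_command and memory_decoding_enabled are hypothetical and exist only for this illustration.

```python
# Bit positions of the 16-bit PCI Command register (config space offset 0x04).
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    3: "Special Cycle Enable",
    4: "Memory Write and Invalidate",
    5: "VGA Palette Snoop",
    6: "Parity Error Response",
    7: "Reserved (stepping control in conventional PCI)",
    8: "SERR# Enable",
    9: "Fast Back-to-Back Enable",
    10: "Interrupt Disable",
}


def decode_command(value):
    """Return the names of the bits that are set, lowest bit first."""
    return [name for bit, name in sorted(COMMAND_BITS.items())
            if value & (1 << bit)]


def memory_decoding_enabled(value):
    """True if the device will respond to accesses to its memory BARs."""
    return bool(value & (1 << 1))


written = 0x0540  # value from the injection command in this report
print(decode_command(written))
print(memory_decoding_enabled(written))
```

With 0x0540 the Memory Space Enable bit is clear, so a subsequent MMIO load to the adapter's BAR is not claimed by the device, which matches the "unsupported request" explanation given above.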
   
  Contact Information = ----- 
   
  ---uname output---
  3.19.0-43-generic
   
  Machine Type = ---- 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   This bug is a follow-up to bug 133061. It was opened to backport to Ubuntu the kernel patch that fixes the issue reported in bug 133061.
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  *Additional Instructions for -----: 
  -Post a private note with access information to the machine that the bug is occurring on.
  -Attach sysctl -a output to the bug.

  Firestone server ( Ubuntu Host )
  =================================

  I have built the Ubuntu kernel with the patch and created *.deb files.

  Please find them in the following path for installing and testing
  your test case

  ==== State: Assigned by: thalerj on 13 January 2016 12:10:58 ====

  The patched kernel is working great for both Firestone and Habanero.
  It resolves the issue and all slot numbers are called out properly.
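
When retesting, one rough way to spot the symptom from this report in a log is to pair each "EEH: Frozen" line with the following "EEH: PE location" line and flag cases where the PE location is identical to the PHB location. This is only a sketch: the regular expressions are derived from the three log lines quoted in this report, the function name find_ambiguous_callouts is hypothetical, and treating "PE location equals PHB location" as suspicious is an assumption (a device attached directly to the PHB may legitimately report the same location).

```python
import re

# Patterns modeled on the log lines quoted in this bug report.
FROZEN_RE = re.compile(r"EEH: Frozen PHB#([0-9a-fA-F]+)-PE#([0-9a-fA-F]+) detected")
LOCATION_RE = re.compile(r"EEH: PE location: (.+?), PHB location: (.+)$")


def find_ambiguous_callouts(lines):
    """Return (phb, pe, location) for frozen PEs whose PE location
    string is the same as the PHB location string."""
    results = []
    current = None  # (phb, pe) of the most recent frozen PE, if any
    for line in lines:
        frozen = FROZEN_RE.search(line)
        if frozen:
            current = (frozen.group(1), frozen.group(2))
            continue
        location = LOCATION_RE.search(line)
        if location and current is not None:
            pe_loc = location.group(1).strip()
            phb_loc = location.group(2).strip()
            if pe_loc == phb_loc:
                results.append((current[0], current[1], pe_loc))
            current = None
    return results


sample = [
    "Nov 11 12:17:18 habmc8p01 kernel: EEH: Frozen PHB#1-PE#2 detected",
    "Nov 11 12:17:18 habmc8p01 kernel: bnx2x: [bnx2x_timer:5750(net0)]MFW seems "
    "hanged: drv_pulse (0x1c1) != mcp_pulse (0x7fff)",
    "Nov 11 12:17:18 habmc8p01 kernel: EEH: PE location: Backplane PLX, "
    "PHB location: Backplane PLX",
]
print(find_ambiguous_callouts(sample))
```

Run against the snippet quoted earlier, this flags PHB 1, PE 2 with location "Backplane PLX"; on a kernel where the slot is called out, the PE location would differ from the PHB location and nothing would be flagged.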

  == Comment: #1 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28 01:02:23 ==
  Patch is now available in the following branch

  https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7e56f627768da4e6480986b5145dc3422bc448a5

  == Comment: #3 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28 01:09:14 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1538909/+subscriptions