kernel-packages team mailing list archive

[Bug 1538909] Re: OPRASHB:Habanero:EEH: Opal not calling out slot number for failing adapter behind plx switch

 

** Changed in: linux (Ubuntu)
     Assignee: Taco Screen team (taco-screen-team) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1538909

Title:
  OPRASHB:Habanero:EEH: Opal not calling out slot number for failing
  adapter behind plx switch

Status in linux package in Ubuntu:
  Triaged

Bug description:
  == Comment: #0 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-14 04:51:41 ==
  ---Problem Description---
  Working with Chad (IO team), we were able to inject an EEH recoverable error into the Broadcom network adapter (PE #2) behind the PLX switch. It looks like OPAL calls out the Backplane PLX (Planar) instead of the adapter slot.

  With this defect, we want to focus primarily on why the adapter slot
  (behind the PLX) didn't get called out.

  Problem:
  ========
  >> Working with Chad (IO team) we were able to inject an EEH recoverable error
  >> to the broadcom network adapter (PE #2) and noticed that we are not getting
  >> the adapter slot called out, instead we get the location pointing to the
  >> backplane PLX.
  >
  >If what you injected is a PCIe error message, I think those cause the
  >switch leg to freeze, but I will need Gavin to confirm.
  >
  >> Per Chad:
  >>   They're logging the right PE (#2--which corresponds to the Broadcom
  >>   adapter)--they're just not pointing to its slot explicitly.
  >>
  >>
  >>
  >> Here is a snippet the /var/log/messages:
  >>
  >>   Nov 11 12:17:18 habmc8p01 kernel: EEH: Frozen PHB#1-PE#2 detected
  >>   Nov 11 12:17:18 habmc8p01 kernel: bnx2x: [bnx2x_timer:5750(net0)]MFW seems hanged: drv_pulse (0x1c1) != mcp_pulse (0x7fff)
  >>   Nov 11 12:17:18 habmc8p01 kernel: EEH: PE location: Backplane PLX, PHB location: Backplane PLX
  >>
  >>
  >>
  >>
  >> injection:
  >> setpci -s 0001:0c:00.2 COMMAND
  >>   setpci -s 0001:0c:00.2 COMMAND=0540

  It seems you're disabling the memory BAR and then issuing an MMIO load, which
  results in an "unsupported request" returned from the adapter. In response,
  PE#2, as shown in the kernel log, is put into the frozen state. Nothing is wrong
  at this point. I think the only question is why the location code doesn't make sense.
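
  To see why COMMAND=0540 turns off memory decoding, the value can be split
  into individual command-register bits. The following is a minimal Python
  sketch, assuming the standard PCI command register bit layout; the
  COMMAND_BITS table and the decode_command helper are illustrative names
  and are not part of setpci or of any tool mentioned in this bug.

```python
# Illustrative helper (not part of setpci): decode a PCI COMMAND register value.
# Bit positions are assumed to follow the standard PCI command register layout.
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}


def decode_command(value):
    """Return (set_bits, clear_bits) as lists of bit names for a COMMAND value."""
    set_bits = [name for bit, name in COMMAND_BITS.items() if value & (1 << bit)]
    clear_bits = [name for bit, name in COMMAND_BITS.items() if not value & (1 << bit)]
    return set_bits, clear_bits


# The value written by the injection command in this report.
set_bits, clear_bits = decode_command(0x0540)
print("set:  ", set_bits)
print("clear:", clear_bits)
```

  With 0x0540, bit 1 (Memory Space Enable) is clear, so the adapter no longer
  responds to memory accesses, which is consistent with the explanation above.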
   
  Contact Information = ----- 
   
  ---uname output---
  3.19.0-43-generic
   
  Machine Type = ---- 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   This bug is a follow-up to bug 133061. It was opened to backport to Ubuntu the kernel patch that fixes the issue reported in bug 133061.
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  *Additional Instructions for -----: 
  -Post a private note with access information to the machine that the bug is occurring on. 
  -Attach sysctl -a output to the bug.

  Firestone server ( Ubuntu Host )
  =================================

  I have built the Ubuntu kernel with the patch and created *.deb files.

  Please find them at the following path for installing and testing
  your test case

  ==== State: Assigned by: thalerj on 13 January 2016 12:10:58 ====

  The patched kernel is working great on both Firestone and Habanero.
  It resolves the issue and all slot numbers are called out properly.
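
  When retesting, the symptom can be checked in a saved kernel log. The
  following is a rough Python sketch, assuming the two EEH message formats
  quoted earlier in this report; the regular expressions, the function name
  find_eeh_locations, and the "PE location equals PHB location" heuristic
  are assumptions for illustration. The exact location text printed by a
  patched kernel is not shown in this bug.

```python
import re

# Patterns assumed from the log lines quoted in this report.
FROZEN_RE = re.compile(r"EEH: Frozen PHB#(\d+)-PE#(\d+) detected")
LOCATION_RE = re.compile(r"EEH: PE location: (.+?), PHB location: (.+)$")


def find_eeh_locations(lines):
    """Pair each 'Frozen PHB#x-PE#y' line with the next location line.

    Returns a list of dicts. 'same' is True when the PE location equals the
    PHB location, the symptom described in this bug (heuristic only).
    """
    results = []
    current = None
    for line in lines:
        frozen = FROZEN_RE.search(line)
        if frozen:
            current = (int(frozen.group(1)), int(frozen.group(2)))
            continue
        loc = LOCATION_RE.search(line)
        if loc and current is not None:
            pe_loc = loc.group(1).strip()
            phb_loc = loc.group(2).strip()
            results.append({
                "phb": current[0],
                "pe": current[1],
                "pe_location": pe_loc,
                "phb_location": phb_loc,
                "same": pe_loc == phb_loc,
            })
            current = None
    return results


# Log lines taken from the bug description.
SAMPLE = [
    "Nov 11 12:17:18 habmc8p01 kernel: EEH: Frozen PHB#1-PE#2 detected",
    "Nov 11 12:17:18 habmc8p01 kernel: bnx2x: [bnx2x_timer:5750(net0)]MFW seems "
    "hanged: drv_pulse (0x1c1) != mcp_pulse (0x7fff)",
    "Nov 11 12:17:18 habmc8p01 kernel: EEH: PE location: Backplane PLX, "
    "PHB location: Backplane PLX",
]

for entry in find_eeh_locations(SAMPLE):
    print(entry)
```

  For the unpatched log above, the single entry has the PE and PHB locations
  both set to "Backplane PLX", so 'same' is True.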

  == Comment: #1 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28 01:02:23 ==
  Patch is now available in the following branch

  https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7e56f627768da4e6480986b5145dc3422bc448a5

  == Comment: #3 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28
  01:09:14 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1538909/+subscriptions