kernel-packages team mailing list archive
Message #158194
[Bug 1538909] [NEW] OPRASHB:Habanero:EEH: Opal not calling out slot number for failing adapter behind plx switch
You have been subscribed to a public bug:
== Comment: #0 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-14 04:51:41 ==
---Problem Description---
Working with Chad (IO team), we were able to inject an EEH recoverable error into the Broadcom network adapter (PE #2) behind the PLX switch. It looks like OPAL calls out the Backplane PLX (Planar) instead of the adapter slot.
This defect focuses primarily on why the adapter slot (behind the PLX) was not
called out.
Problem:
========
>> Working with Chad (IO team) we were able to inject an EEH recoverable error
>> to the broadcom network adapter (PE #2) and noticed that we are not getting
>> the adapter slot called out, instead we get the location pointing to the
>> backplane PLX.
>
>If what you injected is a PCIe error message, I think those cause the
>switch leg to freeze, but I will need Gavin to confirm.
>
>> Per Chad:
>> They're logging the right PE (#2--which corresponds to the Broadcom
>> adapter)--they're just not pointing to its slot explicitly.
>>
>>
>>
>> Here is a snippet from /var/log/messages:
>>
>> Nov 11 12:17:18 habmc8p01 kernel: EEH: Frozen PHB#1-PE#2 detected
>> Nov 11 12:17:18 habmc8p01 kernel: bnx2x: [bnx2x_timer:5750(net0)]MFW seems hanged: drv_pulse (0x1c1) != mcp_pulse (0x7fff)
>> Nov 11 12:17:18 habmc8p01 kernel: EEH: PE location: Backplane PLX, PHB location: Backplane PLX
>>
>>
>>
>>
>> injection:
>> setpci -s 0001:0c:00.2 COMMAND
>> setpci -s 0001:0c:00.2 COMMAND=0540
It seems you're disabling the memory BAR and then issuing an MMIO load, which
results in an "unsupported request" being returned from the adapter. In response,
PE#2, as shown in the kernel log, is put into the frozen state. Nothing is wrong
at this point. I think the only question is why the location code doesn't make sense.
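To illustrate why the injected value disables memory decoding, the following is a minimal sketch that decodes the COMMAND value written by setpci above (setpci takes values in hex, so 0540 is 0x0540). The bit names follow the standard PCI Command register layout; the helper name decode_command is hypothetical and is not part of pciutils or the kernel.

```python
# Bit masks for the standard PCI Command register (config space offset 0x04).
# decode_command is a hypothetical helper written for this illustration.
COMMAND_BITS = {
    0x0001: "I/O Space Enable",
    0x0002: "Memory Space Enable",
    0x0004: "Bus Master Enable",
    0x0008: "Special Cycle Enable",
    0x0010: "Memory Write and Invalidate",
    0x0020: "VGA Palette Snoop",
    0x0040: "Parity Error Response",
    0x0100: "SERR# Enable",
    0x0200: "Fast Back-to-Back Enable",
    0x0400: "Interrupt Disable",
}


def decode_command(value):
    """Return the names of the Command register bits set in value."""
    return [name for mask, name in sorted(COMMAND_BITS.items()) if value & mask]


if __name__ == "__main__":
    injected = 0x0540  # value from: setpci -s 0001:0c:00.2 COMMAND=0540
    for name in decode_command(injected):
        print("set:", name)
    # Memory Space Enable (0x0002) is clear, so the adapter stops decoding
    # its memory BARs and a subsequent MMIO load cannot be claimed.
    print("memory decoding enabled:", bool(injected & 0x0002))
```

With Memory Space Enable cleared, the MMIO load is answered with an unsupported request, which matches the frozen PE#2 reported in the kernel log.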
Contact Information = -----
---uname output---
3.19.0-43-generic
Machine Type = ----
---Debugger---
A debugger is not configured
---Steps to Reproduce---
This bug is a follow-up to bug 133061. It was opened to backport to Ubuntu the kernel patch that fixes the issue reported in bug 133061.
Stack trace output:
no
Oops output:
no
System Dump Info:
The system is not configured to capture a system dump.
*Additional Instructions for -----:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.
Firestone server ( Ubuntu Host )
=================================
I have built the Ubuntu kernel with the patch and created *.deb files.
Please find them in the following path for installing and running
your test case
==== State: Assigned by: thalerj on 13 January 2016 12:10:58 ====
The patched kernel is working great for both Firestone and Habanero. It
resolves the issue and all slot numbers are called out properly.
== Comment: #1 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28 01:02:23 ==
Patch is now available in the following branch
https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=fixes&id=7e56f627768da4e6480986b5145dc3422bc448a5
== Comment: #3 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2016-01-28 01:09:14 ==
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Taco Screen team (taco-screen-team)
Status: New
** Tags: architecture-ppc64 bot-comment bugnameltc-135219 severity-high targetmilestone-inin---
--
OPRASHB:Habanero:EEH: Opal not calling out slot number for failing adapter behind plx switch
https://bugs.launchpad.net/bugs/1538909
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.