kernel-packages team mailing list archive
Message #152782
[Bug 1522071] Re: EEH recovery fails for shinner T on firestone
------- Comment From muvic@xxxxxxxxxx 2015-12-21 17:07 EDT-------
Externalizing comment and closing the issue on our side, since the fix is released and tested. Thank you for the help.
(In reply to comment #29)
> Thanks, Guilherme, for the system and for getting the system firmware
> upgraded to the latest SP2.
>
> The issue seems to be fixed with Ubuntu 14.04.4.
>
> root@ltc-fire2:~# uname -a
> Linux ltc-fire2 4.2.0-22-generic #27~14.04.1-Ubuntu SMP Fri Dec 18 10:56:52
> UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
>
>
> root@ltc-fire2:~# cat /etc/network/interfaces
> # This file describes the network interfaces available on your system
> # and how to activate them. For more information, see interfaces(5).
>
> # The loopback network interface
> auto lo
> iface lo inet loopback
>
> # The primary network interface
> auto eth4
> iface eth4 inet static
> address 9.40.195.137
> netmask 255.255.255.0
> network 9.40.195.0
> broadcast 9.40.195.255
> gateway 9.40.195.1
> # dns-* options are implemented by the resolvconf package, if installed
> dns-nameservers 9.3.1.200
> dns-search aus.stglabs.ibm.com
>
>
>
> root@ltc-fire2:~# ll /sys/class/net
> total 0
> drwxr-xr-x 2 root root 0 Dec 21 09:34 ./
> drwxr-xr-x 58 root root 0 Dec 21 08:59 ../
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth0 ->
> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/net/eth0/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth1 ->
> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.1/net/eth1/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth2 ->
> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.2/net/eth2/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth3 ->
> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.3/net/eth3/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth4 ->
> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.0/net/eth4/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 eth5 ->
> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.1/net/eth5/
> lrwxrwxrwx 1 root root 0 Dec 21 08:59 lo -> ../../devices/virtual/net/lo/
> root@ltc-fire2:~# route -n
> Kernel IP routing table
> Destination Gateway Genmask Flags Metric Ref Use Iface
> 0.0.0.0 9.40.195.1 0.0.0.0 UG 0 0 0 eth4
> 9.40.195.0 0.0.0.0 255.255.255.0 U 0 0 0 eth4
>
>
>
> root@ltc-fire2:~# lspci -nn
> ...
> 0001:00:00.0 PCI bridge [0604]: IBM Device [1014:03dc]
> 0001:01:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
> BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
> 0001:01:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
> BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
>
>
> root@ltc-fire2:~# echo 1:1:0:0:0 >
> /sys/kernel/debug/powerpc/PCI0001/err_injct
> root@ltc-fire2:~# [ 2444.814857] EEH: Frozen PE#1 on PHB#1 detected
> [ 2444.815000] EEH: PE location: N/A, PHB location: N/A
> [ 2444.816227] bnx2x: [bnx2x_io_error_detected:13743(eth4)]IO error detected
> [ 2444.831916] bnx2x: [bnx2x_timer:5761(eth4)]MFW seems hanged: drv_pulse
> (0x726) != mcp_pulse (0x7fff)
> [ 2444.832477] bnx2x: [bnx2x_io_error_detected:13743(eth5)]IO error detected
> [ 2448.850852] bnx2x: [bnx2x_io_slot_reset:13778(eth4)]IO slot reset
> initializing...
> [ 2448.851245] bnx2x: [bnx2x_io_slot_reset:13794(eth4)]IO slot reset -->
> driver unload
> [ 2448.932353] bnx2x: [bnx2x_io_slot_reset:13778(eth5)]IO slot reset
> initializing...
>
>
>
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.814857] EEH: Frozen PE#1 on PHB#1 detected
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.815000] EEH: PE location: N/A, PHB location: N/A
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.816220] EEH: This PCI device has failed 1 times in the last hour
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.816222] EEH: Notify device drivers to shutdown
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.816227] bnx2x: [bnx2x_io_error_detected:13743(eth4)]IO error detected
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.831916] bnx2x: [bnx2x_timer:5761(eth4)]MFW seems hanged: drv_pulse (0x726) != mcp_pulse (0x7fff)
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832477] bnx2x: [bnx2x_io_error_detected:13743(eth5)]IO error detected
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832702] EEH: Collect temporary log
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832705] PHB3 PHB#1 Diag-data (Version: 1)
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832707] brdgCtl: 00000002
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832709] RootSts: 00000040 00400000 f0820048 00100147 00002000
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832712] PhbSts: 0000001c00000000 0000001c00000000
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832714] Lem: 0000001000000004 42498e327f502eae 0000000000000000
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832717] OutErr: 0000000800000000 0000000800000000 0204006000003b10 103c731800000000
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832719] InBErr: 0000000000000020 0000000000000020 4001010000000000 0000000000000000
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832721] PE[ 1] A/B: 8400001b00000000 80003b10103c7318
> Dec 21 09:39:47 ltc-fire2 kernel: [ 2444.832724] EEH: Reset without hotplug activity
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.850848] EEH: Notify device drivers the completion of reset
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.850852] bnx2x: [bnx2x_io_slot_reset:13778(eth4)]IO slot reset initializing...
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.851097] bnx2x 0001:01:00.0: enabling device (0140 -> 0142)
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.851245] bnx2x: [bnx2x_io_slot_reset:13794(eth4)]IO slot reset --> driver unload
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.932353] bnx2x: [bnx2x_io_slot_reset:13778(eth5)]IO slot reset initializing...
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.932609] bnx2x 0001:01:00.1: enabling device (0140 -> 0142)
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2448.932747] EEH: Notify device driver to resume
> Dec 21 09:39:51 ltc-fire2 kernel: [ 2449.435871] bnx2x 0001:01:00.0 eth4: using MSI-X IRQs: sp 498 fp[0] 502 ... fp[7] 495
> Dec 21 09:39:54 ltc-fire2 kernel: [ 2452.598799] bnx2x 0001:01:00.0 eth4: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive & transmit
>
>
> After the EEH, the interface is accessible, for up to 5 EEH errors as per design.
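For anyone repeating this verification, the recovery sequence in the quoted syslog can be checked mechanically. The following is only a minimal sketch: the function name `summarize_eeh`, the choice of marker strings, and the `MAX_EEH_PER_HOUR` constant are assumptions made for illustration. The marker strings are copied from the kernel messages quoted in this report, and the limit of 5 is taken from the comment above rather than confirmed against the kernel source.

```python
import re

# Limit quoted in the comment above ("up to 5 EEH per design").
# Assumption: treated here as a per-hour limit; not checked against the kernel.
MAX_EEH_PER_HOUR = 5

# Matches e.g. "EEH: This PCI device has failed 1 times in the last hour"
FAIL_COUNT = re.compile(
    r"EEH: This PCI device has failed (\d+) times in the last hour"
)


def summarize_eeh(lines):
    """Summarize one EEH episode from an iterable of kernel log lines."""
    summary = {
        "frozen": False,     # a frozen PE/PHB was reported
        "fail_count": 0,     # highest failure count reported by EEH
        "resumed": False,    # EEH told the driver to resume
        "link_up": False,    # the NIC reported link up afterwards
        "oops": False,       # a kernel Oops appeared in the log
    }
    for line in lines:
        if "EEH: Frozen P" in line:
            summary["frozen"] = True
        match = FAIL_COUNT.search(line)
        if match:
            summary["fail_count"] = max(summary["fail_count"], int(match.group(1)))
        if "EEH: Notify device driver to resume" in line:
            summary["resumed"] = True
        if "NIC Link is Up" in line:
            summary["link_up"] = True
        if "Oops:" in line:
            summary["oops"] = True
    summary["recovered"] = (
        summary["frozen"]
        and summary["resumed"]
        and summary["link_up"]
        and not summary["oops"]
    )
    summary["within_limit"] = summary["fail_count"] <= MAX_EEH_PER_HOUR
    return summary


# Sample lines taken from the 4.2.0-22 log quoted above.
sample = [
    "kernel: [ 2444.814857] EEH: Frozen PE#1 on PHB#1 detected",
    "kernel: [ 2444.816220] EEH: This PCI device has failed 1 times in the last hour",
    "kernel: [ 2448.932747] EEH: Notify device driver to resume",
    "kernel: [ 2452.598799] bnx2x 0001:01:00.0 eth4: NIC Link is Up, "
    "1000 Mbps full duplex, Flow control: ON - receive & transmit",
]

if __name__ == "__main__":
    print(summarize_eeh(sample))
```

Feeding it the lines from /var/log/syslog after an injection would show whether the resume and link-up messages were both seen. It does not replace an actual connectivity test on the interface.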
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin14044
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1522071
Title:
EEH recovery fails for shinner T on firestone
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Vivid:
Fix Committed
Status in linux source package in Wily:
Fix Released
Status in linux source package in Xenial:
Fix Released
Bug description:
== Comment: #0 - Manvanthara B. Puttashankar <mputtash@xxxxxxxxxx> - 2015-07-27 02:38:12 ==
---Problem Description---
EEH recovery fails for shinner T on firestone
Contact Information = mputtash@xxxxxxxxxx
---uname output---
Linux rcx2c309 3.19.0-23-generic #24~14.04.1-Ubuntu SMP Wed Jul 8 11:17:19 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = firestone
---Debugger---
A debugger is not configured
---Steps to Reproduce---
root@rcx2c309:~# uname -a
Linux rcx2c309 3.19.0-23-generic #24~14.04.1-Ubuntu SMP Wed Jul 8 11:17:19 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@rcx2c309:~# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 100baseT/Half 100baseT/Full
1000baseT/Full
10000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Advertised link modes: 100baseT/Half 100baseT/Full
1000baseT/Full
10000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Transmit-only
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 17
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000000 (0)
Link detected: yes
root@rcx2c309:/sys/bus/pci/devices/0001:01:00.1# ll /sys/class/net/
total 0
drwxr-xr-x 2 root root 0 Jul 24 04:23 ./
drwxr-xr-x 58 root root 0 Jul 24 03:45 ../
lrwxrwxrwx 1 root root 0 Jul 26 23:17 eth0 -> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.0/net/eth0/
lrwxrwxrwx 1 root root 0 Jul 24 07:33 eth1 -> ../../devices/pci0001:00/0001:00:00.0/0001:01:00.1/net/eth1/ <==================== this interface
lrwxrwxrwx 1 root root 0 Jul 24 03:45 lo -> ../../devices/virtual/net/lo/
lrwxrwxrwx 1 root root 0 Jul 24 03:45 virbr0 -> ../../devices/virtual/net/virbr0/
Every 2.0s: netstat -i
Sun Jul 26 23:26:16 2015
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1 1500 0 1230820 0 12167 0 45239 0 0 0 BMRU
lo 65536 0 22 0 0 0 22 0 0 0 LRU
virbr0 1500 0 0 0 0 0 0 0 0 0 BMU
syslog:
Jul 27 01:09:54 rcx2c309 kernel: [ 68.122649] EEH: Frozen PE#1 on PHB#1 detected
Jul 27 01:09:54 rcx2c309 kernel: [ 68.122790] EEH: PE location: N/A, PHB location: N/A
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123539] EEH: This PCI device has failed 1 times in the last hour
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123540] EEH: Notify device drivers to shutdown
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123545] bnx2x: [bnx2x_io_error_detected:13702(eth0)]IO error detected
Jul 27 01:09:54 rcx2c309 kernel: [ 68.123706] bnx2x: [bnx2x_io_error_detected:13702(eth1)]IO error detected
Jul 27 01:09:54 rcx2c309 kernel: [ 68.154922] bnx2x: [bnx2x_timer:5753(eth1)]MFW seems hanged: drv_pulse (0x75) != mcp_pulse (0x7fff)
Jul 27 01:09:54 rcx2c309 kernel: [ 68.155146] EEH: Collect temporary log
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235532] PHB3 PHB#1 Diag-data (Version: 1)
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235535] brdgCtl: 00000002
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235538] RootSts: 00000040 00400000 f0820048 00100147 00002000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235541] PhbSts: 0000001c00000000 0000001c00000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235543] Lem: 0000001000000004 42498e327f502eae 0000000000000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235546] OutErr: 0000000800000000 0000000800000000 0204006000003b10 113c7cd800000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235549] InBErr: 0000000000000020 0000000000000020 4001010000000000 0000000000000000
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235551] PE[ 1] A/B: 8400001b00000000 80003b10113c7cd8
Jul 27 01:09:54 rcx2c309 kernel: [ 68.235554] EEH: Reset without hotplug activity
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236546] EEH: PHB#1 failure detected, location: N/A
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236698] CPU: 9 PID: 1093 Comm: kworker/9:1 Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236704] Workqueue: events linkwatch_event
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236706] Call Trace:
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236709] [c000003c9923b6c0] [c000000000a26690] dump_stack+0x90/0xbc (unreliable)
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236713] [c000003c9923b6f0] [c000000000036a5c] eeh_dev_check_failure+0x22c/0x560
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236715] [c000003c9923b790] [c000000000036e14] eeh_check_failure+0x84/0xe0
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236737] [c000003c9923b7d0] [d00000001c7854a0] bnx2x_get_ext_phy_fw_version+0x1e0/0x220 [bnx2x]
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236746] [c000003c9923b830] [d00000001c794c34] bnx2x_fill_fw_str+0x64/0x140 [bnx2x]
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236754] [c000003c9923b8e0] [d00000001c79f2ac] bnx2x_get_drvinfo+0x6c/0x100 [bnx2x]
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236761] [c000003c9923b910] [d00000001e34f9b0] netdevice_event+0xc0/0x350 [ib_core]
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236765] [c000003c9923ba90] [c0000000000dbce8] notifier_call_chain+0x98/0x100
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236767] [c000003c9923bae0] [c0000000008b796c] call_netdevice_notifiers_info+0x5c/0xb0
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236770] [c000003c9923bb60] [c0000000008bde48] netdev_state_change+0x48/0x80
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236772] [c000003c9923bba0] [c0000000008db014] linkwatch_do_dev+0x74/0xd0
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236773] [c000003c9923bbd0] [c0000000008db54c] __linkwatch_run_queue+0x14c/0x270
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236775] [c000003c9923bc40] [c0000000008db6b4] linkwatch_event+0x44/0x60
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236778] [c000003c9923bc60] [c0000000000d291c] process_one_work+0x19c/0x480
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236780] [c000003c9923bcf0] [c0000000000d31c0] worker_thread+0x190/0x5b0
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236782] [c000003c9923bd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:54 rcx2c309 kernel: [ 68.236785] [c000003c9923be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038711] pnv_ioda_unfreeze_pe: Failure -6 clear 1 on PHB#1-PE#1
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038713] eeh_pci_enable: Unexpected state change 2 on PHB#1-PE#1, err=-5
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038937] pnv_ioda_unfreeze_pe: Failure -6 clear 2 on PHB#1-PE#1
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038938] eeh_pci_enable: Unexpected state change 3 on PHB#1-PE#1, err=-5
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038940] EEH: Notify device drivers the completion of reset
Jul 27 01:09:56 rcx2c309 kernel: [ 70.038943] bnx2x: [bnx2x_io_slot_reset:13737(eth0)]IO slot reset initializing...
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039706] EEH: Frozen PHB#1-PE#1 detected
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039733] EEH: PE location: N/A, PHB location: N/A
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039767] CPU: 9 PID: 812 Comm: eehd Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039768] Call Trace:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039770] [c000003ca1e6f840] [c000000000a26690] dump_stack+0x90/0xbc (unreliable)
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039772] [c000003ca1e6f870] [c000000000036d74] eeh_dev_check_failure+0x544/0x560
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039775] [c000003ca1e6f910] [c000000000076c9c] pnv_pci_read_config+0x13c/0x1a0
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039778] [c000003ca1e6f960] [c000000000561204] pci_bus_read_config_word+0xc4/0x110
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039781] [c000003ca1e6f9c0] [c00000000056f574] pci_enable_device_flags+0x174/0x1a0
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039790] [c000003ca1e6fa10] [d00000001c761dc4] bnx2x_io_slot_reset+0x94/0x570 [bnx2x]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039792] [c000003ca1e6fad0] [c00000000003ab04] eeh_report_reset+0x104/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039793] [c000003ca1e6fb10] [c0000000000395c8] eeh_pe_dev_traverse+0x98/0x170
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039795] [c000003ca1e6fba0] [c00000000003b584] eeh_handle_normal_event+0x334/0x410
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039797] [c000003ca1e6fc20] [c00000000003b968] eeh_handle_event+0x188/0x340
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039799] [c000003ca1e6fcd0] [c00000000003bce8] eeh_event_handler+0x1c8/0x1d0
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039801] [c000003ca1e6fd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.039803] [c000003ca1e6fe30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
Jul 27 01:09:56 rcx2c309 kernel: [ 70.054577] pci_raw_set_power_state: 33 callbacks suppressed
Jul 27 01:09:56 rcx2c309 kernel: [ 70.054580] bnx2x 0001:01:00.0: Refused to change power state, currently in D3
Jul 27 01:09:56 rcx2c309 kernel: [ 70.114605] bnx2x: [bnx2x_io_slot_reset:13797(eth0)]pci_cleanup_aer_uncorrect_error_status failed
Jul 27 01:09:56 rcx2c309 kernel: [ 70.114817] bnx2x: [bnx2x_io_slot_reset:13737(eth1)]IO slot reset initializing...
Jul 27 01:09:56 rcx2c309 kernel: [ 70.130577] bnx2x 0001:01:00.1: Refused to change power state, currently in D3
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214576] bnx2x: [bnx2x_io_slot_reset:13753(eth1)]IO slot reset --> driver unload
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214790] Unable to handle kernel paging request for data at address 0xd0000801827fffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.214965] Faulting instruction address: 0xd00000001c742a70
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215007] Oops: Kernel access of bad area, sig: 11 [#1]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215039] SMP NR_CPUS=2048 NUMA PowerNV
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215074] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables x_tables ast ttm joydev mac_hid hid_generic usbhid at24 ipmi_powernv powernv_rng ipmi_msghandler uio_pdrv_genirq drm_kms_helper uio hid drm syscopyarea sysfillrect sysimgblt i2c_algo_bit nfsd auth_rpcgss nfs_acl nfs lockd knem(OE) grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel mlx4_core(OE) mlx_compat(OE) uas usb_storage bnx2x ahci libahci mdio libcrc32c
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215777] CPU: 9 PID: 812 Comm: eehd Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
Jul 27 01:09:56 rcx2c309 kernel: [ 70.215834] task: c000003ca0139100 ti: c000003ca1e6c000 task.ti: c000003ca1e6c000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216017] NIP: d00000001c742a70 LR: d00000001c742a50 CTR: c000000000036d90
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216066] REGS: c000003ca1e6f710 TRAP: 0300 Tainted: G OE (3.19.0-23-generic)
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216122] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28008084 XER: 00000000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] CFAR: c000000000036e24 DAR: d0000801827fffff DSISR: 40000000 SOFTE: 1
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR00: d00000001c742a50 c000003ca1e6f990 d00000001c809348 d0000801827fffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR04: 0000000000000001 c000003ca1e6f970 9000000100009033 0000000000000001
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR08: 0000000000000000 0000000000000000 0000000000000000 d00000001c7d2030
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR12: 0000000000008800 c00000000fb85100 c0000000000da3e8 c000001fe2931980
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000c51108
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR24: c000000000c510e0 0000000000100100 c000001fe25d0000 c000001fe25d0000
Jul 27 01:09:56 rcx2c309 kernel: [ 70.216246] GPR28: ffffffffffffffff 0000000000000033 00000000ffffffff c000001fe198c900
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217034] NIP [d00000001c742a70] bnx2x_init_shmem+0x180/0x1f0 [bnx2x]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217081] LR [d00000001c742a50] bnx2x_init_shmem+0x160/0x1f0 [bnx2x]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217122] Call Trace:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217145] [c000003ca1e6f990] [d00000001c742a50] bnx2x_init_shmem+0x160/0x1f0 [bnx2x] (unreliable)
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217217] [c000003ca1e6fa10] [d00000001c761f48] bnx2x_io_slot_reset+0x218/0x570 [bnx2x]
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217274] [c000003ca1e6fad0] [c00000000003ab04] eeh_report_reset+0x104/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217331] [c000003ca1e6fb10] [c0000000000395c8] eeh_pe_dev_traverse+0x98/0x170
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217389] [c000003ca1e6fba0] [c00000000003b584] eeh_handle_normal_event+0x334/0x410
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217445] [c000003ca1e6fc20] [c00000000003b968] eeh_handle_event+0x188/0x340
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217502] [c000003ca1e6fcd0] [c00000000003bce8] eeh_event_handler+0x1c8/0x1d0
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217558] [c000003ca1e6fd80] [c0000000000da4f4] kthread+0x114/0x140
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217608] [c000003ca1e6fe30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217798] Instruction dump:
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217825] 40820014 792a07e1 4182000c 4808f5e5 e8410018 893f0033 e87f0020 939f0928
Jul 27 01:09:56 rcx2c309 kernel: [ 70.217917] 79291768 7fde4a14 7c63f214 7c0004ac <81230000> 0c090000 4c00012c 2f89ffff
Jul 27 01:09:56 rcx2c309 kernel: [ 70.218009] ---[ end trace 8d49f86574f73f94 ]---
Jul 27 01:09:56 rcx2c309 kernel: [ 70.218041]
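When comparing traces like the ones above across kernel builds, it can help to reduce each "Call Trace" entry to its symbol, offset, and module. The following is a rough sketch, not a tool used in this bug: the regular expression and the names `TRACE_ENTRY` and `parse_trace` are assumptions, written against the ppc64le trace layout shown in this report (stack pointer, return address, `symbol+0xoffset/0xsize`, optional `[module]`).

```python
import re

# One call-trace entry as printed above, for example:
#   [c000003ca1e6fa10] [d00000001c761f48] bnx2x_io_slot_reset+0x218/0x570 [bnx2x]
TRACE_ENTRY = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\]\s+\[(?P<ip>[0-9a-f]{16})\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace(lines):
    """Return (symbol, offset, module) for every call-trace entry found.

    module is None for symbols that belong to the core kernel.
    Lines that are not trace entries (registers, NIP/LR, text) are skipped.
    """
    frames = []
    for line in lines:
        match = TRACE_ENTRY.search(line)
        if match:
            frames.append(
                (
                    match.group("symbol"),
                    int(match.group("offset"), 16),
                    match.group("module"),
                )
            )
    return frames


# First three frames of the Oops trace from the syslog above.
sample_trace = [
    "Jul 27 01:09:56 rcx2c309 kernel: [ 70.217145] [c000003ca1e6f990] "
    "[d00000001c742a50] bnx2x_init_shmem+0x160/0x1f0 [bnx2x] (unreliable)",
    "Jul 27 01:09:56 rcx2c309 kernel: [ 70.217217] [c000003ca1e6fa10] "
    "[d00000001c761f48] bnx2x_io_slot_reset+0x218/0x570 [bnx2x]",
    "Jul 27 01:09:56 rcx2c309 kernel: [ 70.217274] [c000003ca1e6fad0] "
    "[c00000000003ab04] eeh_report_reset+0x104/0x140",
]

if __name__ == "__main__":
    for symbol, offset, module in parse_trace(sample_trace):
        print(f"{symbol}+{offset:#x} ({module or 'kernel'})")
```

Filtering the result on `module == "bnx2x"` gives the driver-side part of the path, which here is `bnx2x_init_shmem` called from `bnx2x_io_slot_reset`.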
Userspace tool common name: EEH
The userspace tool has the following bit modes: ppc64le
Userspace rpm: EEH
Userspace tool obtained from project website: na
*Additional Instructions for mputtash@xxxxxxxxxx:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach ltrace and strace of userspace application.
== Comment: #8 - Guo Wen Shan <gwshan@xxxxxxxxxxx> - 2015-08-06 21:01:21 ==
Manvanthara, please contact me through Sametime to provide the machine access info so that I can debug it and come up with a patch to fix it. Thanks!
== Comment: #10 - Mukesh K. Ojha <mukeojha@xxxxxxxxxx> - 2015-08-18 04:50:19 ==
Hi All,
Any update on this issue?
== Comment: #13 - Guo Wen Shan <gwshan@xxxxxxxxxxx> - 2015-08-27 20:38:18 ==
Actually, Manvanthara is reporting two different issues in comment #0 and comment #7. I'm looking at the problem reported in comment #7, which can be reproduced with 4.2-rc8 (upstream kernel). If Manvanthara agrees, I think we should open another bug to track the issue from comment #7 and let this bug track the issue from comment #0, as they're different issues from my perspective. Thanks!
== Comment: #14 - Guo Wen Shan <gwshan@xxxxxxxxxxx> - 2015-08-27 22:03:16 ==
One patch was sent to the community for review; it is tracked at the link below. I also installed a private kernel built from 4.2-rc8 plus the patch, and with it the EEH error is recovered successfully. The kernel can be selected from the petitboot menu entry "Ubuntu, with Linux 4.2.0-rc8gavin+" in case anybody wants to give it a try. Thanks!
https://patchwork.ozlabs.org/patch/511744/ ("powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()")
== Comment: #15 - Guo Wen Shan <gwshan@xxxxxxxxxxx> - 2015-08-27 23:42:16 ==
Please ignore the part about "they're different issues" in comment #13. It should be corrected to: they are the same issue. So we don't need to open another bug at all. Sorry for the stupid confusion :-)
== Comment: #16 - Guo Wen Shan <gwshan@xxxxxxxxxxx> - 2015-08-27 23:43:36 ==
I was told by Michael Ellerman that the patch will be put into 4.3-rc3. Closing it as "fixed".
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1522071/+subscriptions