group.of.nepali.translators team mailing list archive

[Bug 1645826] Re: Crash@pcibios_set_pcie_reset_state+0x118/0x280 in capiredp01 with latest level - 160823-GA3-FlashGT

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
Date: Wed, 11 Jan 2017 11:37:47 -0000
Reply-to: Bug 1645826 <1645826@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1645826

Title:
  Crash@pcibios_set_pcie_reset_state+0x118/0x280 in capiredp01 with
  latest level - 160823-GA3-FlashGT

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #26 - Andrew Donnellan - 2016-11-24 19:55:52 ==
  Ubuntu kernel team, please apply the following fixup to the Xenial kernel tree.

  --------------------------------------------------------------

  From 631804b1548b035cada4b2c14ab708310a8aa607 Mon Sep 17 00:00:00 2001
  From: Gavin Shan <gwshan@xxxxxxxxxxxxxxxxxx>
  Date: Mon, 12 Sep 2016 10:50:16 +1000
  Subject: [PATCH] powerpc/eeh: Remove EEH_PE_PRI_BUS in full hotplug recovery

  commit 59ae8c6d5b45 ("powerpc/eeh: Fix invalid cached PE primary
  bus") was an incorrect backport of upstream commit a3aa256b7258: it
  should clear the PE's flag (EEH_PE_PRI_BUS) in the full hotplug
  scenario instead of the partial hotplug scenario.

  This fixes the issue by clearing EEH_PE_PRI_BUS in full hotplug
  scenario only.

  Fixes: 59ae8c6d5b45 ("powerpc/eeh: Fix invalid cached PE primary bus")
  Signed-off-by: Gavin Shan <gwshan@xxxxxxxxxxxxxxxxxx>
  ---
   arch/powerpc/kernel/eeh_driver.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

  diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
  index c453b53..829ab8e 100644
  --- a/arch/powerpc/kernel/eeh_driver.c
  +++ b/arch/powerpc/kernel/eeh_driver.c
  @@ -630,13 +630,13 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
   		 * rebuilt when adding PCI devices.
   		 */
   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
  +		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
   		pcibios_add_pci_devices(bus);
   	} else if (frozen_bus && removed) {
   		pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
   		ssleep(5);
   
   		eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
  -		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
   		pcibios_add_pci_devices(frozen_bus);
   	}
   	eeh_pe_state_clear(pe, EEH_PE_KEEP);
  -- 
  2.1.0
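
  The effect of the fixup on the PE state flags can be sketched in user
  space. This is a simplified model, not kernel code: struct pe_model,
  enum hotplug_kind, reset_device_model() and the flag bit values are
  all made up for illustration. Only the placement of the two clears
  follows the diff above, and the "full" and "partial" branch names are
  taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits for this model; not copied from kernel headers. */
#define PE_KEEP    (1u << 0)
#define PE_PRI_BUS (1u << 1)

/* Hypothetical stand-in for the state word of struct eeh_pe. */
struct pe_model {
    unsigned int state;
};

/* Hypothetical names for the branches of eeh_reset_device(). */
enum hotplug_kind { HOTPLUG_FULL, HOTPLUG_PARTIAL, HOTPLUG_NONE };

static void pe_state_clear(struct pe_model *pe, unsigned int flags)
{
    pe->state &= ~flags;
}

/* Models only the flag handling of the patched function. */
void reset_device_model(struct pe_model *pe, enum hotplug_kind kind)
{
    assert(pe != NULL);

    if (kind == HOTPLUG_FULL) {
        /* The fixup moves the PRI_BUS clear into this branch. */
        pe_state_clear(pe, PE_PRI_BUS);
    }
    /* HOTPLUG_PARTIAL: the cached primary bus flag is left untouched. */

    /* KEEP is cleared on every path, as in the diff. */
    pe_state_clear(pe, PE_KEEP);
}
```

  In this model, a PE that starts with both flags set ends with no
  flags after a full hotplug and with only PE_PRI_BUS set after a
  partial hotplug.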

  
  Historical context:
  ==== State: Open by: ukrishn on 08 September 2016 18:15:32 ====

  This seems to be easily recreatable. Mike Vageline just hit the
  issue by doing a couple of PERSTs on a FlashGT card.

  Here is his note:
  I had downloaded 0908, then did a PERST, ran modprobe to verify 0908,
  then rmmod, then a PERST to factory, ran modprobe, verified it was
  0903, rmmod, then a PERST again to user... xmon

  p8tul12-lp1 login: [  647.501340] Fatal Hypervisor Maintenance interrupt [Recovered]
  [  647.501348] EEH: Fenced PHB#2 detected, location: N/A
  [  647.501528]  Error detail: Malfunction Alert
  [  647.501590] 	HMER: 8040000000000000
  [  647.501637] 	Unknown Core check stop.
  [  647.502584] Fatal Hypervisor Maintenance interrupt [Recovered]
  [  647.502588]  Error detail: Malfunction Alert
  [  647.502590] 	HMER: 8040000000000000
  [  647.502591] 	Unknown Core check stop.
  [  665.369299] PCI: Memory resource 0 not set for host bridge /pciex@3fffe40400000/pci@0/device@0 (domain 5)
  [  676.293638] Back level AFU, please upgrade. AFU version 160903N0 interface version 0xffffffffffffffff
  [  676.293842] cxlflash 0005:00:00.0: cxlflash_probe: call to init_afu failed rc=-22!
  [  704.863543] Unable to handle kernel paging request for data at address 0x00000110
  [  704.863673] Faulting instruction address: 0xc000000000083e08
  cpu 0x2: Vector: 300 (Data Access) at [c000000f01cbf7d0]
      pc: c000000000083e08: pnv_eeh_reset+0x68/0x170
      lr: c000000000083df8: pnv_eeh_reset+0x58/0x170
      sp: c000000f01cbfa50
     msr: 9000000000009033
     dar: 110
   dsisr: 40000000
    current = 0xc000000f014bc8e0
    paca    = 0xc000000007b41300	 softe: 0	 irq_happened: 0x01
      pid   = 10688, comm = sh
  enter ? for help
  [c000000f01cbfad0] c000000000038bb8 pcibios_set_pcie_reset_state+0x118/0x280
  [c000000f01cbfb50] c0000000005e9450 pci_set_pcie_reset_state+0x30/0x50
  [c000000f01cbfb80] d000000007c9f7bc cxl_pci_reset+0x5c/0xc0 [cxl]
  [c000000f01cbfbf0] d000000007c992a4 reset_adapter_store+0x84/0x120 [cxl]
  [c000000f01cbfc80] c0000000006d2378 dev_attr_store+0x68/0xa0
  [c000000f01cbfcc0] c000000000398290 sysfs_kf_write+0x80/0xb0
  [c000000f01cbfd00] c0000000003971a8 kernfs_fop_write+0x188/0x200
  [c000000f01cbfd50] c0000000002e1a6c __vfs_write+0x6c/0xe0
  [c000000f01cbfd90] c0000000002e27a0 vfs_write+0xc0/0x230
  [c000000f01cbfde0] c0000000002e37dc SyS_write+0x6c/0x110
  [c000000f01cbfe30] c000000000009204 system_call+0x38/0xb4
  --- Exception: c01 (System Call) at 00003fff9c610eb8
  SP (3fffdeaa0480) is in userspace
  2:mon>

  ==== State: Open by: ukrishn on 09 September 2016 13:11:49 ====

  2:mon> e
  cpu 0x2: Vector: 300 (Data Access) at [c000000f01cbf7d0]
      pc: c000000000083e08: pnv_eeh_reset+0x68/0x170
      lr: c000000000083df8: pnv_eeh_reset+0x58/0x170
      sp: c000000f01cbfa50
     msr: 9000000000009033
     dar: 110
   dsisr: 40000000
    current = 0xc000000f014bc8e0
    paca    = 0xc000000007b41300   softe: 0        irq_happened: 0x01
      pid   = 10688, comm = sh
  2:mon>

  c000000000083df4  4bfb6f25      bl      c00000000003ad18        # eeh_pe_bus_get+0x8/0xe0
  c000000000083df8  60000000      nop
  c000000000083dfc  e9230010      ld      r9,16(r3)
  c000000000083e00  2fa90000      cmpdi   cr7,r9,0
  c000000000083e04  419e00dc      beq     cr7,c000000000083ee0    # pnv_eeh_reset+0x140/0x170
  c000000000083e08  e9290010      ld      r9,16(r9)

  R03 = c0000007f7db4800
  R09 = 0000000000000100

  2:mon> d c0000007f7db4800
  c0000007f7db4800 00f8dbf7070000c0 0000000000000000  |................|
  c0000007f7db4810 0001000000000000  <<<<< This should have been either NULL
  or a valid parent pointer.
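
  The arithmetic linking the dump to the fault can be checked with a
  short sketch. It assumes that xmon prints the bytes in memory order
  and that the kernel is little-endian, which is consistent with
  R09 = 0x100 above. The helper names (le64_decode,
  ld_effective_address, faulting_address) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode 8 bytes, stored least-significant byte first, into a 64-bit value. */
uint64_t le64_decode(const uint8_t bytes[8])
{
    assert(bytes != NULL);

    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | bytes[i];
    return v;
}

/* Effective address of "ld rX,disp(base)". */
uint64_t ld_effective_address(uint64_t base, int16_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}

/* Bytes shown by xmon at c0000007f7db4810, i.e. 16 bytes past R03. */
static const uint8_t dump_at_r3_plus_16[8] = {
    0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Replays the two loads from the disassembly above. */
uint64_t faulting_address(void)
{
    uint64_t r9 = le64_decode(dump_at_r3_plus_16); /* ld r9,16(r3) */
    return ld_effective_address(r9, 16);           /* ld r9,16(r9) */
}
```

  The decoded value is 0x100, so it passes the cmpdi/beq NULL check, and
  the second load then targets 0x110, which matches the reported DAR.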

  As Andrew suspected, this could be memory corruption, and the problem
  seems to be easily recreatable on the Ubuntu 4.4.0-36 Xenial kernel.
  So far, the scenario has been repeated PERSTs combined with unloading
  and reloading the cxlflash driver:

  1. unload cxlflash
  2. PERST
  3. modprobe cxlflash

  Repeating these three steps, especially after installing a new AFU
  image, seems to trigger the problem.
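
  A loop over these steps could be sketched as follows. This is not the
  procedure the reporters actually ran. It assumes the PERST is
  triggered by writing 1 to the cxl sysfs "reset" attribute (the stack
  trace above goes through reset_adapter_store), and "card0" is a
  placeholder for the adapter under test. run_cycles() and its dry_run
  argument are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed sysfs node; "card0" is a placeholder for the real adapter. */
#define CXL_RESET_PATH "/sys/class/cxl/card0/reset"

/* One cycle, in the order listed above. */
static const char *const cycle_cmds[] = {
    "rmmod cxlflash",
    "echo 1 > " CXL_RESET_PATH, /* PERST */
    "modprobe cxlflash",
};
#define CYCLE_LEN (sizeof(cycle_cmds) / sizeof(cycle_cmds[0]))

/*
 * Run `cycles` iterations of the three steps, logging each command.
 * With dry_run set, commands are only logged, never executed.
 * Returns the number of commands that were logged and, unless dry_run,
 * executed successfully; stops at the first failing command.
 */
int run_cycles(int cycles, int dry_run, FILE *log)
{
    assert(log != NULL);

    int issued = 0;
    for (int i = 0; i < cycles; i++) {
        for (size_t s = 0; s < CYCLE_LEN; s++) {
            fprintf(log, "[cycle %d] %s\n", i + 1, cycle_cmds[s]);
            if (!dry_run && system(cycle_cmds[s]) != 0)
                return issued;
            issued++;
        }
    }
    return issued;
}
```

  Calling run_cycles(3, 1, stdout) only prints the nine commands. A real
  run needs root and the hardware, and may drop into xmon as shown
  above.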

  
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1645826/+subscriptions