
group.of.nepali.translators team mailing list archive

[Bug 1652018] Re: PowerNV: PCI Slot is invalid after fencedPHB Error injection

 

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Yakkety)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Xenial)
       Status: New => Fix Committed

** Changed in: linux (Ubuntu Yakkety)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1652018

Title:
  PowerNV: PCI Slot is invalid after fencedPHB Error injection

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Committed

Bug description:
  == Comment: #0 - Pridhiviraj Paidipeddi <ppaidipe@xxxxxxxxxx> - 2016-12-21 01:16:41 ==
  ---Problem Description---
  The PCI slot is left in an invalid state after a fencedPHB error injection test.
   
  Contact Information = ppaidipe@xxxxxxxxxx 
   
  ---uname output---
  Linux brigstrat1p1 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:46:13 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV CSE-829U 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
  1. Boot the system to runtime.
  2. Inject a fencedPHB error:
     echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0002/err_injct_outbound
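
The injection step above can be wrapped in a small helper. This is a minimal sketch, not part of the original report: the function name inject_fenced_phb is hypothetical, and it assumes debugfs is mounted and the caller has root privileges. The default path and the value written are the ones from step 2.

```shell
#!/bin/sh
# Sketch only: inject a fenced-PHB error through the PowerNV debugfs file.
# inject_fenced_phb is a hypothetical helper name, not an existing tool.
# The default path is the one used in the report; pass another file to override.
inject_fenced_phb() {
    target="${1:-/sys/kernel/debug/powerpc/PCI0002/err_injct_outbound}"

    # Refuse to continue if the file is missing or not writable.
    if [ ! -w "$target" ]; then
        echo "cannot write to $target (root and a mounted debugfs are assumed)" >&2
        return 1
    fi

    # Same value as in step 2 of the report.
    echo 0x8000000000000000 > "$target"
}
```

Calling inject_fenced_phb with no argument targets PHB#2 as in the report; on a different PHB the PCI0002 directory name would need to change.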

  
  dmesg:
  [42725.641368] EEH: PHB#2 failure detected, location: N/A
  [42725.641450] CPU: 8 PID: 898 Comm: kworker/u320:1 Not tainted 4.4.0-57-generic #78-Ubuntu
  [42725.641461] Workqueue: i40e i40e_service_task [i40e]
  [42725.641464] Call Trace:
  [42725.641469] [c00000000407f9e0] [c000000000b13b4c] dump_stack+0xb0/0xf0 (unreliable)
  [42725.641474] [c00000000407fa20] [c0000000000376e0] eeh_dev_check_failure+0x200/0x580
  [42725.641477] [c00000000407fac0] [c000000000037ae4] eeh_check_failure+0x84/0xd0
  [42725.641485] [c00000000407fb00] [d000000035845710] i40e_service_task+0x17b0/0x1a30 [i40e]
  [42725.641489] [c00000000407fc50] [c0000000000dde10] process_one_work+0x1e0/0x5a0
  [42725.641492] [c00000000407fce0] [c0000000000de364] worker_thread+0x194/0x680
  [42725.641496] [c00000000407fd80] [c0000000000e6e60] kthread+0x110/0x130
  [42725.641499] [c00000000407fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  [42725.641509] EEH: Detected error on PHB#2
  [42725.641514] EEH: This PCI device has failed 1 times in the last hour
  [42725.641516] EEH: Notify device drivers to shutdown
  [42725.641523] i40e 0002:01:00.0: i40e_pci_error_detected: error 2
  [42725.641907] i40e 0002:01:00.0: VSI seid 396 Tx ring 0 disable timeout
  [42725.642144] i40e 0002:01:00.0: VSI seid 396 Rx ring 0 disable timeout
  [42725.666205] i40e 0002:01:00.1: i40e_pci_error_detected: error 2
  [42725.666499] i40e 0002:01:00.2: i40e_pci_error_detected: error 2
  [42725.666533] i40e 0002:01:00.0: ARQ event error -32
  [42725.666601] i40e 0002:01:00.3: i40e_pci_error_detected: error 2
  [42725.666700] EEH: Collect temporary log
  [42725.666702] PHB3 PHB#2 Diag-data (Version: 1)
  [42725.666703] brdgCtl:     0000ffff
  [42725.666704] UtlSts:      00100000 00000000 00000000
  [42725.666706] RootSts:     ffffffff ffffffff ffffffff ffffffff 0000ffff
  [42725.666707] RootErrSts:  ffffffff ffffffff ffffffff
  [42725.666708] RootErrLog:  ffffffff ffffffff ffffffff ffffffff
  [42725.666709] RootErrLog1: ffffffff 0000000000000000 0000000000000000
  [42725.666711] nFir:        0000808000000000 0030006e00000000 0000800000000000
  [42725.666712] PhbSts:      0000001800000000 0000001800000000
  [42725.666713] Lem:         8000020000800000 42498e367f502eae 8000000000000000
  [42725.666715] OutErr:      8000002000000000 8000000000000000 120800600003fffe 402002a800000000
  [42725.666716] InBErr:      0000000040000000 0000000040000000 0000080000000000 000c10c010010000
  [42725.666718] EEH: Reset without hotplug activity
  [42730.052455] EEH: Notify device drivers the completion of reset
  [42730.053334] EEH: Notify device driver to resume
  [42730.184457] i40e 0002:01:00.0 enP2p1s0f0: NIC Link is Down
  [42731.568230] i40e 0002:01:00.0 enP2p1s0f0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

  
  OPAL LOG:
  [42990.475630456,7] PHB#0002: CRESET: Starts
  [42990.482717333,7] PHB#0002: CRESET: No pending transactions
  [42991.023963215,7] PHB#0002: CRESET: Reinitialization
  [42991.023964143,7] PHB#0002: Initializing PHB...
  [42991.075167078,7] PHB#0002: Core revision 0xa30005
  [42991.075171529,7] PHB#0002: Default system config: 0x421100fc30000000
  [42991.075172655,7] PHB#0002: New system config    : 0x421000fc30000000
  [42991.075174000,7] PHB#0002: PHB_RESET is 0x2000000000000000
  [42991.075410938,7] PHB#0002: Waiting for DLP PG reset to complete...
  [42991.083713914,7] PHB#0002: Initialization complete
  [42991.136599535,7] PHB#0002: FRESET: Starts
  [42991.136600954,7] PHB#0002: FRESET: Prepare for link down
  [42991.136602933,7] PHB#0002: FRESET: Assert
  [42992.138625290,7] PHB#0002: FRESET: Deassert
  [42993.140657592,7] PHB#0002: LINK: Start polling
  [42993.193893558,7] PHB#0002: LINK: Electrical link detected
  [42993.247138072,7] PHB#0002: LINK: Link is up
  [42993.247174237,3] PCI-SLOT-0000000000000002 Invalid state 00000000
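
To spot the failing line in a long OPAL log, a simple filter may help. This is a sketch under assumptions: the function name find_invalid_slot_state is hypothetical, the pattern is taken from the last log line above, and the default path /sys/firmware/opal/msglog is where PowerNV kernels usually expose the OPAL message log.

```shell
#!/bin/sh
# Sketch only: print PCI slot "Invalid state" messages found in an OPAL log.
# find_invalid_slot_state is a hypothetical helper name.
# Defaults to the live OPAL message log; pass a saved log file to override.
find_invalid_slot_state() {
    log="${1:-/sys/firmware/opal/msglog}"

    # Match lines such as "PCI-SLOT-0000000000000002 Invalid state 00000000".
    # grep exits non-zero when no such line exists, so the return status
    # tells the caller whether the problem was seen.
    grep -E 'PCI-SLOT-[0-9A-Fa-f]+ Invalid state' "$log"
}
```

The function returns success only when at least one matching line is present, so it can be used directly in an if statement after an error injection run.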

  == Comment: #2 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-22 04:57:28 ==

  $ git log fbce44d0ed42e465317 -1
  commit fbce44d0ed42e4653172376f4dfeaa5710f06a27
  Author: Gavin Shan <gwshan@xxxxxxxxxxxxxxxxxx>
  Date:   Fri Jun 24 16:44:19 2016 +1000

      powerpc/powernv: Call opal_pci_poll() if needed
      
      When issuing PHB reset, OPAL API opal_pci_poll() is called to drive
      the state machine in OPAL forward. However, we needn't always call
      the function under some circumstances like reset deassert.
      
      This avoids calling opal_pci_poll() when OPAL_SUCCESS is returned
      from opal_pci_reset(). Except the overhead introduced by additional
      one unnecessary OPAL call, I didn't run into real issue because of
      this.
      
      Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@xxxxxxxxxx>
      Signed-off-by: Gavin Shan <gwshan@xxxxxxxxxxxxxxxxxx>
      Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>

  $ git tag --contains  fbce44d0e
  v4.9
  v4.9-rc1
  v4.9-rc2
  v4.9-rc3
  v4.9-rc4
  v4.9-rc5
  v4.9-rc6
  v4.9-rc7
  v4.9-rc8

  This issue is fixed by commit fbce44d0ed42 ("powerpc/powernv: Call opal_pci_poll() if needed"), which is included in kernel v4.9 (first tagged in v4.9-rc1).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1652018/+subscriptions