
kernel-packages team mailing list archive

[Bug 1486180] Comment bridged from LTC Bugzilla

 

------- Comment From chavez@xxxxxxxxxx 2016-01-11 10:34 EDT-------
Status update:

The root cause has been found and a patch is available.
The problem occurs when a PCI device is added through DLPAR to an LPAR that had no PCI devices present at boot time. When DDW (Dynamic DMA Windows) is being enabled (specifically in the function query_ddw()), a NULL pointer dereference occurs because a member of struct eeh_dev is NULL.

This happens because EEH is not initialized correctly: PCI devices are
not probed as expected, so the eeh_dev struct is never initialized.

Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in
eeh_add_device_early() on Cell") added a check to
eeh_add_device_early() to avoid an oops on the Cell architecture; this
function is used to probe PCI devices during hotplug/DLPAR operations.
The check evaluates the return value of the eeh_enable() function.

The issue occurs because, with no PCI devices present at boot time,
EEH is never enabled, so this check makes eeh_add_device_early() return
without probing the hotplugged device. Our patch changes the way the
architecture check is done, so this bug no longer happens.

The patch has been submitted upstream. I'm not sure of the exact procedure with Canonical; I think we should wait for upstream acceptance and then ask Canonical to add the patch to Ubuntu's 14.04.4, 15.10 and 16.04 kernels.
The patch description provides more detail on the issue and the proposed solution.

Link to the patch on the linuxppc-dev list: https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/137695.html

Thanks, Shryia, for all the help.
Cheers,

Guilherme

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1486180

Title:
  Kernel OOPS during DLPAR operation with Fibre Channel adapter

Status in linux package in Ubuntu:
  New

Bug description:
  -- Problem Description --
  Kernel OOPS during DLPAR operation with Fibre Channel adapter
   
  ---uname output---
  4.1.0-1-generic
   
  ---Additional Hardware Info---
  Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03) 
   
  Machine Type = POWER8 
    
  ---Steps to Reproduce---
  1) Install Ubuntu 15.10 on a PowerVM LPAR.
  2) Configure and start the rtas_errd daemon.
  3) Using the HMC, add a Fibre Channel adapter via dynamic LPAR (DLPAR).
   During the operation, the following oops message is observed:

  Oops output:

  !!! 00E0806 Fcode, Copyright (c) 2000-2012 Emulex !!!  Version 3.10x2
  [ 8696.808703] PCI host bridge /pci@800000020000020  ranges:
  [ 8696.808708]  MEM 0x0003ff8400000000..0x0003ff847effffff -> 0x0000000080000000 
  [ 8696.808716] PCI: I/O resource not set for host bridge /pci@800000020000020 (domain 1)
  [ 8696.808761] PCI host bridge to bus 0001:01
  [ 8696.808765] pci_bus 0001:01: root bus resource [mem 0x3ff8400000000-0x3ff847effffff] (bus address [0x80000000-0xfeffffff])
  [ 8696.808768] pci_bus 0001:01: root bus resource [bus 01-ff]
  [ 8696.897390] rpaphp: Slot [U78C7.001.RCH0042-P1-C8] registered
  [ 8696.897395] rpadlpar_io: slot PHB 32 added
  [ 8696.972155] Emulex LightPulse Fibre Channel SCSI driver 10.5.0.0.
  [ 8696.972157] Copyright(c) 2004-2015 Emulex.  All rights reserved.
  [ 8696.972438] lpfc 0001:01:00.1: enabling device (0140 -> 0142)
  [ 8696.976145] Unable to handle kernel paging request for data at address 0x0000000c
  [ 8696.976174] Faulting instruction address: 0xc000000000084cc4
  [ 8696.976182] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 8696.976188] SMP NR_CPUS=2048 NUMA pSeries
  [ 8696.976196] Modules linked in: lpfc(+) scsi_transport_fc rpadlpar_io rpaphp rtc_generic pseries_rng autofs4
  [ 8696.976220] CPU: 3 PID: 1426 Comm: systemd-udevd Not tainted 4.1.0-1-generic #1~dogfoodv1-Ubuntu
  [ 8696.976230] task: c0000003857737e0 ti: c0000000fd08c000 task.ti: c0000000fd08c000
  [ 8696.976239] NIP: c000000000084cc4 LR: c000000000084ca8 CTR: 0000000000000000
  [ 8696.976247] REGS: c0000000fd08f0f0 TRAP: 0300   Not tainted  (4.1.0-1-generic)
  [ 8696.976255] MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 82228888  XER: 20000000
  [ 8696.976278] CFAR: c000000000008468 DAR: 000000000000000c DSISR: 40000000 SOFTE: 1 
                 GPR00: c000000000084ca8 c0000000fd08f370 c0000000014bda00 0000000000000000 
                 GPR04: 0000000000000001 c0000000fd08f408 0000000000000003 d000000002c31e60 
                 GPR08: c0000000013bda00 0000000000000000 c0000003873e6b80 d000000002ca7c98 
                 GPR12: 0000000000008800 c00000000e831b00 d0000000029421f8 00003ffff8ca4522 
                 GPR16: c0000000fd08fdc0 c0000000fd08fe04 d000000002941878 c0000000fc8054c0 
                 GPR20: d000000002380000 d000000002380000 d000000002ccff90 0000000000000000 
                 GPR24: c00000000165074c c00000038e17e000 c0000000013b5e00 c00000038e17e000 
                 GPR28: c0000000013b5e28 c00000000a590600 c0000000013b5df0 c0000000013b5e20 
  [ 8696.976396] NIP [c000000000084cc4] enable_ddw+0x254/0x7b0
  [ 8696.976405] LR [c000000000084ca8] enable_ddw+0x238/0x7b0
  [ 8696.976411] Call Trace:
  [ 8696.976419] [c0000000fd08f370] [c000000000084ca8] enable_ddw+0x238/0x7b0 (unreliable)
  [ 8696.976431] [c0000000fd08f4b0] [c0000000000866d8] dma_set_mask_pSeriesLP+0x218/0x2a0
  [ 8696.976444] [c0000000fd08f540] [c000000000023528] dma_set_mask+0x58/0xa0
  [ 8696.976474] [c0000000fd08f570] [d000000002c71280] lpfc_pci_probe_one+0xb0/0xc50 [lpfc]
  [ 8696.976486] [c0000000fd08f610] [c0000000005987fc] local_pci_probe+0x6c/0x140
  [ 8696.976497] [c0000000fd08f6a0] [c000000000598a28] pci_device_probe+0x158/0x1e0
  [ 8696.976510] [c0000000fd08f700] [c00000000067b744] driver_probe_device+0x1c4/0x5a0
  [ 8696.976522] [c0000000fd08f790] [c00000000067bcdc] __driver_attach+0x11c/0x120
  [ 8696.976533] [c0000000fd08f7d0] [c00000000067854c] bus_for_each_dev+0x9c/0x110
  [ 8696.976544] [c0000000fd08f820] [c00000000067adbc] driver_attach+0x3c/0x60
  [ 8696.976555] [c0000000fd08f850] [c00000000067a768] bus_add_driver+0x208/0x320
  [ 8696.976565] [c0000000fd08f8e0] [c00000000067c99c] driver_register+0x9c/0x180
  [ 8696.976576] [c0000000fd08f950] [c0000000005978ec] __pci_register_driver+0x6c/0x90
  [ 8696.976604] [c0000000fd08f990] [d000000002ca7848] lpfc_init+0x17c/0x1d8 [lpfc]
  [ 8696.976617] [c0000000fd08fa20] [c00000000000b42c] do_one_initcall+0x12c/0x280
  [ 8696.976628] [c0000000fd08faf0] [c000000000a6c7c8] do_init_module+0x98/0x238
  [ 8696.976640] [c0000000fd08fb80] [c000000000163fa4] load_module+0x1354/0x14d0
  [ 8696.976651] [c0000000fd08fd50] [c0000000001643d0] SyS_finit_module+0xc0/0x120
  [ 8696.976662] [c0000000fd08fe30] [c0000000000091fc] system_call+0x38/0xb4
  [ 8696.976669] Instruction dump:
  [ 8696.976675] 7fa3eb78 38842388 38a10090 38c00003 488345a5 60000000 2fa30000 409e0170 
  [ 8696.976694] 2fb90000 419e0438 ea7902f0 e93902e8 <83e9000c> 81490008 2f9f0000 7bff0020 
  [ 8696.976716] ---[ end trace c6f99bed0288dc0c ]---

  The DLPAR operation nevertheless completes successfully, and lspci
  displays the following information afterwards:

  # lspci
  0001:01:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
  0001:01:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
  #
