kernel-packages team mailing list archive

[Bug 1487085] [NEW] Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

 

You have been subscribed to a public bug:

---Problem Description---
Installed Ubuntu 14.04.3 LTS on Palmetto and it's crashing after booting to login.
This happens every time I boot Ubuntu 14.04.3 LTS.  I've reinstalled Ubuntu, and I've also replaced the hard disk and reinstalled again.  It still crashes.
 
---uname output---
Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = Palmetto 
 
---System Hang---
 Ubuntu OS crashes and the host cannot be accessed. The system must be rebooted.
  
---Steps to Reproduce---
 Boot system
  
Oops output:
    [   33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
    [   33.132565] Faulting instruction address: 0xc0000000000dbc60
    [   33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
    [   33.134410] SMP NR_CPUS=2048 NUMA PowerNV
    [   33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
    [   33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
    [   33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
    [   33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
    [   33.142605] REGS: c000000fff703980 TRAP: 0300   Not tainted  (3.19.0-26-generic)
    [   33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28002888  XER: 00000000
    [   33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0 
    GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0 
    GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000 
    GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003 
    GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000 
    GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880 
    GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012 
    GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8 
    GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff 
    [   33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
    [   33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
    [   33.162090] Call Trace:
    [   33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
    [   33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
    [   33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
    [   33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
    [   33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
    [   33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
    [   33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
    [   33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
    [   33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
    [   33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
    [   33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
    [   33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
    [   33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
    [   33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
    [   33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
    [   33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
    [   33.184907]     LR = arch_local_irq_restore+0x40/0x90
    [   33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
    [   33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
    [   33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
    [   33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
    [   33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
    [   33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
    [   33.196569] Instruction dump:
    [   33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c 
    [   33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378 
    [   33.202763] ---[ end trace 71076895a9f126ba ]---
    [   33.202836] 
    [   35.203605] Kernel panic - not syncing: Fatal exception in interrupt
    [   35.203727] drm_kms_helper: panic occurred, switching back to text console
    [   35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
 
Ah! This is caused by an overflow of the notifier chain head array while handling an OPAL message. Upstream commit 792f96e fixes this issue, but that commit has only been partially applied to the Ubuntu 14.04.3 kernel sources, which is why you are seeing this crash.

commit 792f96e9a769b799a2944e9369e4ea1e467135b2
Author: Neelesh Gupta <neelegup@xxxxxxxxxxxxxxxxxx>
Date:   Wed Feb 11 11:57:06 2015 +0530

    powerpc/powernv: Fix the overflow of OPAL message notifiers head array
    
    Fixes the condition check of incoming message type which can
    otherwise shoot beyond the message notifiers head array.
    
    Signed-off-by: Neelesh Gupta <neelegup@xxxxxxxxxxxxxxxxxx>
    Reviewed-by: Vasant Hegde <hegdevasant@xxxxxxxxxxxxxxxxxx>
    Reviewed-by: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
    Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

Below is the hunk from the above commit that is missing from Ubuntu 14.04.3:
------------------------------------------------
@@ -354,7 +350,7 @@ static void opal_handle_message(void)
        type = be32_to_cpu(msg.msg_type);
 
        /* Sanity check */
-       if (type > OPAL_MSG_TYPE_MAX) {
+       if (type >= OPAL_MSG_TYPE_MAX) {
                pr_warning("%s: Unknown message type: %u\n", __func__, type);
                return;
        }
------------------------------------------------

I just checked: the above hunk applies cleanly to the Ubuntu 14.04.3
kernel sources.  We should mirror this bug to Ubuntu and ask them to
apply it.
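
To illustrate why the one-character change matters, here is a minimal
user-space sketch of the off-by-one. It is not the kernel code: the names
MSG_TYPE_MAX, notifier_head, type_ok_buggy, type_ok_fixed and
index_in_bounds are hypothetical stand-ins, and the sketch assumes only what
the commit message states, namely that the notifier head array has one slot
per valid message type, so valid indices run from 0 to MAX - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for OPAL_MSG_TYPE_MAX; the value is arbitrary. */
enum { MSG_TYPE_MAX = 4 };

/* One head per valid type, so valid indices are 0 .. MSG_TYPE_MAX - 1. */
static int notifier_head[MSG_TYPE_MAX];

/* Sanity check before the fix: rejects only type > MAX,
 * so type == MAX is accepted. */
static bool type_ok_buggy(uint32_t type)
{
	return !(type > MSG_TYPE_MAX);
}

/* Sanity check after the fix: rejects type >= MAX. */
static bool type_ok_fixed(uint32_t type)
{
	return !(type >= MSG_TYPE_MAX);
}

/* True only if 'type' is a valid index into notifier_head. */
static bool index_in_bounds(uint32_t type)
{
	return type < sizeof(notifier_head) / sizeof(notifier_head[0]);
}
```

With the old check, a message whose type equals the maximum passes the
sanity test yet indexes one element past the end of the array, so the
notifier code walks whatever memory happens to sit there. With the new check
that message is rejected.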

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-129216 severity-high targetmilestone-inin14043
-- 
Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot
https://bugs.launchpad.net/bugs/1487085
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.