kernel-packages team mailing list archive

[Bug 1487085] Re: Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

 

** Changed in: linux (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1487085

Title:
  Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed

Bug description:
  SRU Justification:
  [Impact]
  Users of 3.19 kernel with power8 machines get a kernel crash on boot.

  [Test Case]
  Boot system.

  [Fix]
  commit 792f96e9a769b799a2944e9369e4ea1e467135b2 needed to be backported in addition to d7cf83fcaf1b1668201eae4cdd6e6fe7a2448654. Our 3.19 kernel had a partial backport of the first patch.

  --

  
  ---Problem Description---
  Installed Ubuntu 14.04.3 LTS on Palmetto and it crashes after booting to the login prompt.
  This happens every time I boot Ubuntu 14.04.3 LTS.  I've reinstalled Ubuntu, and also replaced the hard disk and reinstalled.  It still crashes.

  ---uname output---
  Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = Palmetto

  ---System Hang---
   Ubuntu OS crashes and the host becomes inaccessible. The system must be rebooted.

  ---Steps to Reproduce---
   Boot system

  Oops output:
   [   33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
      [   33.132565] Faulting instruction address: 0xc0000000000dbc60
      [   33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
      [   33.134410] SMP NR_CPUS=2048 NUMA PowerNV
      [   33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
      [   33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
      [   33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
      [   33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
      [   33.142605] REGS: c000000fff703980 TRAP: 0300   Not tainted  (3.19.0-26-generic)
      [   33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28002888  XER: 00000000
      [   33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
      GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
      GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
      GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
      GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
      GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
      GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
      GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
      GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
      [   33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
      [   33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
      [   33.162090] Call Trace:
      [   33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
      [   33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
      [   33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
      [   33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
      [   33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
      [   33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
      [   33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
      [   33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
      [   33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
      [   33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
      [   33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
      [   33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
      [   33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
      [   33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
      [   33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
      [   33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
      [   33.184907]     LR = arch_local_irq_restore+0x40/0x90
      [   33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
      [   33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
      [   33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
      [   33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
      [   33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
      [   33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
      [   33.196569] Instruction dump:
      [   33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
      [   33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
      [   33.202763] ---[ end trace 71076895a9f126ba ]---
      [   33.202836]
      [   35.203605] Kernel panic - not syncing: Fatal exception in interrupt
      [   35.203727] drm_kms_helper: panic occurred, switching back to text console
      [   35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

  Ah! This is due to a notifier chain array overflow while handling an
  OPAL message. The upstream commit 792f96e fixes this issue, but it has
  been only partially applied to the Ubuntu 14.04.3 kernel sources,
  which is why you are seeing this crash.

  commit 792f96e9a769b799a2944e9369e4ea1e467135b2
  Author: Neelesh Gupta <neelegup@xxxxxxxxxxxxxxxxxx>
  Date:   Wed Feb 11 11:57:06 2015 +0530

      powerpc/powernv: Fix the overflow of OPAL message notifiers head array

      Fixes the condition check of incoming message type which can
      otherwise shoot beyond the message notifiers head array.

      Signed-off-by: Neelesh Gupta <neelegup@xxxxxxxxxxxxxxxxxx>
      Reviewed-by: Vasant Hegde <hegdevasant@xxxxxxxxxxxxxxxxxx>
      Reviewed-by: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
      Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

  Below is the hunk from the above commit that is missing from Ubuntu 14.04.3:
  ------------------------------------------------
  @@ -354,7 +350,7 @@ static void opal_handle_message(void)
          type = be32_to_cpu(msg.msg_type);

          /* Sanity check */
  -       if (type > OPAL_MSG_TYPE_MAX) {
  +       if (type >= OPAL_MSG_TYPE_MAX) {
                  pr_warning("%s: Unknown message type: %u\n", __func__, type);
                  return;
          }
  ------------------------------------------------

  I just checked: the above hunk applies cleanly to the Ubuntu 14.04.3
  kernel sources.  We should mirror this bug to Ubuntu and ask them to
  apply it.
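
  To illustrate why changing `>` to `>=` matters, here is a minimal
  user-space sketch of the off-by-one. It assumes the notifier head
  array has exactly OPAL_MSG_TYPE_MAX entries, so that OPAL_MSG_TYPE_MAX
  itself is one past the last valid index. All `demo_` names are
  hypothetical stand-ins, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the message-type enum: valid types are
 * 0 .. DEMO_MSG_TYPE_MAX - 1, and DEMO_MSG_TYPE_MAX is only the count. */
enum demo_msg_type {
    DEMO_MSG_ASYNC_COMP = 0,
    DEMO_MSG_MEM_ERR,
    DEMO_MSG_EPOW,
    DEMO_MSG_TYPE_MAX
};

/* One slot per valid type (assumption: mirrors how the notifier head
 * array is sized), so the last valid index is DEMO_MSG_TYPE_MAX - 1. */
static int demo_notifier_head[DEMO_MSG_TYPE_MAX];

/* Pre-fix sanity check: rejects only type > MAX, so type == MAX is
 * accepted even though it indexes one element past the array. */
static bool demo_type_ok_buggy(uint32_t type)
{
    return !(type > (uint32_t)DEMO_MSG_TYPE_MAX);
}

/* Post-fix sanity check: rejects type >= MAX, so every accepted value
 * is a valid array index. */
static bool demo_type_ok_fixed(uint32_t type)
{
    return !(type >= (uint32_t)DEMO_MSG_TYPE_MAX);
}

/* Dispatch sketch using the fixed check: returns 0 after touching the
 * slot for a valid type, or -1 for an unknown type. */
static int demo_handle_message(uint32_t type)
{
    if (!demo_type_ok_fixed(type))
        return -1;
    demo_notifier_head[type]++;
    return 0;
}
```

  With the pre-fix check, a message whose type equals the maximum is
  passed on and the handler walks whatever memory follows the array,
  which is consistent with the bad notifier pointer seen in the oops.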

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1487085/+subscriptions