kernel-packages team mailing list archive

[Bug 1410519] Re: [PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters

 

Thanks Suka,

Chris,

Our main concern for this bug is 15.04, so if you see any problem with
this patch for 14.10, we can skip 14.10 and fix it in 15.04 only.

Thank you,
Breno

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410519

Title:
  [PowerVM] Kernel BUG @ kernel/irq_work.c:157!  - 24x7 hw counters

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Utopic:
  In Progress

Bug description:
  [Impact]
  Using perf with hv_24x7 events can cause a kernel BUG.

  [Fix]
  The following upstream commits:
   d658972
   48bee8a
   f34b6c7
   ec2aef5
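
For readers who want to try the listed commits on their own tree, here is a minimal sketch. It assumes a git checkout in which these abbreviated IDs resolve and that the listed order is a workable apply order; neither is confirmed by the report. The helper name `print_cherry_picks` is hypothetical, and the script only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Commit IDs as listed in the [Fix] section.
FIX_COMMITS="d658972 48bee8a f34b6c7 ec2aef5"

print_cherry_picks() {
    # Emit one cherry-pick command per commit, in the listed order
    # (assumption: this order applies cleanly).
    # -x records the original commit ID in the new commit message.
    for c in $FIX_COMMITS; do
        echo "git cherry-pick -x $c"
    done
}

print_cherry_picks
```

Piping the output to `sh` from inside the target tree would run the commands.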

  [Test Case]
  Steps to recreate the problem:

  1.  Install Ubuntu 15.04 as a PowerVM guest.
  2.  Install the perf tool.
  3.  Run the following scripts to test the 24x7 POWER8 hardware counter events with the perf tool.

  ===  Script 1
  #!/bin/bash

  count=0;

  offset=0x128
  PERF_ARGS="-r 10 -C 0"
  while [ $count -lt 100 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          count=$((count + 1))
  done

  ==== Script 2
  #!/bin/bash

  offset=0;

  PERF_ARGS="-r 10 -C 0"
  while [ $offset -lt 8192 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          offset=)
  done
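
The offset sweep in Script 2 can be sketched as a small reusable function. This is an illustration under stated assumptions, not the reporter's script: the increment `STEP` is hypothetical (the original value is not recoverable from the report), the function names are invented for this example, a single `perf stat` invocation is used, and `DRY_RUN` defaults to printing the commands so nothing touches the hardware counters unless explicitly requested.

```shell
#!/bin/sh
# STEP is a hypothetical increment; the report does not show the real one.
STEP=${STEP:-8}
# DRY_RUN=1 only prints the perf commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

hv24x7_event() {
    # $1 = counter offset; domain and starting_index match the scripts in the report.
    printf 'hv_24x7/domain=0x2,offset=%s,starting_index=10/' "$1"
}

sweep_offsets() {
    # $1 = exclusive upper bound for the offset, $2 = step
    sweep_offset=0
    while [ "$sweep_offset" -lt "$1" ]; do
        sweep_event=$(hv24x7_event "$sweep_offset")
        if [ "$DRY_RUN" = 1 ]; then
            echo "perf stat -r 10 -C 0 -x ' ' -e $sweep_event ls"
        else
            perf stat -r 10 -C 0 -x ' ' -e "$sweep_event" ls
        fi
        sweep_offset=$((sweep_offset + $2))
    done
}

# Small demonstration range; the report sweeps offsets up to 8192.
sweep_offsets 32 "$STEP"
```

Running with `DRY_RUN=0` on an affected PowerVM guest may crash the kernel, as described in the report.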

  After a few iterations I hit the following BUG.

  tt2.sh  tt.sh
  tt2.sh  tt.sh
  tt2.sh  tt.sh
  275679187521558  hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%
  tt2.sh  tt.sh
  [ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preempt_count 00000100, exited with bfff0000?
  [ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
  [ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]
  [ 4657.314740] Modules linked in: rtc_generic pseries_rng
  [ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
  [ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
  [ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
  [ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28042024  XER: 0000000a
  [ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0
  GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002
  GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060
  GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000
  GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff
  GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08
  [ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30
  [ 4657.314841] Call Trace:
  4000 (unreliable)
  [ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0
  [ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
  [ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90
  [ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
  [ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98
  [ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410
  [ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
  [ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
  [ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc
  [ 4657.314963]     LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
  [ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0
  [ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c
  [ 4657.315047] Instruction dump:
  [ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
  [ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000 60000000
  [ 4657.315090] ---[ end trace ee202cccd2211e5d ]---
  [ 4657.320224]
  [ 4657.362675] Unable to handle kernel paging request for data at address 0xc000000b35515048
  [ 4657.362680] Faulting instruction address: 0xc00000000006a37c
  [ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]
  [ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries
  [ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G      D       3.16.0-25-
  [ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
  [ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340
  [ 4657.362706] REGS: c0000000eb607800 TRAP: 0300   Tainted: G      D        (3.16.0-25-generic)
  00000000
  [ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0
  GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000
  GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001
  GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001
  GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001
  [ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0
  [ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362765] Call Trace:
  0 (unreliable)
  [ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+0x70/0x90
  [ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140
  [ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110
  [ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
  [ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250
  [ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0
  [ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130
  [ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
  [ 4657.362810] Instruction dump:
  [ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
  [ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c 7fe4fb78
  [ 4657.362825] ---[ end trace ee202cccd2211e5e ]---
  [ 4657.365085]
  [ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!
  [ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
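
When running the reproducer repeatedly, it can help to scan captured console output for the failure markers seen in this trace. The following is a minimal sketch; the function name `scan_kernel_log` is hypothetical, and the three patterns are taken directly from the log lines in this report rather than from any exhaustive list of kernel failure messages.

```shell
#!/bin/sh
scan_kernel_log() {
    # Reads log text on stdin and prints lines that match the markers
    # seen in this report. Exit status is 0 if at least one line matched.
    grep -E 'kernel BUG at|Oops:|Kernel panic'
}

# Sample lines copied from the trace in this report.
sample='[ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
[ 4657.314841] Call Trace:
[ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]'

printf '%s\n' "$sample" | scan_kernel_log
```

On a live system the same function can be fed from the kernel ring buffer, for example `dmesg | scan_kernel_log`.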

  I backported the following 4 commits/patches from upstream[1]:

          1. commit d658972
          Author: Himangi Saraogi <himangi774@xxxxxxxxx>
          Date:   Tue Jul 22 23:40:19 2014 +0530

              powerpc/perf/hv-24x7: Use kmem_cache_free

          2. commit 48bee8a
          Author: Cody P Schafer <dev@xxxxxxxxxx>
          Date:   Tue Sep 30 23:03:17 2014 -0700

                powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations

          3. https://lkml.org/lkml/2014/12/10/613
          4. https://lkml.org/lkml/2014/12/10/36

  to the vivid kernel[2]. With these applied, the problem no longer reproduces.

  Will Canonical cherry-pick those commits, or should we backport them?
  (They apply without conflicts.)

  [1] Patches 3 and 4 above were posted recently; the powerpc
        maintainer plans to merge them.

  [2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
