
kernel-packages team mailing list archive

[Bug 1410519] Re: [PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters

 

** Tags added: verification-done-utopic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410519

Title:
  [PowerVM] Kernel BUG @ kernel/irq_work.c:157!  - 24x7 hw counters

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Utopic:
  Fix Released

Bug description:
  [Impact]
  Using perf with hv_24x7 events can cause a kernel BUG.

  [Fix]
  The following upstream commits:
   d658972
   48bee8a
   f34b6c7
   ec2aef5

  [Test Case]
  Steps to recreate the problem:

  1.  Install Ubuntu 15.04 as a PowerVM guest.
  2.  Install the perf tool.
  3.  Run the following scripts to test the 24x7 POWER8 hardware counter events with the perf tool.

  ==== Script 1
  #!/bin/bash

  count=0;

  offset=0x128
  PERF_ARGS="-r 10 -C 0"
  while [ $count -lt 100 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          count=$((count + 1))
  done

  ==== Script 2
  #!/bin/bash

  offset=0;

  PERF_ARGS="-r 10 -C 0"
  while [ $offset -lt 8192 ]; do

          EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

          perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

          offset=)
  done
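
  Both scripts assume that the hv_24x7 PMU is available on the guest. A minimal pre-flight sketch follows. The sysfs path is an assumption based on where perf PMUs are normally registered, and the helper names build_event and have_hv_24x7 are hypothetical, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helpers; names are illustrative and not part of the original scripts.

# Assumed sysfs location where the hv_24x7 PMU is registered (overridable).
PMU_DIR="${PMU_DIR:-/sys/bus/event_source/devices/hv_24x7}"

# Build the event string used by both scripts for a given offset.
build_event() {
    printf 'hv_24x7/domain=0x2,offset=%s,starting_index=10/\n' "$1"
}

# Return 0 if the hv_24x7 PMU directory exists, non-zero otherwise.
have_hv_24x7() {
    [ -d "$PMU_DIR" ]
}

if have_hv_24x7; then
    echo "hv_24x7 PMU found; example event: $(build_event 0x128)"
else
    echo "hv_24x7 PMU not found at $PMU_DIR; the scripts cannot exercise the bug here" >&2
fi
```

  On a system without the PMU, perf would be expected to reject the event, so the scripts would not reach the code path described in this report.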

  After a few iterations I hit the following BUG.

  tt2.sh  tt.sh
  tt2.sh  tt.sh
  tt2.sh  tt.sh
  275679187521558  hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%
  tt2.sh  tt.sh
  [ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preempt_count 00000100, exited with bfff0000?
  [ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
  [ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]
  [ 4657.314740] Modules linked in: rtc_generic pseries_rng
  [ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
  [ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
  [ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
  [ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28042024  XER: 0000000a
  [ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0
  GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002
  GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060
  GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000
  GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff
  GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08
  [ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30
  [ 4657.314841] Call Trace:
  4000 (unreliable)
  [ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0
  [ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
  [ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90
  [ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
  [ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98
  [ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410
  [ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
  [ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
  [ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc
  [ 4657.314963]     LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
  [ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0
  [ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c
  [ 4657.315047] Instruction dump:
  [ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
  [ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000 60000000
  [ 4657.315090] ---[ end trace ee202cccd2211e5d ]---
  [ 4657.320224]
  [ 4657.362675] Unable to handle kernel paging request for data at address 0xc000000b35515048
  [ 4657.362680] Faulting instruction address: 0xc00000000006a37c
  [ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]
  [ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries
  [ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G      D       3.16.0-25-
  [ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
  [ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340
  [ 4657.362706] REGS: c0000000eb607800 TRAP: 0300   Tainted: G      D        (3.16.0-25-generic)
  00000000
  [ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0
  GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000
  GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001
  GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001
  GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001
  [ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0
  [ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362765] Call Trace:
  0 (unreliable)
  [ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
  [ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+0x70/0x90
  [ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140
  [ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110
  [ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
  [ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250
  [ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0
  [ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130
  [ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
  [ 4657.362810] Instruction dump:
  [ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
  [ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c 7fe4fb78
  [ 4657.362825] ---[ end trace ee202cccd2211e5e ]---
  [ 4657.365085]
  [ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!
  [ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
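
  Because the guest panics, the trace above has to come from a captured console log rather than from dmesg on the running system. As a rough sketch, the signature of this failure could be detected in such a capture as shown below. The helper name has_irq_work_bug is hypothetical, and the pattern is derived only from the BUG line in this report.

```shell
#!/bin/bash
# Hypothetical helper: reads kernel log text on stdin and succeeds if it
# contains the "kernel BUG at .../kernel/irq_work.c:<line>!" signature.
has_irq_work_bug() {
    grep -Eq 'kernel BUG at .*kernel/irq_work\.c:[0-9]+!'
}

# Sample line copied from the log in this report.
sample='[ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!'

if printf '%s\n' "$sample" | has_irq_work_bug; then
    echo "irq_work BUG signature found"
fi
```

  With a saved capture, the call would look like `has_irq_work_bug < console.log`, where console.log is an assumed file name.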

  I backported the following 4 commits/patches from upstream[1]:

          1. commit d658972
          Author: Himangi Saraogi <himangi774@xxxxxxxxx>
          Date:   Tue Jul 22 23:40:19 2014 +0530

              powerpc/perf/hv-24x7: Use kmem_cache_free

          2. commit 48bee8a
          Author: Cody P Schafer <dev@xxxxxxxxxx>
          Date:   Tue Sep 30 23:03:17 2014 -0700

                powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations

          3. https://lkml.org/lkml/2014/12/10/613
          4. https://lkml.org/lkml/2014/12/10/36

  to the vivid kernel[2]. With them applied, the problem no longer reproduces.

  Will Canonical cherry-pick those commits, or should we backport them?
  (They apply without conflicts.)

  [1] Patches 3 and 4 above were posted recently; the powerpc
        maintainer plans to merge them.

  [2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git

  ===
  break-fix: - ec2aef5a8d3c14272f7a2d29b34f1f8e71f2be5b
  break-fix: - f34b6c72c3ebaa286d3311a825ef79eccbcca82f
  break-fix: - 48bee8a6c98e34367fa9d5e1be14109c92cbbb3b
  break-fix: - d6589722846a57a4ddf7af595a7f854ff5180950
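
  One way to check whether a given kernel tree already carries these four fixes is to search its history for the upstream commit IDs. The sketch below is only illustrative. The helper name missing_fixes is hypothetical, and it assumes that backported commits either keep the upstream ID or mention it in their message (for example via a "cherry picked from commit" line as added by git cherry-pick -x).

```shell
#!/bin/bash
# Upstream commit IDs copied from the break-fix lines above.
FIXES="ec2aef5a8d3c14272f7a2d29b34f1f8e71f2be5b
f34b6c72c3ebaa286d3311a825ef79eccbcca82f
48bee8a6c98e34367fa9d5e1be14109c92cbbb3b
d6589722846a57a4ddf7af595a7f854ff5180950"

# Hypothetical helper: reads `git log` text on stdin and prints each fix
# commit ID that does not appear anywhere in it.
missing_fixes() {
    local found sha
    # Collect every fix ID that occurs in the input (one pattern per line).
    found=$(grep -oF "$FIXES" | sort -u || true)
    for sha in $FIXES; do
        printf '%s\n' "$found" | grep -qxF "$sha" || echo "$sha"
    done
}
```

  Inside a kernel checkout this could be run as `git log --format='%H%n%B' | missing_fixes`; empty output would suggest all four fixes are present, under the assumption stated above.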

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1410519/+subscriptions