Message #104113
[Bug 1410519] Re: [PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
utopic' to 'verification-done-utopic'.
If verification is not done within 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
** Tags added: verification-needed-utopic
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410519
Title:
[PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters
Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Utopic:
Fix Committed
Bug description:
[Impact]
Using perf with hv_24x7 events can cause a kernel BUG.
[Fix]
The following upstream commits:
d658972
48bee8a
f34b6c7
ec2aef5
[Test Case]
Steps to recreate the problem:
1. Install Ubuntu 15.04 as a PowerVM guest.
2. Install the perf tool.
3. Run the following scripts to test the 24x7 POWER8 hardware counter events with the perf tool.
=== Script 1
#!/bin/bash
count=0;
offset=0x128
PERF_ARGS="-r 10 -C 0"
while [ $count -lt 100 ]; do
EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"
perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls
count=$((count+1))
done
=== Script 2
#!/bin/bash
offset=0;
PERF_ARGS="-r 10 -C 0"
while [ $offset -lt 8192 ]; do
EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"
perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls
offset=)
done
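Both scripts build the same hv_24x7 event string and differ only in how the offset changes between runs. A minimal sketch of that pattern follows. The helper names hv_24x7_event and sweep_events are hypothetical, and the step size is a required argument because its value is an assumption, not something taken from Script 2.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original scripts.

# Build an hv_24x7 event string in the format used by the scripts above.
hv_24x7_event() {
    local domain=$1 offset=$2 index=$3
    printf 'hv_24x7/domain=%s,offset=%s,starting_index=%s/\n' \
        "$domain" "$offset" "$index"
}

# Print one event string per offset in [0, limit), advancing by step.
# domain=0x2 and starting_index=10 are the values used in the scripts above;
# step is an assumption supplied by the caller.
sweep_events() {
    local limit=$1 step=$2 offset=0
    # Refuse a non-positive step, which would loop forever.
    [ "$step" -gt 0 ] || return 1
    while [ "$offset" -lt "$limit" ]; do
        hv_24x7_event 0x2 "$offset" 10
        offset=$((offset + step))
    done
}
```

For example, `sweep_events 8192 64 | while read -r ev; do perf stat -r 10 -C 0 -x ' ' -e "$ev" ls; done` would run one measurement per generated event; the step of 64 is purely illustrative.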
After a few iterations I hit the following BUG.
tt2.sh tt.sh
tt2.sh tt.sh
tt2.sh tt.sh
275679187521558 hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%
tt2.sh tt.sh
[ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preempt_count 00000100, exited with bfff0000?
[ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
[ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]
[ 4657.314740] Modules linked in: rtc_generic pseries_rng
[ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
[ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
[ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
[ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28042024 XER: 0000000a
[ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0
GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002
GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060
GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000
GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff
GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08
[ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30
[ 4657.314841] Call Trace:
4000 (unreliable)
[ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0
[ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
[ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90
[ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
[ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98
[ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410
[ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
[ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
[ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc
[ 4657.314963] LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
[ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0
[ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c
[ 4657.315047] Instruction dump:
[ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
[ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000 60000000
[ 4657.315090] ---[ end trace ee202cccd2211e5d ]---
[ 4657.320224]
[ 4657.362675] Unable to handle kernel paging request for data at address 0xc000000b35515048
[ 4657.362680] Faulting instruction address: 0xc00000000006a37c
[ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]
[ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries
[ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G D 3.16.0-25-
[ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
[ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340
[ 4657.362706] REGS: c0000000eb607800 TRAP: 0300 Tainted: G D (3.16.0-25-generic)
00000000
[ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0
GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000
GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001
GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001
GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001
[ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0
[ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
[ 4657.362765] Call Trace:
0 (unreliable)
[ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
[ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+0x70/0x90
[ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140
[ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110
[ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
[ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250
[ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0
[ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130
[ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
[ 4657.362810] Instruction dump:
[ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
[ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c 7fe4fb78
[ 4657.362825] ---[ end trace ee202cccd2211e5e ]---
[ 4657.365085]
[ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!
[ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
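When verifying a candidate kernel, it can help to scan the kernel log for the two signatures seen in the trace above. The following is a rough sketch only: the function name scan_klog is hypothetical, and the patterns are taken from the log lines in this report.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print any line that
# matches the failure signatures from this report. The exit status is 0 if
# at least one line matched and non-zero otherwise (grep's usual behaviour).
scan_klog() {
    grep -E 'kernel BUG at .*irq_work\.c:[0-9]+!|Kernel panic - not syncing'
}
```

A possible use after running the test scripts is `dmesg | scan_klog && echo "BUG reproduced"`. If the machine has already panicked, the same function can be fed a saved console log instead.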
Backported the following 4 commits/patches from upstream[1]:
1. commit d658972
Author: Himangi Saraogi <himangi774@xxxxxxxxx>
Date: Tue Jul 22 23:40:19 2014 +0530
powerpc/perf/hv-24x7: Use kmem_cache_free
2. commit 48bee8a
Author: Cody P Schafer <dev@xxxxxxxxxx>
Date: Tue Sep 30 23:03:17 2014 -0700
powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations
3. https://lkml.org/lkml/2014/12/10/613
4. https://lkml.org/lkml/2014/12/10/36
to the vivid kernel[2]. With them applied, the problem no longer reproduces.
Will Canonical cherry-pick those commits, or should we backport them?
(They apply without conflicts.)
[1] Patches 3 and 4 above were posted recently; the powerpc
maintainer plans to merge them.
[2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
===
break-fix: - ec2aef5a8d3c14272f7a2d29b34f1f8e71f2be5b
break-fix: - f34b6c72c3ebaa286d3311a825ef79eccbcca82f
break-fix: - 48bee8a6c98e34367fa9d5e1be14109c92cbbb3b
break-fix: - d6589722846a57a4ddf7af595a7f854ff5180950
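The break-fix lines carry the full hashes of the commits that the [Fix] section lists in abbreviated form. A small sketch for pulling the abbreviated IDs out of a saved copy of this report follows; the function name break_fix_ids is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read text on stdin and, for each "break-fix:" line,
# print the first 7 characters of the last field (the fix commit hash).
break_fix_ids() {
    awk '/^[[:space:]]*break-fix:/ { print substr($NF, 1, 7) }'
}
```

Run against the four lines above, this prints ec2aef5, f34b6c7, 48bee8a and d658972, one per line. Each ID could then be looked up in a kernel tree with something like `git log --oneline --grep="$id"`, assuming the backports record the upstream hash in their commit messages.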
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1410519/+subscriptions