kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #99468
[Bug 1410519] Re: [PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters
Thanks Suka,
Chris,
Our major concern is regarding 15.04 for this bug, so, if you see any
problem with this patch for 14.10, we can skip it and fix it in 15.04
only.
Thank you,
Breno
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410519
Title:
[PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Utopic:
In Progress
Bug description:
[Impact]
Using perf with hv_24x7 events can cause a kernel BUG.
[Fix]
The following upstream commits:
d658972
48bee8a
f34b6c7
ec2aef5
[Test Case]
Steps to recreate the problem:
1. Install Ubuntu 15.04 as a PowerVM guest.
2. Install perf tool
3. Run following scripts to test 24/7 Power8 hardware counter event with perf. tool
=== Script 1
#!/bin/bash
count=0;
offset=0x128
PERF_ARGS="-r 10 -C 0"
while [ $count -lt 100 ]; do
EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"
perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e
$EVENT ls
count=)
done
==== Script 2
#!/bin/bash
offset=0;
PERF_ARGS="-r 10 -C 0"
while [ $offset -lt 8192 ]; do
EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"
perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e
$EVENT ls
offset=)
done
After few iterations I hit the following BUG.
tt2.sh tt.sh
tt2.sh tt.sh
tt2.sh tt.sh
275679187521558 hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%
tt2.sh tt.sh
[ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preem
pt_count 00000100, exited with bfff0000?
[ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!
[ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]
[ 4657.314740] Modules linked in: rtc_generic pseries_rng
[ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
[ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
[ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
[ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28042024 XER: 0000000a
[ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0
GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002
GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060
GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000
GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff
GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08
[ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30
[ 4657.314841] Call Trace:
4000 (unreliable)
[ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0
[ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
[ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90
[ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
[ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98
[ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410
[ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
[ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
[ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc
[ 4657.314963] LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
[ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0
[ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c
[ 4657.315047] Instruction dump:
[ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
[ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000
60000000
[ 4657.315090] ---[ end trace ee202cccd2211e5d ]---
[ 4657.320224]
[ 4657.362675] Unable to handle kernel paging request for data at address 0xc000
000b35515048
[ 4657.362680] Faulting instruction address: 0xc00000000006a37c
[ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]
[ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries
[ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G D 3.16.0-25-
[ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
[ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340
[ 4657.362706] REGS: c0000000eb607800 TRAP: 0300 Tainted: G D (3.16.0-25-generic)
00000000
[ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0
GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000
GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001
GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001
GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001
[ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0
[ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
[ 4657.362765] Call Trace:
0 (unreliable)
[ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
[ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+
0x70/0x90
[ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140
[ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110
[ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
[ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250
[ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0
[ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130
[ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
[ 4657.362810] Instruction dump:
[ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
[ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c
7fe4fb78
[ 4657.362825] ---[ end trace ee202cccd2211e5e ]---
[ 4657.365085]
[ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!
[ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
Backported following 4 commits/patches from upstream[1]:
1. commit d658972
Author: Himangi Saraogi <himangi774@xxxxxxxxx>
Date: Tue Jul 22 23:40:19 2014 +0530
powerpc/perf/hv-24x7: Use kmem_cache_free
2. commit 48bee8a
Author: Cody P Schafer <dev@xxxxxxxxxx>
Date: Tue Sep 30 23:03:17 2014 -0700
powerpc/perf/hv-24x7: use kmem_cache instead of aligned
stack allocations
3. https://lkml.org/lkml/2014/12/10/613
4. https://lkml.org/lkml/2014/12/10/36
to the vivid kernel[2]. The problem does not repro.
Will Canonical cherry-pick those commits or should we backport ?
(they apply without conflicts).
[1] The patches 3 and 4 above were posted recently, Powerpc
maintainer plans to merge them.
[2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1410519/+subscriptions