kernel-packages team mailing list archive

[Bug 1410519] [NEW] [PowerVM] Kernel BUG @ kernel/irq_work.c:157! - 24x7 hw counters

 

You have been subscribed to a public bug:

Steps to recreate the problem:

1.  Install Ubuntu 15.04 as a PowerVM guest.
2.  Install the perf tool.
3.  Run the following scripts to test the 24x7 POWER8 hardware counter events with the perf tool.

==== Script 1
#!/bin/bash

count=0;

offset=0x128
PERF_ARGS="-r 10 -C 0"
while [ $count -lt 100 ]; do

        EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

        perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

        count=$((count + 1))
done

==== Script 2
#!/bin/bash

offset=0;

PERF_ARGS="-r 10 -C 0"
while [ $offset -lt 8192 ]; do

        EVENT="hv_24x7/domain=0x2,offset=$offset,starting_index=10/"

        perf stat $PERF_ARGS -x ' ' perf stat $PERF_ARGS -x ' ' -e $EVENT ls

        offset=)
done
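
Both scripts assume that the hv_24x7 PMU is exposed to the guest and both build the same event string. The following is a minimal sketch of two helpers for that, not part of the original scripts: the function names and the SYSFS_ROOT override are hypothetical, and the only system detail relied on is that perf PMUs are registered under /sys/bus/event_source/devices.

```shell
#!/bin/bash
# Sketch only: helper names and SYSFS_ROOT are illustrative assumptions.

# Succeeds if the hv_24x7 PMU is registered with the perf subsystem.
# SYSFS_ROOT is a hypothetical override so the check can be exercised
# on a machine that does not have the PMU.
hv_24x7_available() {
    local root="${SYSFS_ROOT:-/sys}"
    [ -d "$root/bus/event_source/devices/hv_24x7" ]
}

# Prints the event string used by both scripts, built from its three fields.
hv_24x7_event() {
    local domain="$1" offset="$2" index="$3"
    printf 'hv_24x7/domain=%s,offset=%s,starting_index=%s/\n' \
        "$domain" "$offset" "$index"
}
```

Calling `hv_24x7_available || exit 1` at the top of either script would stop early on a guest without the PMU, and `EVENT="$(hv_24x7_event 0x2 "$offset" 10)"` would replace the hand-written assignment.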

After a few iterations of these scripts, I hit the following BUG.

tt2.sh  tt.sh                                                                   
tt2.sh  tt.sh                                                                   
tt2.sh  tt.sh                                                                   
275679187521558  hv_24x7/domain=0x2,offset=6848,starting_index=10/ 0.00%        
tt2.sh  tt.sh                                                                   
[ 4657.314709] softirq: huh, entered softirq 7 SCHED c00000000010abc0 with preempt_count 00000100, exited with bfff0000?
[ 4657.314727] kernel BUG at /build/buildd/linux-3.16.0/kernel/irq_work.c:157!  
[ 4657.314732] Oops: Exception in kernel mode, sig: 5 [#1]                      
[ 4657.314740] Modules linked in: rtc_generic pseries_rng                       
[ 4657.314749] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-25-generic #33-U
[ 4657.314755] task: c000000001375e00 ti: c0000000013d0000 task.ti: c0000000013d0000
[ 4657.314759] NIP: c0000000001e8ffc LR: c00000000001fe70 CTR: c000000000002800ic)
[ 4657.314770] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28042024  XER: 0000000a
[ 4657.314782] CFAR: c00000000001fe6c SOFTE: 0                                  
GPR04: 0000000000000010 00000000009c0000 c000000001424a98 0000000000000002      
GPR12: 8000000000009033 c00000000e9a0000 0000000006a3fcd0 0000000000000060      
GPR16: 0000000000200000 0000000000000000 c000000000e57c00 0000000000000000      
GPR20: c000000001595dca c000000001595478 0000000000000001 000000000000ffff      
GPR28: c000000000e40380 c000000000e40300 c0000000013d3590 c000000000e56f08      
[ 4657.314832] NIP [c0000000001e8ffc] irq_work_run+0x1c/0x30                    
[ 4657.314841] Call Trace:                                                      
4000 (unreliable)                                                               
[ 4657.314861] [c0000000013d34f0] [c00000000001ff90] timer_interrupt+0xa0/0xe0  
[ 4657.314871] [c0000000013d3520] [c000000000002914] decrementer_common+0x114/0x180
[ 4657.314884] --- Exception: 901 at arch_local_irq_restore+0x14/0x90           
[ 4657.314896] [c0000000013d3810] [c00000000012ed08] vprintk_emit+0x3b8/0x660 (u
[ 4657.314908] [c0000000013d38e0] [c000000000a02650] printk+0x84/0x98           
[ 4657.314918] [c0000000013d3910] [c0000000000b51b4] __do_softirq+0x1e4/0x410   
[ 4657.314927] [c0000000013d3a00] [c0000000000b57b8] irq_exit+0xf8/0x1400
[ 4657.314948] [c0000000013d3a60] [c000000000002c14] doorbell_super_common+0x114/0x180
[ 4657.314963] --- Exception: a01 at plpar_hcall_norets+0x8c/0xdc               
[ 4657.314963]     LR = check_and_cede_processor+0x34/0x5020/0x50 (unreliable)
[ 4657.314997] [c0000000013d3df0] [c00000000084077c] cpuidle_enter_state+0x6c/0x140c0 
[ 4657.315030] [c0000000013d3f00] [c000000000d63ea8] start_kernel+0x500/0x51c   
[ 4657.315047] Instruction dump:                                                
[ 4657.315052] eba1ffe8 7c0803a6 ebc1fff0 ebe1fff8 4e800020 3c4c011f 3842c110 78290464
[ 4657.315068] 81290014 752a000f 7d380026 55291ffe <0b090000> 4bfffec8 60000000 60000000
[ 4657.315090] ---[ end trace ee202cccd2211e5d ]---                             
[ 4657.320224]                                                                  
[ 4657.362675] Unable to handle kernel paging request for data at address 0xc000000b35515048
[ 4657.362680] Faulting instruction address: 0xc00000000006a37c                 
[ 4657.362684] Oops: Kernel access of bad area, sig: 11 [#2]                    
[ 4657.362686] SMP NR_CPUS=2048 NUMA pSeries                                    
[ 4657.362695] CPU: 12 PID: 7 Comm: rcu_sched Tainted: G      D       3.16.0-25-
[ 4657.362699] task: c0000000eb581540 ti: c0000000eb604000 task.ti: c0000000eb60
[ 4657.362703] NIP: c00000000006a37c LR: c0000000000865a8 CTR: c00000000006a340 
[ 4657.362706] REGS: c0000000eb607800 TRAP: 0300   Tainted: G      D        (3.16.0-25-generic)
00000000                                                                        
[ 4657.362718] CFAR: c0000000000865a4 DAR: c000000b35515048 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000865a8 c0000000eb607a80 c0000000013d50f0 00000000013d30d0      
GPR08: 0000000000cc0000 c000000b35515000 c00000000e9a0000 0000000000000000      
GPR12: c00000000006a340 c00000000e9a6c00 0000000000000000 0000000000000001      
GPR20: 0000000000000000 c000000001389700 0000000000000000 0000000000000001      
GPR28: c000000001420a68 0000000000000000 00000000013d30d0 0000000000000001      
[ 4657.362758] NIP [c00000000006a37c] icp_hv_cause_ipi+0x3c/0xc0                
[ 4657.362762] LR [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0            
[ 4657.362765] Call Trace:                                                      
0 (unreliable)                                                                  
[ 4657.362774] [c0000000eb607af0] [c0000000000865a8] pSeries_cause_ipi_mux+0x88/0xc0
[ 4657.362778] [c0000000eb607b20] [c0000000000426f0] smp_muxed_ipi_message_pass+0x70/0x90
[ 4657.362783] [c0000000eb607b60] [c0000000000f3a58] resched_task+0x118/0x140   
[ 4657.362786] [c0000000eb607b90] [c0000000000f3da0] resched_cpu+0xc0/0x110     
[ 4657.362791] [c0000000eb607be0] [c00000000013f170] rcu_implicit_dynticks_qs+0x200/0x230
[ 4657.362795] [c0000000eb607c10] [c00000000013de1c] force_qs_rnp+0x14c/0x250   
[ 4657.362799] [c0000000eb607c90] [c0000000001407f0] rcu_gp_kthread+0x430/0x8e0 
[ 4657.362803] [c0000000eb607d80] [c0000000000e0820] kthread+0x110/0x130        
[ 4657.362807] [c0000000eb607e30] [c00000000000a468] ret_from_kernel_thread+0x5c/0x74
[ 4657.362810] Instruction dump:                                                
[ 4657.362812] fbc1fff0 fbe1fff8 f8010010 f821ff91 7c7e1b78 60000000 60000000 3d220008
[ 4657.362818] 39493f00 1d3e0900 e94a0000 7d2a4a14 <abe90048> 7c0004ac 3860006c 7fe4fb78
[ 4657.362825] ---[ end trace ee202cccd2211e5e ]---                             
[ 4657.365085]                                                                  
[ 4659.320264] Kernel panic - not syncing: Attempted to kill the idle task!     
[ 4659.325500] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
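
When re-running the scripts against a patched kernel, a saved console log can be checked for the three failure signatures that appear in the trace above. This is a minimal sketch; the function name and the log file path are hypothetical.

```shell
#!/bin/bash
# Sketch only: the function name and the log path argument are illustrative.

# Succeeds (exit status 0) if the given console log contains any of the
# failure signatures seen in this report.
has_kernel_failure() {
    local log="$1"
    # -q: print nothing, just set the exit status; -E: extended regex.
    grep -qE 'kernel BUG at|Oops: |Kernel panic - not syncing' "$log"
}
```

For example, `has_kernel_failure console.log && echo "still failing"` reports a hit on a captured console log named console.log.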

I backported the following 4 commits/patches from upstream[1]:

        1. commit d658972
        Author: Himangi Saraogi <himangi774@xxxxxxxxx>
        Date:   Tue Jul 22 23:40:19 2014 +0530

            powerpc/perf/hv-24x7: Use kmem_cache_free

        2. commit 48bee8a
        Author: Cody P Schafer <dev@xxxxxxxxxx>
        Date:   Tue Sep 30 23:03:17 2014 -0700
 
            powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations

        3. https://lkml.org/lkml/2014/12/10/613
        4. https://lkml.org/lkml/2014/12/10/36

to the vivid kernel[2]. With them applied, the problem no longer reproduces.

Will Canonical cherry-pick these commits, or should we backport them?
(They apply without conflicts.)

[1] Patches 3 and 4 above were posted recently; the powerpc
      maintainer plans to merge them.

[2] git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git
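
The backport described above could be scripted roughly as follows. This is a sketch under assumptions that are not stated in the report: it is run inside a checkout of the vivid tree, the upstream tree has already been fetched so that the two abbreviated commit IDs resolve, and patches 3 and 4 have been saved from the list as mbox files whose paths are passed as arguments. The function names and the DRY_RUN variable are hypothetical.

```shell
#!/bin/bash
# Sketch only: function names, DRY_RUN, and the patch file arguments are
# illustrative assumptions, not part of the original report.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply the four fixes on top of the currently checked-out branch.
#   $1: mbox file holding patch 3, $2: mbox file holding patch 4
backport_hv24x7_fixes() {
    local patch3="$1" patch4="$2"
    # -x appends a "cherry picked from commit ..." line to each message.
    run git cherry-pick -x d658972 48bee8a || return 1
    # git am applies mailbox-format patches, keeping author and message.
    run git am "$patch3" "$patch4" || return 1
}
```

Running `DRY_RUN=1 backport_hv24x7_fixes patch3.mbox patch4.mbox` prints the two git commands without touching the tree, which is a cheap way to review them first.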

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-119744 severity-critical targetmilestone-inin1504
-- 
[PowerVM] Kernel BUG @ kernel/irq_work.c:157!  - 24x7 hw counters
https://bugs.launchpad.net/bugs/1410519
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.