
kernel-packages team mailing list archive

[Bug 1370421] [NEW] BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]

 

You have been subscribed to a public bug:

== Comment: #0 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:24:37 ==
---Problem Description---
CPU stalls and a soft lockup occur while running the ltpstress.sh test of the LTP suite. A detailed syslog and the test logs are attached.

Contact Information = abdhalee@xxxxxxxxxx

---uname output---
Linux ubuntu 3.16.0-10-generic #15-Ubuntu SMP Thu Aug 21 16:32:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = POWER8 
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
- Ubuntu 14.10 LE guest running on a POWER8 machine with PowerKVM build 2_1_1.8
- Download and build the LTP suite on the guest, then run: /opt/ltp/testscripts/ltpstress.sh -d /tmp/sardata -l /tmp/ltplog.12028 -m 128 -t 24 -S
- After 2 hours of the test run, dmesg starts showing the trace messages below.

syslog:
---------
Aug 31 09:31:59 ubuntu kernel: [83796.274731] Adding 576k swap on swapfile29.  Priority:-29 extents:1 across:576k FS
Aug 31 09:32:00 ubuntu in.rshd[8457]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:01 ubuntu in.rshd[8459]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:02 ubuntu in.rshd[8461]: connect from 127.0.0.1 (127.0.0.1)
Sep  1 04:42:36 ubuntu kernel: [147953.248523] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=92214 jiffies, g=440674, c=440673, q=304)
Sep  1 04:42:36 ubuntu kernel: [147953.248720] Task dump for CPU 15:
Sep  1 04:42:36 ubuntu kernel: [147953.248725] genload         R  running task        0 22734  22733 0x00040000
Sep  1 04:42:36 ubuntu kernel: [147953.248730] Call Trace:
Sep  1 04:42:36 ubuntu kernel: [147953.248740] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep  1 04:42:36 ubuntu kernel: [147953.248745] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep  1 04:42:36 ubuntu kernel: [147953.248748] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep  1 04:42:36 ubuntu kernel: [147953.248753] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep  1 04:42:36 ubuntu kernel: [147953.248758] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep  1 04:42:36 ubuntu kernel: [147953.248762] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep  1 04:42:36 ubuntu kernel: [147953.250365] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=16035133 jiffies, g=440674, c=440673, q=304)
Sep  1 04:42:36 ubuntu kernel: [147953.250519] Task dump for CPU 15:
Sep  1 04:42:36 ubuntu kernel: [147953.250522] genload         R  running task        0 22734  22733 0x00040000
Sep  1 04:42:36 ubuntu kernel: [147953.250525] Call Trace:
Sep  1 04:42:36 ubuntu kernel: [147953.250528] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep  1 04:42:36 ubuntu kernel: [147953.250532] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep  1 04:42:36 ubuntu kernel: [147953.250535] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep  1 04:42:36 ubuntu kernel: [147953.250538] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep  1 04:42:36 ubuntu kernel: [147953.250541] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep  1 04:42:36 ubuntu kernel: [147953.250544] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep  1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
Sep  1 04:42:36 ubuntu kernel: [147953.257647] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic e1000 ohci_pci
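
For triage, the stall and lockup lines in the excerpt above can be pulled apart with a short script. This is a minimal Python sketch and not part of the original report: the function name parse_events and the hz parameter are hypothetical, and hz=250 is an assumed CONFIG_HZ value that the report does not state.

```python
import re

# Patterns for the two kinds of kernel messages quoted in the syslog excerpt.
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] INFO: rcu_sched detected stalls on CPUs/tasks: "
    r"\{([^}]*)\} \(detected by (\d+), t=(\d+) jiffies"
)
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! "
    r"\[([^\]:]+):(\d+)\]"
)


def parse_events(lines, hz=250):
    """Return one dict per RCU stall or soft lockup report found in lines.

    hz is an ASSUMED CONFIG_HZ value (hypothetical default); it is only
    used to convert the jiffies count of a stall report into seconds.
    """
    events = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            events.append({
                "kind": "rcu_stall",
                "timestamp": float(m.group(1)),
                "cpus": [int(c) for c in m.group(2).split()],
                "detected_by": int(m.group(3)),
                "jiffies": int(m.group(4)),
                "seconds": int(m.group(4)) / hz,
            })
            continue
        m = LOCKUP_RE.search(line)
        if m:
            events.append({
                "kind": "soft_lockup",
                "timestamp": float(m.group(1)),
                "cpu": int(m.group(2)),
                "seconds": int(m.group(3)),
                "comm": m.group(4),
                "pid": int(m.group(5)),
            })
    return events
```

Under the assumed hz=250, t=16035133 jiffies works out to roughly 64140 seconds, which is close to the gap of roughly 64157 seconds between the last swap message ([83796.274731]) and the first stall report ([147953.248523]).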

Other details :
------------------
@ubuntu:/tmp$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             16
NUMA node(s):          1
Model:                 IBM pSeries (emulated by qemu)
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-15

@ubuntu:/tmp$ free
             total       used       free     shared    buffers     cached
Mem:       2072704     892480    1180224        448     274240     132480
-/+ buffers/cache:     485760    1586944
Swap:      3460160      35392    3424768

@ubuntu:/tmp$ uptime 
 05:22:02 up 1 day, 19:06,  2 users,  load average: 10.67, 9.10, 9.32


Thanks

== Comment: #1 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:31:58 ==


== Comment: #2 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:36:48 ==


== Comment: #5 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2014-09-05 05:03:56 ==
Hi Abdul,
Are you able to recreate this issue?
Please update the bug with your latest test results.

== Comment: #6 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:55:47 ==
(In reply to comment #5)
> Hi Abdul,
> Are you able to recreate this issue?
> Please update the bug with your latest test results.

Hi Mamatha,

I have started the test again with xmon enabled.

I will keep you updated on the status.

Thanks

== Comment: #7 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:59:17 ==
I have started the test on 3.16.0-14-generic and I still see these messages in syslog:

[ 8075.169576] Unable to find swap-space signature
[ 7452.105450] Unable to find swap-space signature

Should we worry about this?

The original problem has not reproduced yet. I will update the bug soon.

== Comment: #8 - Dan Streetman <ddstreet@xxxxxxxxxx> - 2014-09-10 08:44:21 ==
(In reply to comment #7)
> I have started the test on 3.16.0-14-generic and I still see these messages
> in syslog 
> 
> [ 8075.169576] Unable to find swap-space signature
> [ 7452.105450] Unable to find swap-space signature
> 
> Should we worry about this?

It looks like you have some kind of tests creating/adding swap files,
and I have no idea what those tests look like, so I don't know if this
is an expected result of the tests or not.  Generally that error means
you are trying to swapon a swap file that isn't correctly initialized
with mkswap, or its header is corrupted.

Assuming your test isn't expecting a failure, you should just mkswap
again on whatever swap file is failing.  It looks like "./swapfile01",
but since you're using relative paths, I can't tell you where it's
located.
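
To see whether a given swap file still carries a valid signature before running mkswap again, something like the following could be used. It is a minimal Python sketch, not something taken from the test suite: the function name has_swap_signature is hypothetical, and it relies on the swap header layout in which the 10-byte magic "SWAPSPACE2" occupies the last 10 bytes of the first page. The page size defaults to that of the machine running the script, which is assumed to match the kernel that will swapon the file.

```python
import mmap

# Magic string that mkswap writes at the end of the first page of a swap area.
SWAP_MAGIC = b"SWAPSPACE2"


def has_swap_signature(path, page_size=mmap.PAGESIZE):
    """Return True if the file at path has the swap magic in its first page.

    page_size defaults to the page size of the current machine (an
    assumption: the file was initialized for the same page size).
    """
    with open(path, "rb") as f:
        f.seek(page_size - len(SWAP_MAGIC))
        return f.read(len(SWAP_MAGIC)) == SWAP_MAGIC
```

A file for which this returns False is a candidate for the "Unable to find swap-space signature" message; a file shorter than one page also returns False.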

== Comment: #9 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-11 04:09:13 ==
Hi,

I recreated the bug on the latest kernel, 3.16.0-14-generic.

If I recall correctly, the scenario in which the kernel triggered the
soft lockup - CPU#15 traces is as follows.

During my first test run, I found the guest in the 'paused' state the
next day, because the host disk partition on which
/var/lib/libvirt/images is mounted had run out of space. I freed up
disk space and resumed the guest. My tests were still running, but
dmesg showed the trace messages.

So in my last run I recreated a similar scenario with xmon=on and found
that the traces are triggered when I suspend and resume the guest while
the tests are running, not by the tests themselves.

--- Actual steps to reproduce ---
- Enable xmon in /etc/default/grub, then run 'update-grub' and 'reboot'
- Run the ltpstress test
- Suspend the guest with 'virsh suspend <guest>'
- Resume it after a few seconds. The test keeps running fine.
- dmesg shows the original trace messages, as below
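
The suspend and resume steps above could be scripted on the host roughly as follows. This is a minimal Python sketch under stated assumptions, not the procedure actually used: the function names and the 10-second default pause are hypothetical ("a few seconds" is not specified further), and the guest name must be supplied by the caller. Only the 'virsh suspend' and 'virsh resume' commands come from the steps themselves.

```python
import subprocess
import time


def suspend_resume_commands(domain):
    """Return the two virsh command lines for the given guest name."""
    return [["virsh", "suspend", domain], ["virsh", "resume", domain]]


def suspend_then_resume(domain, pause_seconds=10,
                        run=subprocess.run, sleep=time.sleep):
    """Suspend the guest, wait, then resume it.

    pause_seconds is a hypothetical default. run and sleep are
    injectable so the sequence can be exercised without a hypervisor.
    """
    suspend_cmd, resume_cmd = suspend_resume_commands(domain)
    run(suspend_cmd, check=True)   # raises CalledProcessError on failure
    sleep(pause_seconds)
    run(resume_cmd, check=True)
```

Calling suspend_then_resume("<guest>") while the ltpstress test runs in the guest, and then checking dmesg inside the guest, would follow the sequence listed above.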

However, when the traces were triggered, the console did not drop into
xmon. I guess this might be a different problem.

I have kept the system in the same state.

Trace messages:
[84735.190787] Adding 576k swap on swapfile27.  Priority:-27 extents:1 across:576k FS
[84735.740298] Adding 576k swap on swapfile28.  Priority:-28 extents:1 across:576k FS
[84736.062528] Adding 576k swap on swapfile29.  Priority:-29 extents:1 across:576k FS
[84924.032436] BUG: soft lockup - CPU#0 stuck for 104s! [float_bessel:10251]
[84924.032507] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic shpchp ohci_pci e1000
[84924.032525] CPU: 0 PID: 10251 Comm: float_bessel Not tainted 3.16.0-14-generic #20-Ubuntu
[84924.032527] task: c000000003100000 ti: c00000003250c000 task.ti: c00000003250c000
[84924.032529] NIP: c0000000000110b4 LR: c0000000000110b4 CTR: 00003fffb4644120
[84924.032531] REGS: c00000003250fb90 TRAP: 0901   Not tainted  (3.16.0-14-generic)
[84924.032532] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22002444  XER: 00000000
[84924.032538] CFAR: 00003fffb4645888 SOFTE: 1 
GPR00: c00000000000a704 c00000003250fe10 c0000000013d49e0 0000000000000900 
GPR04: 0000000000040004 0000000000000000 00000000009c0000 00000000ff001009 
GPR08: 000182dee8f4d56f 000000007fefffff 0000000040cc8595 0000000000000000 
GPR12: 0000000000002200 00003fffab8658f0 
[84924.032552] NIP [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032554] LR [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032556] Call Trace:
[84924.032557] [c00000003250fe10] [0000000000002856] 0x2856 (unreliable)
[84924.032561] [c00000003250fe30] [c00000000000a704] ret_from_except_lite+0x30/0x60
[84924.032562] Instruction dump:
[84924.032563] 994d02ba 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010 
[84924.032566] 7c0803a6 4e800020 60420000 4bff1315 <60000000> 4bffffe4 60420000 e92d0020 
[84926.062119] Adding 576k swap on ./swapfile01.  Priority:-2 extents:1 across:576k FS
[84936.733247] Adding 65472k swap on ./swapfile01.  Priority:-2 extents:2 across:114624k


Thanks

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: architecture-ppc64le bot-comment bugnameltc-115436 severity-medium targetmilestone-inin---
-- 
BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
https://bugs.launchpad.net/bugs/1370421
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.