kernel-packages team mailing list archive
Message #85430
[Bug 1370421] Re: BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
** Package changed: ubuntu => linux (Ubuntu)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1370421
Title:
BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
Status in “linux” package in Ubuntu:
Incomplete
Bug description:
== Comment: #0 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:24:37 ==
---Problem Description---
RCU CPU stalls and a soft lockup are reported while running the ltpstress.sh test of the LTP suite. The detailed syslog and the test logs are attached.
Contact Information = abdhalee@xxxxxxxxxx
---uname output---
Linux ubuntu 3.16.0-10-generic #15-Ubuntu SMP Thu Aug 21 16:32:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = POWER8
---Debugger---
A debugger is not configured
---Steps to Reproduce---
- Ubuntu 14.10 LE guest running on Power 8 machine with Power KVM build 2_1_1.8
- Download and build the LTP suite on the guest, then run /opt/ltp/testscripts/ltpstress.sh -d /tmp/sardata -l /tmp/ltplog.12028 -m 128 -t 24 -S
- After 2 hours of the test run, dmesg starts showing the trace messages below.
syslog:
---------
Aug 31 09:31:59 ubuntu kernel: [83796.274731] Adding 576k swap on swapfile29. Priority:-29 extents:1 across:576k FS
Aug 31 09:32:00 ubuntu in.rshd[8457]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:01 ubuntu in.rshd[8459]: connect from 127.0.0.1 (127.0.0.1)
Aug 31 09:32:02 ubuntu in.rshd[8461]: connect from 127.0.0.1 (127.0.0.1)
Sep 1 04:42:36 ubuntu kernel: [147953.248523] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=92214 jiffies, g=440674, c=440673, q=304)
Sep 1 04:42:36 ubuntu kernel: [147953.248720] Task dump for CPU 15:
Sep 1 04:42:36 ubuntu kernel: [147953.248725] genload R running task 0 22734 22733 0x00040000
Sep 1 04:42:36 ubuntu kernel: [147953.248730] Call Trace:
Sep 1 04:42:36 ubuntu kernel: [147953.248740] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep 1 04:42:36 ubuntu kernel: [147953.248745] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep 1 04:42:36 ubuntu kernel: [147953.248748] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep 1 04:42:36 ubuntu kernel: [147953.248753] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep 1 04:42:36 ubuntu kernel: [147953.248758] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep 1 04:42:36 ubuntu kernel: [147953.248762] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep 1 04:42:36 ubuntu kernel: [147953.250365] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=16035133 jiffies, g=440674, c=440673, q=304)
Sep 1 04:42:36 ubuntu kernel: [147953.250519] Task dump for CPU 15:
Sep 1 04:42:36 ubuntu kernel: [147953.250522] genload R running task 0 22734 22733 0x00040000
Sep 1 04:42:36 ubuntu kernel: [147953.250525] Call Trace:
Sep 1 04:42:36 ubuntu kernel: [147953.250528] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
Sep 1 04:42:36 ubuntu kernel: [147953.250532] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
Sep 1 04:42:36 ubuntu kernel: [147953.250535] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
Sep 1 04:42:36 ubuntu kernel: [147953.250538] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
Sep 1 04:42:36 ubuntu kernel: [147953.250541] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
Sep 1 04:42:36 ubuntu kernel: [147953.250544] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
Sep 1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
Sep 1 04:42:36 ubuntu kernel: [147953.257647] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic e1000 ohci_pci
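For triage, the stuck time in the soft lockup line can be extracted and converted to hours. This is only a sketch and not part of the original report: the regular expression and the name parse_soft_lockups are illustrative assumptions based solely on the message format shown in the log above.

```python
import re
from datetime import timedelta

# Pattern for the soft lockup report as it appears in the log above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Yield (cpu, stuck duration, command name, pid) for each soft lockup line."""
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            yield (
                int(match["cpu"]),
                timedelta(seconds=int(match["secs"])),
                match["comm"],
                int(match["pid"]),
            )


sample = [
    "Sep 1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - "
    "CPU#15 stuck for 59737s! [genload:22734]",
]
for cpu, stuck, comm, pid in parse_soft_lockups(sample):
    # prints: CPU 15: genload (pid 22734) stuck for 16:35:37
    print(f"CPU {cpu}: {comm} (pid {pid}) stuck for {stuck}")
```

A stuck time of more than 16 hours is far longer than a typical soft lockup report, which may fit the observation in comment #9 that the guest had been paused and resumed.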
Other details :
------------------
@ubuntu:/tmp$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 1
Model: IBM pSeries (emulated by qemu)
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-15
@ubuntu:/tmp$ free
             total       used       free     shared    buffers     cached
Mem:       2072704     892480    1180224        448     274240     132480
-/+ buffers/cache:     485760    1586944
Swap:      3460160      35392    3424768
@ubuntu:/tmp$ uptime
05:22:02 up 1 day, 19:06, 2 users, load average: 10.67, 9.10, 9.32
Thanks
== Comment: #1 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:31:58 ==
== Comment: #2 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:36:48 ==
== Comment: #5 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2014-09-05 05:03:56 ==
Hi Abdul,
Are you able to recreate this issue?
Please update the bug with your latest test results.
== Comment: #6 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:55:47 ==
(In reply to comment #5)
> Hi Abdul,
> Are you able to recreate this issue?
> Please update the bug with your latest test results.
Hi Mamatha,
I have started the test again with xmon enabled.
I will keep you updated on the status.
Thanks
== Comment: #7 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:59:17 ==
I have started the test on 3.16.0-14-generic, and I still see these messages in syslog:
[ 8075.169576] Unable to find swap-space signature
[ 7452.105450] Unable to find swap-space signature
Should we worry about this?
The original problem has not reproduced yet. I will update the bug soon.
== Comment: #8 - Dan Streetman <ddstreet@xxxxxxxxxx> - 2014-09-10 08:44:21 ==
(In reply to comment #7)
> I have started the test on 3.16.0-14-generic and I still see these messages
> in syslog
>
> [ 8075.169576] Unable to find swap-space signature
> [ 7452.105450] Unable to find swap-space signature
>
> should we worry about this.
It looks like you have some kind of tests creating/adding swap files,
and I have no idea what those tests look like, so I don't know if this
is an expected result of the tests or not. Generally that error means
you are trying to swapon a swap file that isn't correctly initialized
with mkswap, or its header is corrupted.
Assuming your test isn't expecting a failure, you should just mkswap
again on whatever swap file is failing. It looks like "./swapfile01",
but since you're using relative paths, I can't tell you where it's
located.
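One way to check whether a given swap file still carries a valid header is to look for the signature that mkswap writes. The sketch below is not from the original thread. It assumes the current swap format, in which the 10-byte string "SWAPSPACE2" occupies the last 10 bytes of the first page, and it assumes the file was initialized with the page size of the machine running the check (mmap.PAGESIZE). The function name has_swap_signature is hypothetical.

```python
import mmap

# Signature written by mkswap for the current swap format (assumption: no
# legacy-format swap files are in use).
SWAP_MAGIC = b"SWAPSPACE2"


def has_swap_signature(path, page_size=mmap.PAGESIZE):
    """Return True if the last 10 bytes of the file's first page hold SWAP_MAGIC."""
    with open(path, "rb") as swap_file:
        swap_file.seek(page_size - len(SWAP_MAGIC))
        # A file shorter than one page yields a short read and compares unequal.
        return swap_file.read(len(SWAP_MAGIC)) == SWAP_MAGIC
```

If this returns False for a file that the test expects to be valid swap, running mkswap on it again, as suggested above, would rewrite the header.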
== Comment: #9 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-11 04:09:13 ==
Hi,
I recreated the bug on the latest kernel, 3.16.0-14-generic.
If I recall correctly, the scenario in which the kernel triggered the
soft lockup - CPU#15 traces was as follows.
During my first test run, I found the next day that the guest was in the
'paused' state, because the host disk partition on which
/var/lib/libvirt/images is mounted was out of space. I freed up disk
space and resumed the guest. My tests were still running, but dmesg
showed the trace messages.
So in my last run I recreated a similar scenario with xmon=on and found
that the traces are triggered when I suspend and resume the guest while
the tests are running, and not by the actual test.
---Actual Steps to Reproduce---
- Enable xmon in /etc/default/grub, then run 'update-grub' and 'reboot'
- Run the ltpstress test
- Suspend the guest with 'virsh suspend <guest>'
- After a few seconds, resume the guest; the test continues to run fine
- dmesg shows the original trace messages, as below
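The suspend and resume part of these steps could be scripted on the host roughly as follows. This is a hedged sketch, not the reporter's actual procedure: the report only names 'virsh suspend', so the use of 'virsh resume' for the resume step, the 5-second default pause, and the function name suspend_resume are assumptions. The run and sleep parameters exist only so the function can be exercised without a real libvirt host.

```python
import subprocess
import time


def suspend_resume(guest, pause_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Pause a libvirt guest with virsh, wait, then resume it.

    `guest` is the libvirt domain name (left as <guest> in the report).
    """
    run(["virsh", "suspend", guest], check=True)
    try:
        sleep(pause_seconds)
    finally:
        # Always try to resume so the guest is not left paused on error.
        run(["virsh", "resume", guest], check=True)


# Example call on the host while ltpstress is running in the guest
# (the domain name here is hypothetical):
# suspend_resume("my-ppc64le-guest", pause_seconds=5)
```

After the guest resumes, dmesg inside the guest can be checked for the soft lockup and RCU stall messages.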
However, when the traces were triggered, the console did not drop into
xmon, so I guess this might be a different problem.
I have kept the system in the same state.
Trace messages:
[84735.190787] Adding 576k swap on swapfile27. Priority:-27 extents:1 across:576k FS
[84735.740298] Adding 576k swap on swapfile28. Priority:-28 extents:1 across:576k FS
[84736.062528] Adding 576k swap on swapfile29. Priority:-29 extents:1 across:576k FS
[84924.032436] BUG: soft lockup - CPU#0 stuck for 104s! [float_bessel:10251]
[84924.032507] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic shpchp ohci_pci e1000
[84924.032525] CPU: 0 PID: 10251 Comm: float_bessel Not tainted 3.16.0-14-generic #20-Ubuntu
[84924.032527] task: c000000003100000 ti: c00000003250c000 task.ti: c00000003250c000
[84924.032529] NIP: c0000000000110b4 LR: c0000000000110b4 CTR: 00003fffb4644120
[84924.032531] REGS: c00000003250fb90 TRAP: 0901 Not tainted (3.16.0-14-generic)
[84924.032532] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22002444 XER: 00000000
[84924.032538] CFAR: 00003fffb4645888 SOFTE: 1
GPR00: c00000000000a704 c00000003250fe10 c0000000013d49e0 0000000000000900
GPR04: 0000000000040004 0000000000000000 00000000009c0000 00000000ff001009
GPR08: 000182dee8f4d56f 000000007fefffff 0000000040cc8595 0000000000000000
GPR12: 0000000000002200 00003fffab8658f0
[84924.032552] NIP [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032554] LR [c0000000000110b4] arch_local_irq_restore+0x74/0x90
[84924.032556] Call Trace:
[84924.032557] [c00000003250fe10] [0000000000002856] 0x2856 (unreliable)
[84924.032561] [c00000003250fe30] [c00000000000a704] ret_from_except_lite+0x30/0x60
[84924.032562] Instruction dump:
[84924.032563] 994d02ba 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
[84924.032566] 7c0803a6 4e800020 60420000 4bff1315 <60000000> 4bffffe4 60420000 e92d0020
[84926.062119] Adding 576k swap on ./swapfile01. Priority:-2 extents:1 across:576k FS
[84936.733247] Adding 65472k swap on ./swapfile01. Priority:-2 extents:2 across:114624k
Thanks
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1370421/+subscriptions