kernel-packages team mailing list archive

[Bug 1370421] Re: BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]

 

** Package changed: ubuntu => linux (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1370421

Title:
  BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  == Comment: #0 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:24:37 ==
  ---Problem Description---
  CPU stalls and a soft lockup occur while running the ltpstress.sh test of the LTP suite. Detailed syslog and test logs are attached.

  Contact Information = abdhalee@xxxxxxxxxx

  ---uname output---
  Linux ubuntu 3.16.0-10-generic #15-Ubuntu SMP Thu Aug 21 16:32:31 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = POWER8 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
  - Ubuntu 14.10 LE guest running on a POWER8 machine with PowerKVM build 2_1_1.8.
  - Download and build the LTP suite on the guest, then run /opt/ltp/testscripts/ltpstress.sh -d /tmp/sardata -l /tmp/ltplog.12028 -m 128 -t 24 -S
  - After 2 hours of test run, dmesg starts showing the trace messages below.

  syslog:
  ---------
  Aug 31 09:31:59 ubuntu kernel: [83796.274731] Adding 576k swap on swapfile29.  Priority:-29 extents:1 across:576k FS
  Aug 31 09:32:00 ubuntu in.rshd[8457]: connect from 127.0.0.1 (127.0.0.1)
  Aug 31 09:32:01 ubuntu in.rshd[8459]: connect from 127.0.0.1 (127.0.0.1)
  Aug 31 09:32:02 ubuntu in.rshd[8461]: connect from 127.0.0.1 (127.0.0.1)
  Sep  1 04:42:36 ubuntu kernel: [147953.248523] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=92214 jiffies, g=440674, c=440673, q=304)
  Sep  1 04:42:36 ubuntu kernel: [147953.248720] Task dump for CPU 15:
  Sep  1 04:42:36 ubuntu kernel: [147953.248725] genload         R  running task        0 22734  22733 0x00040000
  Sep  1 04:42:36 ubuntu kernel: [147953.248730] Call Trace:
  Sep  1 04:42:36 ubuntu kernel: [147953.248740] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
  Sep  1 04:42:36 ubuntu kernel: [147953.248745] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
  Sep  1 04:42:36 ubuntu kernel: [147953.248748] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
  Sep  1 04:42:36 ubuntu kernel: [147953.248753] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
  Sep  1 04:42:36 ubuntu kernel: [147953.248758] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
  Sep  1 04:42:36 ubuntu kernel: [147953.248762] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
  Sep  1 04:42:36 ubuntu kernel: [147953.250365] INFO: rcu_sched detected stalls on CPUs/tasks: { 15} (detected by 2, t=16035133 jiffies, g=440674, c=440673, q=304)
  Sep  1 04:42:36 ubuntu kernel: [147953.250519] Task dump for CPU 15:
  Sep  1 04:42:36 ubuntu kernel: [147953.250522] genload         R  running task        0 22734  22733 0x00040000
  Sep  1 04:42:36 ubuntu kernel: [147953.250525] Call Trace:
  Sep  1 04:42:36 ubuntu kernel: [147953.250528] [c0000000033239b0] [c000000000056fe4] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
  Sep  1 04:42:36 ubuntu kernel: [147953.250532] [c000000003323ab0] [c0000000000532c8] hash_preload+0x2f8/0x300
  Sep  1 04:42:36 ubuntu kernel: [147953.250535] [c000000003323b30] [c00000000004eaf0] update_mmu_cache+0xf0/0x110
  Sep  1 04:42:36 ubuntu kernel: [147953.250538] [c000000003323b70] [c00000000023559c] handle_mm_fault+0xa0c/0x11b0
  Sep  1 04:42:36 ubuntu kernel: [147953.250541] [c000000003323c10] [c0000000009e58dc] do_page_fault+0x71c/0x990
  Sep  1 04:42:36 ubuntu kernel: [147953.250544] [c000000003323e30] [c000000000009568] handle_page_fault+0x10/0x30
  Sep  1 04:42:36 ubuntu kernel: [147953.257562] BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
  Sep  1 04:42:36 ubuntu kernel: [147953.257647] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic e1000 ohci_pci
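
A minimal sketch of how the soft lockup lines in a saved syslog like the one above could be extracted for triage. The function name, the regular expression, and the dictionary keys are illustrative assumptions and are not part of the original report; the pattern only targets the message format shown in this log.

```python
import re

# Matches messages of the form seen in this report, for example:
#   BUG: soft lockup - CPU#15 stuck for 59737s! [genload:22734]
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup message found in an iterable of log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events
```

Passing the lines of a syslog file, for instance `find_soft_lockups(open(path))`, yields the CPU number, reported stuck time, and task for each hit. The reported 59737 s is roughly 16.6 hours.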

  Other details :
  ------------------
  @ubuntu:/tmp$ lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                16
  On-line CPU(s) list:   0-15
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             16
  NUMA node(s):          1
  Model:                 IBM pSeries (emulated by qemu)
  L1d cache:             64K
  L1i cache:             32K
  NUMA node0 CPU(s):     0-15

  @ubuntu:/tmp$ free
               total       used       free     shared    buffers     cached
  Mem:       2072704     892480    1180224        448     274240     132480
  -/+ buffers/cache:     485760    1586944
  Swap:      3460160      35392    3424768

  @ubuntu:/tmp$ uptime 
   05:22:02 up 1 day, 19:06,  2 users,  load average: 10.67, 9.10, 9.32

  
  Thanks

  == Comment: #1 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:31:58 ==

  
  == Comment: #2 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-01 05:36:48 ==

  
  == Comment: #5 - MAMATHA INAMDAR <mainamdar@xxxxxxxxxx> - 2014-09-05 05:03:56 ==
  Hi Abdul,
  Are you able to recreate this issue?
  Please update the bug with your latest test results.

  == Comment: #6 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:55:47 ==
  (In reply to comment #5)
  > Hi Abdul,
  > Are you able to recreate this issue?
  > Please update  the bug with your latest test results.

  Hi Mamatha,

  I have started the test again with xmon enabled.

  I will keep you updated on the status.

  Thanks

  == Comment: #7 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-10 05:59:17 ==
  I have started the test on 3.16.0-14-generic and I still see these messages in syslog:

  [ 8075.169576] Unable to find swap-space signature
  [ 7452.105450] Unable to find swap-space signature

  Should we worry about this?

  The original problem has not reproduced yet. I will update the bug soon.

  == Comment: #8 - Dan Streetman <ddstreet@xxxxxxxxxx> - 2014-09-10 08:44:21 ==
  (In reply to comment #7)
  > I have started the test on 3.16.0-14-generic and I still see these messages
  > in syslog 
  > 
  > [ 8075.169576] Unable to find swap-space signature
  > [ 7452.105450] Unable to find swap-space signature
  > 
  > should we worry about this.

  It looks like you have some kind of tests creating/adding swap files,
  and I have no idea what those tests look like, so I don't know if this
  is an expected result of the tests or not.  Generally that error means
  you are trying to swapon a swap file that isn't correctly initialized
  with mkswap, or its header is corrupted.

  Assuming your test isn't expecting a failure, you should just mkswap
  again on whatever swap file is failing.  It looks like "./swapfile01",
  but since you're using relative paths, I can't tell you where it's
  located.
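
A minimal sketch of how a suspect swap file could be checked before calling swapon. It assumes the version 2 swap header layout, in which the 10-byte magic string "SWAPSPACE2" occupies the last 10 bytes of the first page. The function name is hypothetical, and the page size defaults to that of the running system (this must match the page size in effect when mkswap was run).

```python
import os

SWAP_MAGIC = b"SWAPSPACE2"  # version 2 swap signature (assumed layout, see lead-in)


def has_swap_signature(path, page_size=None):
    """Return True if the file at `path` ends its first page with the swap magic."""
    if page_size is None:
        # Page size of the running system, e.g. 4096 or 65536.
        page_size = os.sysconf("SC_PAGE_SIZE")
    with open(path, "rb") as f:
        f.seek(page_size - len(SWAP_MAGIC))
        magic = f.read(len(SWAP_MAGIC))
    # A file shorter than one page yields a short read and therefore False.
    return magic == SWAP_MAGIC
```

If this returns False for a file the test intends to swapon, rerunning mkswap on that file, as suggested above, would be the next step.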

  == Comment: #9 - ABDUL HALEEM <abdhalee@xxxxxxxxxx> - 2014-09-11 04:09:13 ==
  Hi,

  I recreated the bug on the latest kernel, 3.16.0-14-generic.

  If I recall correctly, this is the scenario in which the kernel
  triggered the soft lockup - CPU#15 traces:

  During my first test run, I saw the next day that the guest was in the
  'paused' state because the host disk partition on which
  /var/lib/libvirt/images is mounted was out of space. I freed up disk
  space and resumed the guest. My tests were still running, but dmesg
  showed the trace messages.

  So in my last run I recreated a similar scenario with xmon=on and found
  that the traces are triggered when I suspend and resume the guest while
  the tests are running, and not by the actual test.

  --- Actual steps to reproduce --
  - Enable xmon in /etc/default/grub, run 'update-grub', and 'reboot'.
  - Run the ltpstress test.
  - Suspend the guest: 'virsh suspend <guest>'.
  - Resume after a few seconds. The test keeps running fine.
  - dmesg shows the original trace messages, as below.
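
The suspend and resume part of these steps could be scripted on the host roughly as follows. This is a sketch only: the function names and the default pause length are assumptions, and it relies on nothing beyond the `virsh suspend` and `virsh resume` commands named in the steps. The `run` and `sleep` parameters exist so the sequence can be exercised without a real guest.

```python
import subprocess
import time


def suspend_resume_commands(domain):
    """Build the two virsh command lines for a given guest (domain) name."""
    return [["virsh", "suspend", domain], ["virsh", "resume", domain]]


def suspend_resume(domain, pause_seconds=10.0, run=subprocess.run, sleep=time.sleep):
    """Suspend the guest, wait pause_seconds, then resume it."""
    suspend_cmd, resume_cmd = suspend_resume_commands(domain)
    run(suspend_cmd, check=True)   # raises CalledProcessError if virsh fails
    sleep(pause_seconds)
    run(resume_cmd, check=True)
```

Calling `suspend_resume("guest-name")` on the host while ltpstress runs in the guest would mirror the manual steps; the guest's dmesg still has to be inspected separately afterwards.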

  When the traces were triggered, the console did not drop into xmon,
  so I guess this might be a different problem.

  I have kept the system in the same state.

  Trace messages:
  [84735.190787] Adding 576k swap on swapfile27.  Priority:-27 extents:1 across:576k FS
  [84735.740298] Adding 576k swap on swapfile28.  Priority:-28 extents:1 across:576k FS
  [84736.062528] Adding 576k swap on swapfile29.  Priority:-29 extents:1 across:576k FS
  [84924.032436] BUG: soft lockup - CPU#0 stuck for 104s! [float_bessel:10251]
  [84924.032507] Modules linked in: nfsv2 nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache pseries_rng rtc_generic shpchp ohci_pci e1000
  [84924.032525] CPU: 0 PID: 10251 Comm: float_bessel Not tainted 3.16.0-14-generic #20-Ubuntu
  [84924.032527] task: c000000003100000 ti: c00000003250c000 task.ti: c00000003250c000
  [84924.032529] NIP: c0000000000110b4 LR: c0000000000110b4 CTR: 00003fffb4644120
  [84924.032531] REGS: c00000003250fb90 TRAP: 0901   Not tainted  (3.16.0-14-generic)
  [84924.032532] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22002444  XER: 00000000
  [84924.032538] CFAR: 00003fffb4645888 SOFTE: 1 
  GPR00: c00000000000a704 c00000003250fe10 c0000000013d49e0 0000000000000900 
  GPR04: 0000000000040004 0000000000000000 00000000009c0000 00000000ff001009 
  GPR08: 000182dee8f4d56f 000000007fefffff 0000000040cc8595 0000000000000000 
  GPR12: 0000000000002200 00003fffab8658f0 
  [84924.032552] NIP [c0000000000110b4] arch_local_irq_restore+0x74/0x90
  [84924.032554] LR [c0000000000110b4] arch_local_irq_restore+0x74/0x90
  [84924.032556] Call Trace:
  [84924.032557] [c00000003250fe10] [0000000000002856] 0x2856 (unreliable)
  [84924.032561] [c00000003250fe30] [c00000000000a704] ret_from_except_lite+0x30/0x60
  [84924.032562] Instruction dump:
  [84924.032563] 994d02ba 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010 
  [84924.032566] 7c0803a6 4e800020 60420000 4bff1315 <60000000> 4bffffe4 60420000 e92d0020 
  [84926.062119] Adding 576k swap on ./swapfile01.  Priority:-2 extents:1 across:576k FS
  [84936.733247] Adding 65472k swap on ./swapfile01.  Priority:-2 extents:2 across:114624k

  
  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1370421/+subscriptions