
kernel-packages team mailing list archive

[Bug 1483343] [NEW] NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests

 

You have been subscribed to a public bug:

---Problem Description---
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests
  
---uname output---
Linux alp15 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:40:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = P8 
  
---Steps to Reproduce---
Install Ubuntu 14.04.2 from the ISO on a P8 PowerVM LPAR.
Then install the Ubuntu 14.04.3 kernel on that LPAR and reboot.
Then build and install the latest LTP test suite on it.

root@alp15:~# tar -xvf ltp-full-20150420.tar.bz2
root@alp15:~# cd ltp-full-20150420/
root@alp15:~/ltp-full-20150420# ls
aclocal.m4      configure     execltp.in  install-sh  Makefile          README                runltplite.sh    testcases    utils
autom4te.cache  configure.ac  IDcheck.sh  lib         Makefile.release  README.kernel_config  runtest          testscripts  ver_linux
config.guess    COPYING       include     ltpmenu     missing           runalltests.sh        scenario_groups  TODO         VERSION
config.sub      doc           INSTALL     m4          pan               runltp                scripts          tools
root@alp15:~/ltp-full-20150420# ./configure
root@alp15:~/ltp-full-20150420# make
root@alp15:~/ltp-full-20150420# make install

root@alp15:/opt/ltp/testcases/bin# ./lock_torture.sh
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_wr:2337]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034389] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_wr:2331]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034394] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lock_torture_wr:2339]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034396] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lock_torture_wr:2346]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034398] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 21s! [lock_torture_wr:2334]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034410] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [lock_torture_wr:2321]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.034412] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [lock_torture_wr:2333]

Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [  308.038386] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [lock_torture_wr:2327]

 
Stack trace output:
 root@alp15:~# dmesg | more
[ 1717.146881] lock_torture_wr R  running task
[ 1717.146881]
[ 1717.146885]     0  2555      2 0x00000804
[ 1717.146887] Call Trace:
[ 1717.146894] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860 (unreliable)
[ 1717.146899] [c000000c7551b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146904] [c000000c7551b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146909] [c000000c7551b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146913] [c000000c7551b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146922] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146922]     LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146927] [c000000c7551bca0] [c000000c7551bcd0] 0xc000000c7551bcd0 (unreliable)
[ 1717.146934] [c000000c7551bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146939] [c000000c7551bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.146944] [c000000c7551bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.146948] [c000000c7551be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.146951] Task dump for CPU 10:
[ 1717.146953] lock_torture_wr R  running task        0  2537      2 0x00000804
[ 1717.146957] Call Trace:
[ 1717.146961] [c000000c7557b820] [c000000c7557b860] 0xc000000c7557b860 (unreliable)
[ 1717.146966] [c000000c7557b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146970] [c000000c7557b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146975] [c000000c7557b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146979] [c000000c7557b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146988] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146988]     LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146993] [c000000c7557bca0] [c000000c7557bcd0] 0xc000000c7557bcd0 (unreliable)
[ 1717.147000] [c000000c7557bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.147006] [c000000c7557bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.147013] [c000000c7557bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.147017] [c000000c7557be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.147020] Task dump for CPU 17:
[ 1717.147021] Task dump for CPU 2:
[ 1717.147028] lock_torture_wr R
[ 1717.147028] lock_torture_wr R  running task
[ 1717.147033]   running task        0  2547      2 0x00000804
[ 1717.147042]     0  2533      2 0x00000804
[ 1717.147044] Call Trace:
[ 1717.147045] Call Trace:
[ 1717.147053] [c000000c732a3820] [c000000c7f688448] 0xc000000c7f688448
[ 1717.147056] [c000000c7555f820] [c000000c7fa48448] 0xc000000c7fa48448
[ 1717.147059]  (unreliable)
[ 1717.147063]  (unreliable)
[ 1717.147063]
[ 1717.147067]
[ 1717.147072] Task dump for CPU 18:
[ 1717.147073] Task dump for CPU 7:
[ 1717.147077] lock_torture_wr R  running task
[ 1717.147082] lock_torture_wr R    0  2555      2 0x00000804
[ 1717.147088]   running task
[ 1717.147088] Call Trace:
[ 1717.147096] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860
[ 1717.147096]     0  2559      2 0x00000804
[ 1717.147102] Call Trace:
[ 1717.147105]  (unreliable)

It is possible that we are missing this commit that fixes a deadlock
during these tests:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=f548d99ef4f5ec8f7080e88ad07c44d16d058ddc

I will check the Ubuntu source shortly to see if this is the case, so
that we can suggest building a kernel to see if it helps.

Running apt-get source linux-image- on the test system didn't pull down
the sources, but the kernel being used is close to the one used for
vivid (3.19.0-25.26), so I pulled down the git source tree for it with
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git. The resulting
source shows that the patch for the commit mentioned is not applied.

As I understand it, the problem that was fixed is this:
torture_rwlock_read_lock_irq() acquires a read lock on the lock called:

torture_rwlock

but its counterpart, torture_rwlock_read_unlock_irq(), which should
relinquish the read lock, instead ends up doing a
write_unlock_irqrestore() on torture_rwlock, in essence leaving the read
lock held. So when the locktorture module calls something like
torture_rwlock_write_lock(), as we see in the bug description, it will
block indefinitely because there is still at least one lock reader.
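
To make the mismatch concrete, here is a minimal userspace sketch in
Python. It is a toy counter model, not the kernel rwlock implementation:
ToyRwLock and writer_can_proceed() are hypothetical names invented for
this illustration, and the model assumes that a write unlock on a
read-held lock simply leaves the reader count untouched. It only shows
why a reader that is never released keeps every later writer out.

```python
class ToyRwLock:
    """Counter-based stand-in for an rwlock (hypothetical, not kernel code)."""

    def __init__(self):
        self.readers = 0      # number of read holders
        self.writer = False   # True while a writer holds the lock

    def read_trylock(self):
        # Readers may enter as long as no writer holds the lock.
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        self.readers -= 1

    def write_trylock(self):
        # A writer needs exclusive access: no writer and no readers.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def write_unlock(self):
        # Assumption of this toy model: only the writer flag is cleared,
        # so the reader count is left exactly as it was.
        self.writer = False


def writer_can_proceed(buggy_read_unlock):
    """Take a read lock, release it (correctly or not), then try to write."""
    lock = ToyRwLock()
    assert lock.read_trylock()
    if buggy_read_unlock:
        lock.write_unlock()   # wrong primitive: reader count stays at 1
    else:
        lock.read_unlock()    # matching primitive: reader count back to 0
    return lock.write_trylock()


print(writer_can_proceed(buggy_read_unlock=False))  # True
print(writer_can_proceed(buggy_read_unlock=True))   # False
```

In the real module the writer spins rather than returning False, which
would be consistent with the _raw_write_lock frames in the traces above.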

I'll go ahead and mirror this since I'm pretty confident this is the
issue (it should also affect Vivid).

We'll have to figure out how to get the sources for the LTS kernel to
build a test kernel as well.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-126476 severity-critical targetmilestone-inin14043
-- 
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests
https://bugs.launchpad.net/bugs/1483343
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.