kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #150807
[Bug 1483343] Comment bridged from LTC Bugzilla
------- Comment From pavsubra@xxxxxxxxxx 2015-12-11 06:16 EDT-------
I have downloaded the mainline wily kernel as suggested and executed the scenario again.
root@powerkvmpok002:~/mainline# wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
--2015-12-11 00:28:48-- http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily/linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
Resolving kernel.ubuntu.com (kernel.ubuntu.com)... 91.189.94.216
Connecting to kernel.ubuntu.com (kernel.ubuntu.com)|91.189.94.216|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56391678 (54M) [application/x-debian-package]
Saving to: ?linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb?
100%[===================================================================================================================>]
56,391,678 846KB/s in 66s
2015-12-11 00:29:54 (832 KB/s) - ?linux-
image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb?
saved [56391678/56391678]
root@powerkvmpok002:~/mainline# dpkg -i linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb
Selecting previously unselected package linux-image-4.4.0-040400rc4-generic.
(Reading database ... 137006 files and directories currently installed.)
Preparing to unpack linux-image-4.4.0-040400rc4-generic_4.4.0-040400rc4.201512061930_ppc64el.deb ...
Done.
Unpacking linux-image-4.4.0-040400rc4-generic (4.4.0-040400rc4.201512061930) ...
Setting up linux-image-4.4.0-040400rc4-generic (4.4.0-040400rc4.201512061930) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-040400rc4-generic
W: Possible missing firmware /lib/firmware/ql2700_fw.bin for module qla2xxx
W: Possible missing firmware /lib/firmware/ql8300_fw.bin for module qla2xxx
W: Possible missing firmware /lib/firmware/ql2600_fw.bin for module qla2xxx
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.0-040400rc4-generic /boot/vmlinux-4.4.0-040400rc4-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.4.0-040400rc4-generic
Found initrd image: /boot/initrd.img-4.4.0-040400rc4-generic
Found linux image: /boot/vmlinux-3.19.0-39-generic
Found initrd image: /boot/initrd.img-3.19.0-39-generic
Found Debian GNU/Linux (8.2) on /dev/sdb2
Found unknown Linux distribution on /dev/mapper/ibmpkvm_vg_root01-ibmpkvm_lv_system
done
root@powerkvmpok002:~# uname -a
Linux powerkvmpok002 4.4.0-040400rc4-generic #201512061930 SMP Mon Dec 7 01:14:51 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
root@powerkvmpok002:~# cd /opt/ltp/testcases/bin/
root@powerkvmpok002:/opt/ltp/testcases/bin# ./lock_torture.sh
./lock_torture.sh: 55: ./lock_torture.sh: tst_kvercmp: not found
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...
lock_torture 1 TPASS : spin_lock: completed
lock_torture 2 TINFO : spin_lock_irq: running 60 sec...
lock_torture 2 TPASS : spin_lock_irq: completed
lock_torture 3 TINFO : rw_lock: running 60 sec...
lock_torture 3 TPASS : rw_lock: completed
lock_torture 4 TINFO : rw_lock_irq: running 60 sec...
root@powerkvmpok002:~# [ 614.289969] NMI watchdog: BUG: soft lockup - CPU#150 stuck for 140s! [lock_torture_wr:4325]
[ 619.293972] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 244s! [lock_torture_wr:4399]
[ 619.293987] NMI watchdog: BUG: soft lockup - CPU#98 stuck for 51s! [lock_torture_wr:4387]
[ 627.301973] NMI watchdog: BUG: soft lockup - CPU#39 stuck for 242s! [lock_torture_wr:4403]
[ 669.345931] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 669.346071] 1-...: (2 GPs behind) idle=02f/140000000000000/0 softirq=10477/10480 fqs=2908
[ 669.346162] 8-...: (0 ticks this GP) idle=58f/140000000000000/0 softirq=9878/9878 fqs=2908
[ 669.346290] 11-...: (0 ticks this GP) idle=e0f/140000000000000/0 softirq=10060/10060 fqs=2908
[ 669.346383] 12-...: (0 ticks this GP) idle=3dd/140000000000000/0 softirq=10024/10024 fqs=2908
[ 669.346470] 14-...: (1 GPs behind) idle=141/140000000000000/0 softirq=10118/10118 fqs=2908
[ 669.346560] 19-...: (1 GPs behind) idle=83d/140000000000000/0 softirq=10273/10273 fqs=2908
[ 669.346660] (detected by 104, t=84094 jiffies, g=3980, c=3979, q=7125)
[ 672.213960] NMI watchdog: BUG: soft lockup - CPU#116 stuck for 53s! [lock_torture_wr:4425]
[ 672.213980] NMI watchdog: BUG: soft lockup - CPU#32 stuck for 50s! [lock_torture_wr:4327]
[ 672.214347] rw_lock_irq-torture: Creating lock_torture_writer task
[ 718.977967] NMI watchdog: BUG: soft lockup - CPU#18 stuck for 144s! [lock_torture_wr:4321]
[ 718.977970] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 323s! [lock_torture_wr:4407]
[ 732.409929] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 732.410024] 1-...: (2 GPs behind) idle=02f/140000000000000/0 softirq=10477/10480 fqs=2939
[ 732.410103] 8-...: (0 ticks this GP) idle=58f/140000000000000/0 softirq=9878/9878 fqs=2939
[ 732.410193] 11-...: (0 ticks this GP) idle=e0f/140000000000000/0 softirq=10060/10060 fqs=2939
[ 732.410282] 12-...: (0 ticks this GP) idle=3dd/140000000000000/0 softirq=10024/10024 fqs=2939
[ 732.410370] 14-...: (1 GPs behind) idle=141/140000000000000/0 softirq=10118/10118 fqs=2939
[ 732.410459] 19-...: (1 GPs behind) idle=83d/140000000000000/0 softirq=10273/10273 fqs=2939
[ 732.410546] (detected by 130, t=99860 jiffies, g=3980, c=3979, q=7361)
[ 732.410808] rcu_sched kthread starved for 15047 jiffies! g3980 c3979 f0x0 s3 ->state=0x1
[ 756.157974] NMI watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [lock_torture_wr:4343]
[ 756.159342] rw_lock_irq-torture: lock_torture_writer task started
The problem still occurs with the mainline wily kernel as can be seen
above.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1483343
Title:
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr
tests
Status in linux package in Ubuntu:
Triaged
Status in linux source package in Vivid:
In Progress
Bug description:
---Problem Description---
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests
---uname output---
Linux alp15 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:40:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = P8
---Steps to Reproduce---
Install a P8 Power VM LPAR with Ubuntu 14.04.2 ISO.
Then install the Ubuntu 14.04.3 kernel on the same and reboot.
Then compile and build the LTP latest test suites on the same.
root@alp15:~# tar -xvf ltp-full-20150420.tar.bz2
root@alp15:~# cd ltp-full-20150420/
root@alp15:~/ltp-full-20150420# ls
aclocal.m4 configure execltp.in install-sh Makefile README runltplite.sh testcases utils
autom4te.cache configure.ac IDcheck.sh lib Makefile.release README.kernel_config runtest testscripts ver_linux
config.guess COPYING include ltpmenu missing runalltests.sh scenario_groups TODO VERSION
config.sub doc INSTALL m4 pan runltp scripts tools
root@alp15:~/ltp-full-20150420# ./configure
root@alp15:~/ltp-full-20150420# make
root@alp15:~/ltp-full-20150420# make install
root@alp15:/opt/ltp/testcases/bin# ./lock_torture.sh
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_wr:2337]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034389] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_wr:2331]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034394] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lock_torture_wr:2339]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034396] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lock_torture_wr:2346]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034398] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 21s! [lock_torture_wr:2334]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034410] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [lock_torture_wr:2321]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034412] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [lock_torture_wr:2333]
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.038386] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [lock_torture_wr:2327]
Stack trace output:
root@alp15:~# dmesg | more
[ 1717.146881] lock_torture_wr R running task
[ 1717.146881]
[ 1717.146885] 0 2555 2 0x00000804
[ 1717.146887] Call Trace:
[ 1717.146894] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860 (unreliable)
[ 1717.146899] [c000000c7551b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146904] [c000000c7551b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146909] [c000000c7551b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146913] [c000000c7551b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146922] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146922] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146927] [c000000c7551bca0] [c000000c7551bcd0] 0xc000000c7551bcd0 (unreliable)
[ 1717.146934] [c000000c7551bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146939] [c000000c7551bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.146944] [c000000c7551bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.146948] [c000000c7551be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.146951] Task dump for CPU 10:
[ 1717.146953] lock_torture_wr R running task 0 2537 2 0x00000804
[ 1717.146957] Call Trace:
[ 1717.146961] [c000000c7557b820] [c000000c7557b860] 0xc000000c7557b860 (unreliable)
[ 1717.146966] [c000000c7557b860] [c0000000000b4fb0] __do_softirq+0x220/0x3b0
[ 1717.146970] [c000000c7557b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146975] [c000000c7557b980] [c00000000001fa54] timer_interrupt+0xa4/0xe0
[ 1717.146979] [c000000c7557b9b0] [c000000000002758] decrementer_common+0x158/0x180
[ 1717.146988] --- interrupt: 901 at _raw_write_lock+0x68/0xc0
[ 1717.146988] LR = torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.146993] [c000000c7557bca0] [c000000c7557bcd0] 0xc000000c7557bcd0 (unreliable)
[ 1717.147000] [c000000c7557bcd0] [d00000000d4810b8] torture_rwlock_write_lock+0x28/0x40 [locktorture]
[ 1717.147006] [c000000c7557bcf0] [d00000000d480578] lock_torture_writer+0x98/0x210 [locktorture]
[ 1717.147013] [c000000c7557bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.147017] [c000000c7557be30] [c00000000000956c] ret_from_kernel_thread+0x5c/0x70
[ 1717.147020] Task dump for CPU 17:
[ 1717.147021] Task dump for CPU 2:
[ 1717.147028] lock_torture_wr R
[ 1717.147028] lock_torture_wr R running task
[ 1717.147033] running task 0 2547 2 0x00000804
[ 1717.147042] 0 2533 2 0x00000804
[ 1717.147044] Call Trace:
[ 1717.147045] Call Trace:
[ 1717.147053] [c000000c732a3820] [c000000c7f688448] 0xc000000c7f688448
[ 1717.147056] [c000000c7555f820] [c000000c7fa48448] 0xc000000c7fa48448
[ 1717.147059] (unreliable)
[ 1717.147063] (unreliable)
[ 1717.147063]
[ 1717.147067]
[ 1717.147072] Task dump for CPU 18:
[ 1717.147073] Task dump for CPU 7:
[ 1717.147077] lock_torture_wr R running task
[ 1717.147082] lock_torture_wr R 0 2555 2 0x00000804
[ 1717.147088] running task
[ 1717.147088] Call Trace:
[ 1717.147096] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860
[ 1717.147096] 0 2559 2 0x00000804
[ 1717.147102] Call Trace:
[ 1717.147105] (unreliable)
It is possible that we are missing this commit that fixes a deadlock
during these tests:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=f548d99ef4f5ec8f7080e88ad07c44d16d058ddc
will check the Ubuntu source shortly as see if this is the case and we
can suggest building a kernel to see if it helps.
The apt-get source linux-image- on the test system didn't pull down
the sources but the kernel being used is close to the one used for
vivid (3.19.0-25.26) so I pulled down the git source tree for it with
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-vivid.git and the
resulting source shows that the patch for the commit mentioned is not
applied.
As I basically understand it, the problem that was fixed is that while
torture_rwlock_read_lock_irq() acquires a read lock on the lock
called:
torture_rwlock
anything that calls the counterpart torture_rwlock_read_unlock_irq()
to relinquish the read lock instead ends doing a
write_unlock_irqrestore() on the torture_rwlock() in essence leaving
the read lock. So when the locktorture module calls something like
torture_rwlock_write_lock() as we see in the bug description, it will
block indefinitely as there is at least one lock reader.
I'll go ahead and mirror this since I pretty confident this is the
issue (also should affect Vivid).
We'll have to figure out how to get the sources for the LTS kernel to
build a test kernel as well.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1483343/+subscriptions