kernel-packages team mailing list archive
Message #26868
[Bug 1239800] Re: Soft lockup when running bonnie++ only at 1600 mt/s
** Description changed:
+ SRU Justification:
+
+ Impact: running a test like bonnie++ makes the system unstable and prone
+ to hangs.
+
+ Fix: apply the attached patches and recompile the kernel.
+
+ Test case: leave bonnie++ running in a loop for 24 hours.
+
+ --
+
When bonnie++ is run in a loop, the system hangs with
- "rcu_sched: self-detected stall on CPU"
+ "rcu_sched: self-detected stall on CPU"
The time to failure is inconsistent: once it took 7 hours, and the next time more than 2 days.
Commands to reproduce the failure:
$ sudo apt-get install bonnie++
$ mkdir bonnie
$ while true; do bonnie++ -d bonnie; done &>>bonnie0.log &
Stack trace:
[237019.072290] INFO: rcu_sched self-detected stall on CPU { 1} (t=19305216 jiffies g=580389 c=580388 q=84)
[237019.080901] CPU: 1 PID: 44 Comm: kswapd0 Tainted: GF 3.11.0-6-generic-lpae #12-Ubuntu
[237019.088879] [<c002bc00>] (unwind_backtrace+0x0/0x138) from [<c0026f1c>] (show_stack+0x10/0x14)
[237019.096700] [<c0026f1c>] (show_stack+0x10/0x14) from [<c05cbe50>] (dump_stack+0x74/0x90)
[237019.104051] [<c05cbe50>] (dump_stack+0x74/0x90) from [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798)
[237019.112262] [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798) from [<c00492a0>] (update_process_times+0x38/0x64)
[237019.121254] [<c00492a0>] (update_process_times+0x38/0x64) from [<c008cdbc>] (tick_sched_handle+0x54/0x60)
[237019.129933] [<c008cdbc>] (tick_sched_handle+0x54/0x60) from [<c008d00c>] (tick_sched_timer+0x44/0x74)
[237019.138300] [<c008d00c>] (tick_sched_timer+0x44/0x74) from [<c005db50>] (__run_hrtimer+0x74/0x1d4)
[237019.146433] [<c005db50>] (__run_hrtimer+0x74/0x1d4) from [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0)
[237019.154800] [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0) from [<c0492e44>] (arch_timer_handler_phys+0x28/0x30)
[237019.163871] [<c0492e44>] (arch_timer_handler_phys+0x28/0x30) from [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104)
[237019.173332] [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104) from [<c00b54ec>] (generic_handle_irq+0x20/0x30)
[237019.182402] [<c00b54ec>] (generic_handle_irq+0x20/0x30) from [<c0023ff4>] (handle_IRQ+0x38/0x94)
[237019.190378] [<c0023ff4>] (handle_IRQ+0x38/0x94) from [<c0008508>] (gic_handle_irq+0x28/0x5c)
[237019.198041] [<c0008508>] (gic_handle_irq+0x28/0x5c) from [<c05d1c00>] (__irq_svc+0x40/0x50)
[237019.205624] Exception stack(0xee2c1c18 to 0xee2c1c60)
[237019.210238] 1c00: 00000004 00000004
[237019.217666] 1c20: 00000008 00000001 ee2c1c8c ca208700 ca208700 0996b000 ca208708 00000001
[237019.225093] 1c40: 00000002 edb31300 00000003 ee2c1c60 c02f54fc c00923c8 200f0013 ffffffff
[237019.232523] [<c05d1c00>] (__irq_svc+0x40/0x50) from [<c00923c8>] (generic_exec_single+0x6c/0x94)
[237019.240500] [<c00923c8>] (generic_exec_single+0x6c/0x94) from [<c00924f4>] (smp_call_function_single+0x104/0x198)
[237019.249805] [<c00924f4>] (smp_call_function_single+0x104/0x198) from [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84)
[237019.259812] [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84) from [<c0029adc>] (flush_tlb_page+0x74/0xa8)
[237019.268882] [<c0029adc>] (flush_tlb_page+0x74/0xa8) from [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0)
[237019.277484] [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0) from [<c011a60c>] (page_referenced_one+0x64/0x1fc)
[237019.286554] [<c011a60c>] (page_referenced_one+0x64/0x1fc) from [<c011c034>] (page_referenced+0xf4/0x2e4)
[237019.295155] [<c011c034>] (page_referenced+0xf4/0x2e4) from [<c00fc410>] (shrink_active_list+0x1f0/0x35c)
[237019.303756] [<c00fc410>] (shrink_active_list+0x1f0/0x35c) from [<c00fdadc>] (shrink_lruvec+0x32c/0x598)
[237019.312279] [<c00fdadc>] (shrink_lruvec+0x32c/0x598) from [<c00fddb0>] (shrink_zone+0x68/0x180)
[237019.320176] [<c00fddb0>] (shrink_zone+0x68/0x180) from [<c00fe430>] (kswapd+0x568/0x9d4)
[237019.327527] [<c00fe430>] (kswapd+0x568/0x9d4) from [<c005aae0>] (kthread+0xa4/0xb0)
[237019.334487] [<c005aae0>] (kthread+0xa4/0xb0) from [<c0023198>] (ret_from_fork+0x14/0x3c)
Setup details:
Quad-core A15 server nodes on Calxeda Midway hardware.
The failure has been seen twice with a DDR setting of DDR3@1600mt/s
- cat /proc/version_signature
+ cat /proc/version_signature
Ubuntu 3.11.0-12.18-generic-lpae 3.11.3
The issue was first seen on Ubuntu 3.11.0-6.12-generic-lpae
cat /etc/issue
Ubuntu 13.04 \n \l
Additional debug information attached
- ---
+ ---
Architecture: armhf
DistroRelease: Ubuntu 13.04
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
- LANGUAGE=en_US:
- TERM=vt102
- PATH=(custom, no user)
- LANG=en_US
- SHELL=/bin/bash
+ LANGUAGE=en_US:
+ TERM=vt102
+ PATH=(custom, no user)
+ LANG=en_US
+ SHELL=/bin/bash
Uname: Linux 3.11.0-12-generic-lpae armv7l
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1239800
Title:
Soft lockup when running bonnie++ only at 1600 mt/s
Status in “linux” package in Ubuntu:
Incomplete
Bug description:
SRU Justification:
Impact: running a test like bonnie++ makes the system unstable and
prone to hangs.
Fix: apply the attached patches and recompile the kernel.
Test case: leave bonnie++ running in a loop for 24 hours.
--
When bonnie++ is run in a loop, the system hangs with
"rcu_sched: self-detected stall on CPU"
The time to failure is inconsistent: once it took 7 hours, and the next time more than 2 days.
Commands to reproduce the failure:
$ sudo apt-get install bonnie++
$ mkdir bonnie
$ while true; do bonnie++ -d bonnie; done &>>bonnie0.log &
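For the 24-hour SRU test case, the endless loop above could be given a time limit. This is only a sketch under assumptions: the helper name `run_for` is hypothetical and not part of the original report, and it simply repeats whatever command it is given until the deadline passes.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): repeat a command until
# the given number of seconds has elapsed.
# Usage: run_for SECONDS COMMAND [ARGS...]
run_for() {
    local duration=$1
    shift
    local end=$(( $(date +%s) + duration ))
    # Start a new run only while the deadline has not passed; a run that
    # is already in progress is allowed to finish.
    while [ "$(date +%s)" -lt "$end" ]; do
        "$@"
    done
}
```

With this defined, a bounded run might look like `run_for $((24 * 60 * 60)) bonnie++ -d bonnie &>>bonnie0.log &`. Because the deadline is only checked between runs, the last bonnie++ pass can end somewhat after the 24-hour mark.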
Stack trace:
[237019.072290] INFO: rcu_sched self-detected stall on CPU { 1} (t=19305216 jiffies g=580389 c=580388 q=84)
[237019.080901] CPU: 1 PID: 44 Comm: kswapd0 Tainted: GF 3.11.0-6-generic-lpae #12-Ubuntu
[237019.088879] [<c002bc00>] (unwind_backtrace+0x0/0x138) from [<c0026f1c>] (show_stack+0x10/0x14)
[237019.096700] [<c0026f1c>] (show_stack+0x10/0x14) from [<c05cbe50>] (dump_stack+0x74/0x90)
[237019.104051] [<c05cbe50>] (dump_stack+0x74/0x90) from [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798)
[237019.112262] [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798) from [<c00492a0>] (update_process_times+0x38/0x64)
[237019.121254] [<c00492a0>] (update_process_times+0x38/0x64) from [<c008cdbc>] (tick_sched_handle+0x54/0x60)
[237019.129933] [<c008cdbc>] (tick_sched_handle+0x54/0x60) from [<c008d00c>] (tick_sched_timer+0x44/0x74)
[237019.138300] [<c008d00c>] (tick_sched_timer+0x44/0x74) from [<c005db50>] (__run_hrtimer+0x74/0x1d4)
[237019.146433] [<c005db50>] (__run_hrtimer+0x74/0x1d4) from [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0)
[237019.154800] [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0) from [<c0492e44>] (arch_timer_handler_phys+0x28/0x30)
[237019.163871] [<c0492e44>] (arch_timer_handler_phys+0x28/0x30) from [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104)
[237019.173332] [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104) from [<c00b54ec>] (generic_handle_irq+0x20/0x30)
[237019.182402] [<c00b54ec>] (generic_handle_irq+0x20/0x30) from [<c0023ff4>] (handle_IRQ+0x38/0x94)
[237019.190378] [<c0023ff4>] (handle_IRQ+0x38/0x94) from [<c0008508>] (gic_handle_irq+0x28/0x5c)
[237019.198041] [<c0008508>] (gic_handle_irq+0x28/0x5c) from [<c05d1c00>] (__irq_svc+0x40/0x50)
[237019.205624] Exception stack(0xee2c1c18 to 0xee2c1c60)
[237019.210238] 1c00: 00000004 00000004
[237019.217666] 1c20: 00000008 00000001 ee2c1c8c ca208700 ca208700 0996b000 ca208708 00000001
[237019.225093] 1c40: 00000002 edb31300 00000003 ee2c1c60 c02f54fc c00923c8 200f0013 ffffffff
[237019.232523] [<c05d1c00>] (__irq_svc+0x40/0x50) from [<c00923c8>] (generic_exec_single+0x6c/0x94)
[237019.240500] [<c00923c8>] (generic_exec_single+0x6c/0x94) from [<c00924f4>] (smp_call_function_single+0x104/0x198)
[237019.249805] [<c00924f4>] (smp_call_function_single+0x104/0x198) from [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84)
[237019.259812] [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84) from [<c0029adc>] (flush_tlb_page+0x74/0xa8)
[237019.268882] [<c0029adc>] (flush_tlb_page+0x74/0xa8) from [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0)
[237019.277484] [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0) from [<c011a60c>] (page_referenced_one+0x64/0x1fc)
[237019.286554] [<c011a60c>] (page_referenced_one+0x64/0x1fc) from [<c011c034>] (page_referenced+0xf4/0x2e4)
[237019.295155] [<c011c034>] (page_referenced+0xf4/0x2e4) from [<c00fc410>] (shrink_active_list+0x1f0/0x35c)
[237019.303756] [<c00fc410>] (shrink_active_list+0x1f0/0x35c) from [<c00fdadc>] (shrink_lruvec+0x32c/0x598)
[237019.312279] [<c00fdadc>] (shrink_lruvec+0x32c/0x598) from [<c00fddb0>] (shrink_zone+0x68/0x180)
[237019.320176] [<c00fddb0>] (shrink_zone+0x68/0x180) from [<c00fe430>] (kswapd+0x568/0x9d4)
[237019.327527] [<c00fe430>] (kswapd+0x568/0x9d4) from [<c005aae0>] (kthread+0xa4/0xb0)
[237019.334487] [<c005aae0>] (kthread+0xa4/0xb0) from [<c0023198>] (ret_from_fork+0x14/0x3c)
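To check a saved kernel log for the stall shown above, something like the following could be used. This is a minimal sketch: the helper name `has_rcu_stall` and the log file argument are illustrative and not part of the original report; the search pattern is copied from the first line of the trace.

```shell
#!/bin/bash
# Hypothetical helper (not from the bug report): succeed if the given
# log file contains the RCU stall message seen in the trace above.
# Usage: has_rcu_stall LOGFILE
has_rcu_stall() {
    grep -q 'rcu_sched self-detected stall on CPU' "$1"
}
```

For example, after saving the log with `dmesg > dmesg.log`, running `has_rcu_stall dmesg.log && echo "stall reproduced"` reports whether the message appeared.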
Setup details:
Quad-core A15 server nodes on Calxeda Midway hardware.
The failure has been seen twice with a DDR setting of DDR3@1600mt/s
cat /proc/version_signature
Ubuntu 3.11.0-12.18-generic-lpae 3.11.3
The issue was first seen on Ubuntu 3.11.0-6.12-generic-lpae
cat /etc/issue
Ubuntu 13.04 \n \l
Additional debug information attached
---
Architecture: armhf
DistroRelease: Ubuntu 13.04
MarkForUpload: True
Package: linux (not installed)
ProcEnviron:
LANGUAGE=en_US:
TERM=vt102
PATH=(custom, no user)
LANG=en_US
SHELL=/bin/bash
Uname: Linux 3.11.0-12-generic-lpae armv7l
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1239800/+subscriptions