kernel-packages team mailing list archive

[Bug 1239800] Re: Soft lockup when running bonnie++ only at 1600 mt/s

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Paolo Pisati <1239800@xxxxxxxxxxxxxxxxxx>
Date: Wed, 06 Nov 2013 10:35:16 -0000
Reply-to: Bug 1239800 <1239800@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Description changed:

+ SRU Justification:
+ 
+ Impact: running a test like bonnie++ makes the system unstable and prone
+ to hangs.
+ 
+ Fix: apply the attached patches and rebuild the kernel.
+ 
+ Test case: leave bonnie++ running in a loop for 24 hours.
+ 
+ --
+ 
  When bonnie++ is run in a loop, the system hangs with
- "rcu_sched: self-detected stall on CPU" 
+ "rcu_sched: self-detected stall on CPU"
  The time to failure varies: one run took 7 hours, the next more than 2 days.
  
  Commands to reproduce the failure:
  $ sudo apt-get install bonnie++
  $ mkdir bonnie
  $ while true; do bonnie++ -d bonnie; done &>>bonnie0.log &
  
  Stack trace:
  [237019.072290] INFO: rcu_sched self-detected stall on CPU { 1} (t=19305216 jiffies g=580389 c=580388 q=84)
  [237019.080901] CPU: 1 PID: 44 Comm: kswapd0 Tainted: GF 3.11.0-6-generic-lpae #12-Ubuntu
  [237019.088879] [<c002bc00>] (unwind_backtrace+0x0/0x138) from [<c0026f1c>] (show_stack+0x10/0x14)
  [237019.096700] [<c0026f1c>] (show_stack+0x10/0x14) from [<c05cbe50>] (dump_stack+0x74/0x90)
  [237019.104051] [<c05cbe50>] (dump_stack+0x74/0x90) from [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798)
  [237019.112262] [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798) from [<c00492a0>] (update_process_times+0x38/0x64)
  [237019.121254] [<c00492a0>] (update_process_times+0x38/0x64) from [<c008cdbc>] (tick_sched_handle+0x54/0x60)
  [237019.129933] [<c008cdbc>] (tick_sched_handle+0x54/0x60) from [<c008d00c>] (tick_sched_timer+0x44/0x74)
  [237019.138300] [<c008d00c>] (tick_sched_timer+0x44/0x74) from [<c005db50>] (__run_hrtimer+0x74/0x1d4)
  [237019.146433] [<c005db50>] (__run_hrtimer+0x74/0x1d4) from [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0)
  [237019.154800] [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0) from [<c0492e44>] (arch_timer_handler_phys+0x28/0x30)
  [237019.163871] [<c0492e44>] (arch_timer_handler_phys+0x28/0x30) from [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104)
  [237019.173332] [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104) from [<c00b54ec>] (generic_handle_irq+0x20/0x30)
  [237019.182402] [<c00b54ec>] (generic_handle_irq+0x20/0x30) from [<c0023ff4>] (handle_IRQ+0x38/0x94)
  [237019.190378] [<c0023ff4>] (handle_IRQ+0x38/0x94) from [<c0008508>] (gic_handle_irq+0x28/0x5c)
  [237019.198041] [<c0008508>] (gic_handle_irq+0x28/0x5c) from [<c05d1c00>] (__irq_svc+0x40/0x50)
  [237019.205624] Exception stack(0xee2c1c18 to 0xee2c1c60)
  [237019.210238] 1c00: 00000004 00000004
  [237019.217666] 1c20: 00000008 00000001 ee2c1c8c ca208700 ca208700 0996b000 ca208708 00000001
  [237019.225093] 1c40: 00000002 edb31300 00000003 ee2c1c60 c02f54fc c00923c8 200f0013 ffffffff
  [237019.232523] [<c05d1c00>] (__irq_svc+0x40/0x50) from [<c00923c8>] (generic_exec_single+0x6c/0x94)
  [237019.240500] [<c00923c8>] (generic_exec_single+0x6c/0x94) from [<c00924f4>] (smp_call_function_single+0x104/0x198)
  [237019.249805] [<c00924f4>] (smp_call_function_single+0x104/0x198) from [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84)
  [237019.259812] [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84) from [<c0029adc>] (flush_tlb_page+0x74/0xa8)
  [237019.268882] [<c0029adc>] (flush_tlb_page+0x74/0xa8) from [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0)
  [237019.277484] [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0) from [<c011a60c>] (page_referenced_one+0x64/0x1fc)
  [237019.286554] [<c011a60c>] (page_referenced_one+0x64/0x1fc) from [<c011c034>] (page_referenced+0xf4/0x2e4)
  [237019.295155] [<c011c034>] (page_referenced+0xf4/0x2e4) from [<c00fc410>] (shrink_active_list+0x1f0/0x35c)
  [237019.303756] [<c00fc410>] (shrink_active_list+0x1f0/0x35c) from [<c00fdadc>] (shrink_lruvec+0x32c/0x598)
  [237019.312279] [<c00fdadc>] (shrink_lruvec+0x32c/0x598) from [<c00fddb0>] (shrink_zone+0x68/0x180)
  [237019.320176] [<c00fddb0>] (shrink_zone+0x68/0x180) from [<c00fe430>] (kswapd+0x568/0x9d4)
  [237019.327527] [<c00fe430>] (kswapd+0x568/0x9d4) from [<c005aae0>] (kthread+0xa4/0xb0)
  [237019.334487] [<c005aae0>] (kthread+0xa4/0xb0) from [<c0023198>] (ret_from_fork+0x14/0x3c)
  
  Setup details:
  Quad-core A15 server nodes on Calxeda Midway hardware.
  The failure has been seen twice with the DDR setting DDR3@1600 MT/s.
  
- cat /proc/version_signature 
+ cat /proc/version_signature
  Ubuntu 3.11.0-12.18-generic-lpae 3.11.3
  The issue was first seen on Ubuntu 3.11.0-6.12-generic-lpae
  
  cat /etc/issue
  Ubuntu 13.04 \n \l
  
  Additional debug information attached
- --- 
+ ---
  Architecture: armhf
  DistroRelease: Ubuntu 13.04
  MarkForUpload: True
  Package: linux (not installed)
  ProcEnviron:
-  LANGUAGE=en_US:
-  TERM=vt102
-  PATH=(custom, no user)
-  LANG=en_US
-  SHELL=/bin/bash
+  LANGUAGE=en_US:
+  TERM=vt102
+  PATH=(custom, no user)
+  LANG=en_US
+  SHELL=/bin/bash
  Uname: Linux 3.11.0-12-generic-lpae armv7l
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1239800

Title:
  Soft lockup when running bonnie++ only at 1600 mt/s

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  SRU Justification:

  Impact: running a test like bonnie++ makes the system unstable and
  prone to hangs.

  Fix: apply the attached patches and rebuild the kernel.

  Test case: leave bonnie++ running in a loop for 24 hours.

  --

  When bonnie++ is run in a loop, the system hangs with
  "INFO: rcu_sched self-detected stall on CPU".
  The time to failure varies: one run took 7 hours, the next more than 2 days.
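In the stall line of the trace below, `g` is the current grace-period number and `c` the last completed one (580389 vs 580388, so one grace period is outstanding). To our reading of the 3.11 RCU code, `t=` is the number of jiffies since that grace period started. Turning it into seconds needs the kernel's `CONFIG_HZ`, which the report does not state, so the hypothetical helper `stall_seconds` below takes it as an argument rather than guessing.

```bash
#!/bin/bash
# Hypothetical helper: convert the "t=<N> jiffies" field of an RCU
# self-detected stall line into whole seconds.
#   $1  the full "INFO: rcu_sched self-detected stall ..." line
#   $2  the kernel's CONFIG_HZ (must be supplied; `getconf CLK_TCK`
#       reports USER_HZ, which is not necessarily the same value)
stall_seconds() {
  local line=$1 hz=$2 t
  # Pull the decimal jiffies count out of "(t=19305216 jiffies ...".
  t=$(printf '%s\n' "$line" | sed -n 's/.*(t=\([0-9][0-9]*\) jiffies.*/\1/p')
  if [ -z "$t" ] || [ -z "$hz" ] || [ "$hz" -le 0 ]; then
    echo "usage: stall_seconds STALL_LINE HZ" >&2
    return 1
  fi
  # Integer division: fractions of a second are dropped.
  echo $(( t / hz ))
}
```

Assuming HZ=100 purely for illustration, `t=19305216` would be 193052 seconds (about 53.6 hours). The real figure depends on the tested kernel's HZ, which on Ubuntu can usually be read with `grep '^CONFIG_HZ=' /boot/config-$(uname -r)`.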

  Commands to reproduce the failure:
  $ sudo apt-get install bonnie++
  $ mkdir bonnie
  $ while true; do bonnie++ -d bonnie; done &>>bonnie0.log &
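The loop above runs forever, while the SRU test case asks for 24 hours. A minimal sketch of a time-bounded version follows; `run_for_hours` and `stall_count` are hypothetical names, and the stall check assumes the user running the test can read `dmesg`.

```bash
#!/bin/bash
# Count "self-detected stall" reports currently in the kernel log.
# dmesg may need extra privileges on some systems; failures count as 0.
stall_count() {
  dmesg 2>/dev/null | grep -c 'self-detected stall' || true
}

# Hypothetical harness: repeat a command until the time limit expires,
# stopping early if the command fails or a new RCU stall is logged.
#   $1     duration in hours
#   $2...  the command to repeat (e.g. bonnie++ -d bonnie)
# Prints the number of completed runs when the time limit is reached.
run_for_hours() {
  local hours=$1
  shift
  local end=$(( $(date +%s) + hours * 3600 ))
  local runs=0 base
  # Baseline, so stalls logged before the test started are ignored.
  base=$(stall_count)
  while [ "$(date +%s)" -lt "$end" ]; do
    "$@" || return 1
    runs=$(( runs + 1 ))
    if [ "$(stall_count)" -gt "$base" ]; then
      echo "RCU stall reported after $runs run(s)" >&2
      return 2
    fi
  done
  echo "$runs"
}
```

For example: `run_for_hours 24 bonnie++ -d bonnie >>bonnie0.log 2>&1`. The kernel log is only checked between bonnie++ iterations, so a stall that hangs the machine outright will not be reported by the script; the console output is then the only record.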

  Stack trace:
  [237019.072290] INFO: rcu_sched self-detected stall on CPU { 1} (t=19305216 jiffies g=580389 c=580388 q=84)
  [237019.080901] CPU: 1 PID: 44 Comm: kswapd0 Tainted: GF 3.11.0-6-generic-lpae #12-Ubuntu
  [237019.088879] [<c002bc00>] (unwind_backtrace+0x0/0x138) from [<c0026f1c>] (show_stack+0x10/0x14)
  [237019.096700] [<c0026f1c>] (show_stack+0x10/0x14) from [<c05cbe50>] (dump_stack+0x74/0x90)
  [237019.104051] [<c05cbe50>] (dump_stack+0x74/0x90) from [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798)
  [237019.112262] [<c00bf37c>] (rcu_check_callbacks+0x31c/0x798) from [<c00492a0>] (update_process_times+0x38/0x64)
  [237019.121254] [<c00492a0>] (update_process_times+0x38/0x64) from [<c008cdbc>] (tick_sched_handle+0x54/0x60)
  [237019.129933] [<c008cdbc>] (tick_sched_handle+0x54/0x60) from [<c008d00c>] (tick_sched_timer+0x44/0x74)
  [237019.138300] [<c008d00c>] (tick_sched_timer+0x44/0x74) from [<c005db50>] (__run_hrtimer+0x74/0x1d4)
  [237019.146433] [<c005db50>] (__run_hrtimer+0x74/0x1d4) from [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0)
  [237019.154800] [<c005e6f8>] (hrtimer_interrupt+0x10c/0x2c0) from [<c0492e44>] (arch_timer_handler_phys+0x28/0x30)
  [237019.163871] [<c0492e44>] (arch_timer_handler_phys+0x28/0x30) from [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104)
  [237019.173332] [<c00b8c2c>] (handle_percpu_devid_irq+0x6c/0x104) from [<c00b54ec>] (generic_handle_irq+0x20/0x30)
  [237019.182402] [<c00b54ec>] (generic_handle_irq+0x20/0x30) from [<c0023ff4>] (handle_IRQ+0x38/0x94)
  [237019.190378] [<c0023ff4>] (handle_IRQ+0x38/0x94) from [<c0008508>] (gic_handle_irq+0x28/0x5c)
  [237019.198041] [<c0008508>] (gic_handle_irq+0x28/0x5c) from [<c05d1c00>] (__irq_svc+0x40/0x50)
  [237019.205624] Exception stack(0xee2c1c18 to 0xee2c1c60)
  [237019.210238] 1c00: 00000004 00000004
  [237019.217666] 1c20: 00000008 00000001 ee2c1c8c ca208700 ca208700 0996b000 ca208708 00000001
  [237019.225093] 1c40: 00000002 edb31300 00000003 ee2c1c60 c02f54fc c00923c8 200f0013 ffffffff
  [237019.232523] [<c05d1c00>] (__irq_svc+0x40/0x50) from [<c00923c8>] (generic_exec_single+0x6c/0x94)
  [237019.240500] [<c00923c8>] (generic_exec_single+0x6c/0x94) from [<c00924f4>] (smp_call_function_single+0x104/0x198)
  [237019.249805] [<c00924f4>] (smp_call_function_single+0x104/0x198) from [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84)
  [237019.259812] [<c0029920>] (broadcast_tlb_mm_a15_erratum+0x7c/0x84) from [<c0029adc>] (flush_tlb_page+0x74/0xa8)
  [237019.268882] [<c0029adc>] (flush_tlb_page+0x74/0xa8) from [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0)
  [237019.277484] [<c011fc8c>] (ptep_clear_flush_young+0x6c/0xd0) from [<c011a60c>] (page_referenced_one+0x64/0x1fc)
  [237019.286554] [<c011a60c>] (page_referenced_one+0x64/0x1fc) from [<c011c034>] (page_referenced+0xf4/0x2e4)
  [237019.295155] [<c011c034>] (page_referenced+0xf4/0x2e4) from [<c00fc410>] (shrink_active_list+0x1f0/0x35c)
  [237019.303756] [<c00fc410>] (shrink_active_list+0x1f0/0x35c) from [<c00fdadc>] (shrink_lruvec+0x32c/0x598)
  [237019.312279] [<c00fdadc>] (shrink_lruvec+0x32c/0x598) from [<c00fddb0>] (shrink_zone+0x68/0x180)
  [237019.320176] [<c00fddb0>] (shrink_zone+0x68/0x180) from [<c00fe430>] (kswapd+0x568/0x9d4)
  [237019.327527] [<c00fe430>] (kswapd+0x568/0x9d4) from [<c005aae0>] (kthread+0xa4/0xb0)
  [237019.334487] [<c005aae0>] (kthread+0xa4/0xb0) from [<c0023198>] (ret_from_fork+0x14/0x3c)
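Read from the bottom up, the trace shows kswapd0 reclaiming memory (kswapd -> shrink_zone -> shrink_lruvec -> shrink_active_list -> page_referenced), clearing a PTE's young bit in ptep_clear_flush_young, and flushing the TLB through flush_tlb_page and broadcast_tlb_mm_a15_erratum into smp_call_function_single and generic_exec_single, which is where the timer interrupt that reported the stall arrived. To compare this chain across occurrences, a hypothetical awk filter, `backtrace_chain`, can reduce ARM backtrace lines of this format to one function name per frame, innermost first.

```bash
#!/bin/bash
# Hypothetical filter: reduce ARM kernel backtrace lines of the form
#   [ts] [<addr>] (callee+off/len) from [<addr>] (caller+off/len)
# to one function name per frame, innermost first. Reads stdin and
# assumes the input holds a single trace.
backtrace_chain() {
  awk '
    function name(field) {
      sub(/^[(]/, "", field)     # drop the leading "("
      sub(/[+].*$/, "", field)   # drop "+offset/length)"
      return field
    }
    {
      for (i = 2; i < NF; i++) {
        if ($i == "from" && $(i - 1) ~ /^[(]/ && $(i + 2) ~ /^[(]/) {
          print name($(i - 1))   # callee on this line
          last = name($(i + 2))  # caller, printed once at the end
          break
        }
      }
    }
    END { if (last != "") print last }
  '
}
```

Lines without a `from` frame, such as the `Exception stack(...)` dump, are skipped. Applied to the trace above, the output ends with `kswapd`, `kthread`, `ret_from_fork`.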

  Setup details:
  Quad-core A15 server nodes on Calxeda Midway hardware.
  The failure has been seen twice with the DDR setting DDR3@1600 MT/s.
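The nodes are Cortex-A15 based, and the trace runs through broadcast_tlb_mm_a15_erratum. To our understanding of the 3.11 ARM code, that function only sends its extra cross-CPU IPI when the kernel is built with CONFIG_ARM_ERRATA_798181 and the CPU revision is affected, so it may help to confirm the option in each kernel being tested. The helper `kconfig_value` is a hypothetical sketch that assumes the Ubuntu convention of installing the config as /boot/config-<release>.

```bash
#!/bin/bash
# Hypothetical helper: print the value of a Kconfig option ("y", "m",
# a number, ...) from a kernel config file, or nothing if it is unset.
#   $1  option name, e.g. CONFIG_ARM_ERRATA_798181
#   $2  config file (defaults to the assumed Ubuntu location for the
#       running kernel)
kconfig_value() {
  local opt=$1 file=${2:-/boot/config-$(uname -r)}
  # Matches "OPT=value" only; "# OPT is not set" lines print nothing.
  sed -n "s/^${opt}=//p" "$file"
}
```

`kconfig_value CONFIG_ARM_ERRATA_798181` prints `y` when the workaround is built in. The same helper can read `CONFIG_HZ`, which is needed to turn the `t=` jiffies count in the stall line into seconds.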

  cat /proc/version_signature
  Ubuntu 3.11.0-12.18-generic-lpae 3.11.3
  The issue was first seen on Ubuntu 3.11.0-6.12-generic-lpae
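The trace itself was taken on 3.11.0-6-generic-lpae, and its header reads `Tainted: GF`: `G` means no proprietary module was loaded, and `F` means a module was force-loaded. When comparing kernels, the same two flags can be read back from the numeric mask in /proc/sys/kernel/tainted. `taint_prefix` is a hypothetical helper that decodes only those two bits.

```bash
#!/bin/bash
# Hypothetical helper: decode the first two characters of the kernel's
# "Tainted:" string from the numeric taint mask.
# Only bit 0 (P = proprietary module loaded, G = none) and bit 1
# (F = a module was force-loaded) are handled; the kernel defines more.
# Note: a mask of 0 is printed by the kernel as "Not tainted" instead.
taint_prefix() {
  local mask=$1 out
  if (( mask & 1 )); then out=P; else out=G; fi
  if (( mask & 2 )); then out+=F; fi
  printf '%s\n' "$out"
}
```

For example, `taint_prefix "$(cat /proc/sys/kernel/tainted)"`; a mask of 2 (only bit 1 set) corresponds to the `GF` in the trace header.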

  cat /etc/issue
  Ubuntu 13.04 \n \l

  Additional debug information attached
  ---
  Architecture: armhf
  DistroRelease: Ubuntu 13.04
  MarkForUpload: True
  Package: linux (not installed)
  ProcEnviron:
   LANGUAGE=en_US:
   TERM=vt102
   PATH=(custom, no user)
   LANG=en_US
   SHELL=/bin/bash
  Uname: Linux 3.11.0-12-generic-lpae armv7l
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1239800/+subscriptions
References

[Bug 1239800] [NEW] Soft lockup when running bonnie++ only at 1600 mt/s
From: Pradeep, 2013-10-14