kernel-packages team mailing list archive

[Bug 1328866] [NEW] rcu_bh detected stall alway occur

 

Public bug reported:

I have many servers running Ubuntu 12.04 LTS with kernel 3.2.0-23-generic,
but something strange has been happening recently: [rcu_bh detected stall]
has occurred many times. Sometimes the kernel dies; sometimes there is just
a warning in dmesg when [rcu_bh detected stall] happens.
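
To quantify "many times" across servers, a small script along these lines could tally the stall reports per CPU from a saved kernel log. This is only a sketch: the function name `count_stalls` is made up for illustration, and the pattern matches only the self-detected message format shown in the log below, not other RCU stall message variants.

```python
import re
from collections import Counter

# Matches lines such as:
#   INFO: rcu_bh detected stall on CPU 19 (t=0 jiffies)
# Other stall message formats are not covered by this pattern.
STALL_RE = re.compile(r"INFO: (rcu_\w+) detected stall on CPU (\d+)")


def count_stalls(lines):
    """Return a Counter keyed by (rcu_flavor, cpu), one count per stall report."""
    counts = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

It can be fed any iterable of log lines, for example an open file handle on the attached kern_bak.log, and the resulting counts can be compared between machines.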

**********************************************************
server information:
kernel: Ubuntu 3.2.0-23.36-generic 3.2.14
CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
NIC: Intel Corporation I350 Gigabit Network Connection

**********************************************************
Jun 11 15:44:10  [14380700.988692] INFO: rcu_bh detected stall on CPU 19 (t=0 jiffies)
Jun 11 15:44:10  [14380700.994604] sending NMI to all CPUs:
Jun 11 15:44:10  [14380701.000547] NMI backtrace for cpu 0
Jun 11 15:44:10  [14380701.006244] CPU 0 
Jun 11 15:44:10  
Jun 11 15:44:10  [14380701.006321] Modules linked in:
.........
.........
.........
Jun 11 15:44:16  [14380702.300237] NMI backtrace for cpu 19
Jun 11 15:44:16  [14380702.300239] CPU 19 
Jun 11 15:44:16  [14380702.300240] Modules linked in: iptable_raw netconsole configfs tcp_diag inet_diag mptctl mptbase xt_multiport nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth vesafb ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler joydev sb_edac lp parport acpi_pad edac_core mac_hid ioatdma wmi usbhid hid mpt2sas scsi_transport_sas igb dca raid_class
Jun 11 15:44:16  [14380702.300366] 
Jun 11 15:44:16  [14380702.300373] Pid: 0, comm: swapper/19 Tainted: G           O 3.2.0-23-generic #36-Ubuntu /
Jun 11 15:44:16  [14380702.300390] RIP: 0010:[<ffffffff81036b8b>]  [<ffffffff81036b8b>] __x2apic_send_IPI_mask+0x15b/0x180
Jun 11 15:44:16  [14380702.300436] igb 0000:05:00.0: Detected Tx Unit Hang
Jun 11 15:44:16  [14380702.300439]   Tx Queue             <1>
Jun 11 15:44:16  [14380702.300440]   TDH                  <f4b>
Jun 11 15:44:16  [14380702.300442]   TDT                  <f4b>
Jun 11 15:44:16  [14380702.300452]   next_to_use          <f4b>
Jun 11 15:44:16  [14380702.300454]   next_to_clean        <dfd>
Jun 11 15:44:16  [14380702.300455] buffer_info[next_to_clean]
Jun 11 15:44:16  [14380702.300457]   time_stamp           <1d6b49a03>
Jun 11 15:44:16  [14380702.300467]   next_to_watch        <ffff88084a14dfe0>
Jun 11 15:44:16  [14380702.300469]   jiffies              <1d6b49b48>
Jun 11 15:44:16  [14380702.300471]   desc.status          <108201>
Jun 11 15:44:16  [14380702.300486] RSP: 0018:ffff88107fce3d38  EFLAGS: 00000087
Jun 11 15:44:16  [14380702.300500] RAX: 0000000000000100 RBX: ffff88107fcedb80 RCX: 0000000000000007
Jun 11 15:44:16  [14380702.300503] RDX: 0000000000000007 RSI: 0000000000000100 RDI: 0000000000000000
Jun 11 15:44:16  [14380702.300505] RBP: ffff88107fce3d98 R08: ffff88107fcedba0 R09: 0000000000000100
Jun 11 15:44:16  [14380702.300508] R10: 000000002b300055 R11: 0000000000000000 R12: ffff88107fc0dba0
Jun 11 15:44:16  [14380702.300510] R13: 000000000000dbc0 R14: 0000000000080000 R15: 0000000000020fff
Jun 11 15:44:16  [14380702.300513] FS:  0000000000000000(0000) GS:ffff88107fce0000(0000) knlGS:0000000000000000
Jun 11 15:44:16  [14380702.300516] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 11 15:44:16  [14380702.300518] CR2: 00007ff939554be0 CR3: 0000000001c05000 CR4: 00000000000406e0
Jun 11 15:44:16  [14380702.300527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 11 15:44:16  [14380702.300529] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 11 15:44:16  [14380702.300532] Process swapper/19 (pid: 0, threadinfo ffff88084c120000, task ffff88084c118000)
Jun 11 15:44:16  [14380702.300534] Stack:
Jun 11 15:44:16  [14380702.300542]  ffff88107fcee7c0 0000000000000092 000000137fce3da8 000000000000dba0
Jun 11 15:44:16  [14380702.300559]  0000010000000002 0000000000000013 ffff88107fce3db8 0000000000002710
Jun 11 15:44:16  [14380702.300582]  ffffffff81c30900 ffffffff81c30b00 ffff88107fcee7c0 0000000000000000
Jun 11 15:44:16  [14380702.300600] Call Trace:
Jun 11 15:44:16  [14380702.300617]  <IRQ> 
Jun 11 15:44:16  [14380702.300634]  [<ffffffff81036bcc>] x2apic_send_IPI_all+0x1c/0x20
Jun 11 15:44:16  [14380702.300640]  [<ffffffff810326a1>] arch_trigger_all_cpu_backtrace+0x61/0xa0
Jun 11 15:44:16  [14380702.300655]  [<ffffffff810e0877>] check_cpu_stall.isra.36+0x97/0xf0
Jun 11 15:44:16  [14380702.300667]  [<ffffffff810e0908>] __rcu_pending+0x38/0x1b0
Jun 11 15:44:16  [14380702.300677]  [<ffffffff810e0ecb>] rcu_check_callbacks+0x1cb/0x1e0
Jun 11 15:44:16  [14380702.300701]  [<ffffffff81078928>] update_process_times+0x48/0x90
Jun 11 15:44:16  [14380702.300721]  [<ffffffff8109c4b4>] tick_sched_timer+0x64/0xc0
Jun 11 15:44:16  [14380702.300748]  [<ffffffff8108eba8>] __run_hrtimer+0x78/0x1f0
Jun 11 15:44:16  [14380702.300771]  [<ffffffff8109c450>] ? tick_nohz_handler+0x100/0x100
Jun 11 15:44:16  [14380702.300787]  [<ffffffff8108f433>] hrtimer_interrupt+0xe3/0x200
Jun 11 15:44:16  [14380702.300817]  [<ffffffff81667689>] smp_apic_timer_interrupt+0x69/0x99
Jun 11 15:44:16  [14380702.300830]  [<ffffffff8166555e>] apic_timer_interrupt+0x6e/0x80
Jun 11 15:44:16  [14380702.300842]  <EOI> 
Jun 11 15:44:16  [14380702.300886]  [<ffffffff81056c9c>] ? update_shares+0xcc/0x100
Jun 11 15:44:16  [14380702.300900]  [<ffffffff8136b42d>] ? intel_idle+0xed/0x150
Jun 11 15:44:16  [14380702.300910]  [<ffffffff8136b40f>] ? intel_idle+0xcf/0x150
Jun 11 15:44:16  [14380702.300925]  [<ffffffff81504a01>] cpuidle_idle_call+0xc1/0x280
Jun 11 15:44:16  [14380702.300947]  [<ffffffff8101222a>] cpu_idle+0xca/0x120
Jun 11 15:44:16  [14380702.300970]  [<ffffffff8163a7fa>] start_secondary+0xd9/0xdb
Jun 11 15:44:16  [14380702.300986] Code: 80 cc 04 83 7d c0 02 0f 44 f0 89 f6 e8 0f 64 00 00 66 90 b9 00 01 00 00 4c 89 e2 48 89 de 48 89 df e8 9a 36 2e 00 e9 2d ff ff ff <48> 8b 7d a8 57 9d 66 66 90 66 90 48 83 c4 38 5b 41 5c 41 5d 41 
Jun 11 15:44:16  [14380702.301061] Call Trace:
Jun 11 15:44:16  [14380702.301064]  <IRQ>  [<ffffffff81036bcc>] x2apic_send_IPI_all+0x1c/0x20
Jun 11 15:44:16  [14380702.301123]  [<ffffffff810326a1>] arch_trigger_all_cpu_backtrace+0x61/0xa0
Jun 11 15:44:16  [14380702.301156]  [<ffffffff810e0877>] check_cpu_stall.isra.36+0x97/0xf0
Jun 11 15:44:16  [14380702.301170]  [<ffffffff810e0908>] __rcu_pending+0x38/0x1b0
Jun 11 15:44:16  [14380702.301189]  [<ffffffff810e0ecb>] rcu_check_callbacks+0x1cb/0x1e0
Jun 11 15:44:16  [14380702.301234]  [<ffffffff81078928>] update_process_times+0x48/0x90
Jun 11 15:44:16  [14380702.301253]  [<ffffffff8109c4b4>] tick_sched_timer+0x64/0xc0
Jun 11 15:44:16  [14380702.301267]  [<ffffffff8108eba8>] __run_hrtimer+0x78/0x1f0
Jun 11 15:44:16  [14380702.301278]  [<ffffffff8109c450>] ? tick_nohz_handler+0x100/0x100
Jun 11 15:44:16  [14380702.301303]  [<ffffffff8108f433>] hrtimer_interrupt+0xe3/0x200
Jun 11 15:44:16  [14380702.301319]  [<ffffffff81667689>] smp_apic_timer_interrupt+0x69/0x99
Jun 11 15:44:16  [14380702.301329]  [<ffffffff8166555e>] apic_timer_interrupt+0x6e/0x80
Jun 11 15:44:16  [14380702.301335]  <EOI>  [<ffffffff81056c9c>] ? update_shares+0xcc/0x100
Jun 11 15:44:16  [14380702.301365]  [<ffffffff8136b42d>] ? intel_idle+0xed/0x150
Jun 11 15:44:16  [14380702.301381]  [<ffffffff8136b40f>] ? intel_idle+0xcf/0x150
Jun 11 15:44:16  [14380702.301392]  [<ffffffff81504a01>] cpuidle_idle_call+0xc1/0x280
Jun 11 15:44:16  [14380702.301407]  [<ffffffff8101222a>] cpu_idle+0xca/0x120
Jun 11 15:44:16  [14380702.301430]  [<ffffffff8163a7fa>] start_secondary+0xd9/0xdb
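
The hexadecimal fields in the "Detected Tx Unit Hang" report above are easier to read once converted. The sketch below only does that arithmetic on the values from the log. Two things are assumptions and not taken from the log: that the Tx ring index has not wrapped between next_to_clean and next_to_use, and that the timer frequency is 250 Hz (check CONFIG_HZ on the affected machines).

```python
# Values copied from the "Detected Tx Unit Hang" report in the log (hex).
next_to_use = 0xF4B
next_to_clean = 0xDFD
time_stamp = 0x1D6B49A03
jiffies = 0x1D6B49B48

# ASSUMPTION: 250 Hz timer tick; verify against CONFIG_HZ of the running kernel.
ASSUMED_HZ = 250

# Descriptors queued by the driver but not yet cleaned,
# ASSUMING the ring index did not wrap between the two positions.
pending = next_to_use - next_to_clean

# Age of the buffer at next_to_clean when the hang was reported.
age_jiffies = jiffies - time_stamp
age_seconds = age_jiffies / ASSUMED_HZ

print(f"pending descriptors: {pending}")
print(f"buffer age: {age_jiffies} jiffies (~{age_seconds:.1f} s at {ASSUMED_HZ} Hz)")
```

Under those assumptions the report shows 334 uncleaned descriptors and a buffer that had been waiting 325 jiffies, roughly 1.3 seconds, which may be worth comparing against the timing of the stall message.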

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "kern_bak.log"
   https://bugs.launchpad.net/bugs/1328866/+attachment/4129572/+files/kern_bak.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1328866

Title:
  rcu_bh detected stall alway occur

Status in “linux” package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1328866/+subscriptions

