
kernel-packages team mailing list archive

[Bug 1467955] Re: Precise BUG: soft lockup in flush_tlb_others_ipi

 

I've been able to backport the following commit:

"""
commit 52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4 
Author: Alex Shi <alex.shi@xxxxxxxxx> 
Date: Thu Jun 28 09:02:23 2012 +0800 

x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR

There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big 
amount of vector in IDT. But it is still not enough, since modern x86 
sever has more cpu number. That still causes heavy lock contention 
in TLB flushing. 

The patch using generic smp call function to replace it. That saved 32 
vector number in IDT, and resolved the lock contention in TLB 
flushing on large system. 

In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread 
has 3% performance increase. 

Signed-off-by: Alex Shi <alex.shi@xxxxxxxxx> 
Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@xxxxxxxxx 
Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxx> 
"""

This commit changes the logic of the flush_tlb_others_ipi sequence. I
also backported the following commits, which it depends on:

"""
commit 3a4f7b0a59006a3986b8ed6faf0031f1e5232db4 
Author: Alex Shi <alex.shi@xxxxxxxxx> 
Date: Thu Jun 28 09:02:17 2012 +0800 

x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range

commit 3331548b0d3907b1ab84e86239e149b8a52cda5d 
Author: Jan Beulich <JBeulich@xxxxxxxx> 
Date: Tue Nov 29 11:03:46 2011 +0000 

x86-64: Reduce amount of redundant code generated for invalidate_interruptNN 
"""

I'm now sending the source code to a kernel builder machine and will
provide a hotfixed kernel for testing soon.
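
For anyone who wants to exercise the same code path while testing, the
call trace in the bug description goes sys_madvise -> madvise_vma ->
zap_page_range -> tlb_flush_mmu -> flush_tlb_mm -> native_flush_tlb_others.
The following is only a minimal user-space sketch of a workload that may
reach that path. It assumes MADV_DONTNEED on anonymous memory shared by
several running threads is what triggers the remote TLB flush. It is not
a confirmed reproducer of this bug. The function name run_madvise_stress,
the region size, and the thread limit are all hypothetical choices.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define REGION_SIZE (4UL * 1024 * 1024) /* hypothetical: 4 MiB of anonymous memory */
#define PAGE_STEP   4096UL              /* assumes 4 KiB pages */
#define MAX_THREADS 64                  /* hypothetical upper bound */

static unsigned char *region;
static atomic_int stop_flag;

/* Keep the shared mapping populated (and cached in this CPU's TLB) so that
 * each madvise() call in the main thread has remote entries to invalidate.
 * Each thread writes its own byte within every page. */
static void *toucher(void *arg)
{
    size_t idx = (size_t)(intptr_t)arg;

    while (!atomic_load(&stop_flag)) {
        for (size_t off = 0; off < REGION_SIZE; off += PAGE_STEP)
            region[off + idx] = 1;
    }
    return NULL;
}

/* Returns the number of madvise() calls that succeeded, or -1 on setup error. */
int run_madvise_stress(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;
    int ok = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    atomic_store(&stop_flag, 0);
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, toucher,
                           (void *)(intptr_t)started) != 0)
            break;
    }

    /* Dropping the pages makes the kernel zap the range and flush the TLB
     * on every CPU that is currently running a thread of this process. */
    for (int i = 0; i < iterations; i++) {
        if (madvise(region, REGION_SIZE, MADV_DONTNEED) == 0)
            ok++;
    }

    atomic_store(&stop_flag, 1);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    munmap(region, REGION_SIZE);
    return ok;
}
```

Calling run_madvise_stress() from a small main() with a thread count
close to the guest's vCPU count and a large iteration count should keep
the flush path busy. Build with -pthread.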

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1467955

Title:
  Precise BUG: soft lockup in flush_tlb_others_ipi

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Precise:
  In Progress

Bug description:
  The following stack trace (with kernel dump) was brought to me:

  """
  [1796904.032010] BUG: soft lockup - CPU#0 stuck for 23s! [java:6383] 
  [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy 
  [1796904.036004] CPU 0 
  [1796904.036004] Modules linked in: isofs psmouse virtio_balloon serio_raw acpiphp floppy 
  [1796904.036004] 
  [1796904.036004] Pid: 6383, comm: java Not tainted 3.2.0-76-virtual #111-Ubuntu OpenStack Foundation OpenStack Nova 
  [1796904.036004] RIP: 0010:[<ffffffff81046922>] [<ffffffff81046922>] flush_tlb_others_ipi+0x122/0x130 
  [1796904.036004] RSP: 0018:ffff880065791d58 EFLAGS: 00000202 
  [1796904.036004] RAX: 0000000000000002 RBX: ffffea0003470bf0 RCX: 0000000000000002 
  [1796904.036004] RDX: 0000000000000002 RSI: 0000000000000040 RDI: 0000000000000296 
  [1796904.036004] RBP: ffff880065791d88 R08: ffffffff81e0c0a0 R09: 0000000000000040 
  [1796904.036004] R10: ffffea0003471240 R11: 0000000000000000 R12: ffff880065791e20 
  [1796904.036004] R13: ffff880059e96f20 R14: ffff880116249848 R15: 00ff880065791d78 
  [1796904.036004] FS: 00007f83612d2700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 
  [1796904.036004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
  [1796904.036004] CR2: 00007f83be381420 CR3: 0000000118be0000 CR4: 00000000000006f0 
  [1796904.036004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
  [1796904.036004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
  [1796917.981999] Process java (pid: 6383, threadinfo ffff880065790000, task ffff880053c0dbc0) 
  [1796917.981999] Stack: 
  [1796917.981999] 00007f83612ccfff ffff880059e96f20 ffff880116200e00 ffff8801162010d0 
  [1796917.981999] 00007f83612cd000 ffff880116200e00 ffff880065791d98 ffffffff81046aae 
  [1796917.981999] ffff880065791db8 ffffffff81046b7b 00007f83611d5000 ffff880065791e20 
  [1796917.981999] Call Trace: 
  [1796917.982394] ata2: lost interrupt (Status 0x58) 
  [1796917.981999] [<ffffffff81046aae>] native_flush_tlb_others+0xe/0x10 
  [1796917.981999] [<ffffffff81046b7b>] flush_tlb_mm+0x5b/0xa0 
  [1796917.981999] [<ffffffff8113ba06>] tlb_flush_mmu+0x46/0x90 
  [1796917.981999] [<ffffffff8113ba64>] tlb_finish_mmu+0x14/0x40 
  [1796917.981999] [<ffffffff8113e3a7>] zap_page_range+0xb7/0xd0 
  [1796917.981999] [<ffffffff8113a85d>] madvise_vma+0xfd/0x140 
  [1796917.981999] [<ffffffff8107b917>] ? __set_task_blocked+0x37/0x80 
  [1796917.981999] [<ffffffff81095b27>] ? getnstimeofday+0x57/0xe0 
  [1796917.981999] [<ffffffff8113aa7e>] sys_madvise+0x1de/0x280 
  [1796917.981999] [<ffffffff81666b82>] system_call_fastpath+0x16/0x1b 
  [1796917.981999] Code: 41 8d b6 cf 00 00 00 49 8d 7d 18 ff 90 d0 00 00 00 49 83 bc 24 98 c0 e0 81 00 0f 84 74 ff ff ff 66 0f 1f 84 00 00 00 00 00 f3 90 <49> 83 7d 18 00 75 f7 e9 5d ff ff ff 66 90 55 48 89 e5 66 66 66 
  [1796917.981999] Call Trace: 
  [1796917.981999] [<ffffffff81046aae>] native_flush_tlb_others+0xe/0x10 
  [1796917.981999] [<ffffffff81046b7b>] flush_tlb_mm+0x5b/0xa0 
  [1796917.981999] [<ffffffff8113ba06>] tlb_flush_mmu+0x46/0x90 
  [1796917.981999] [<ffffffff8113ba64>] tlb_finish_mmu+0x14/0x40 
  [1796917.981999] [<ffffffff8113e3a7>] zap_page_range+0xb7/0xd0 
  [1796917.981999] [<ffffffff8113a85d>] madvise_vma+0xfd/0x140 
  [1796917.981999] [<ffffffff8107b917>] ? __set_task_blocked+0x37/0x80 
  [1796917.981999] [<ffffffff81095b27>] ? getnstimeofday+0x57/0xe0 
  [1796917.981999] [<ffffffff8113aa7e>] sys_madvise+0x1de/0x280 
  [1796917.981999] [<ffffffff81666b82>] system_call_fastpath+0x16/0x1b 
  [1796917.992066] ata2: drained 65536 bytes to clear DRQ
  """

  Analysis below...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1467955/+subscriptions

