kernel-packages team mailing list archive

[Bug 1413540] Re: soft lockup issues with nested KVM VMs running tempest

 

Let's concentrate on the hang without KSM in this bug. I've split the
KSM-with-nested-virt issue out into bug 1414153.
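
Looking at the trace below, the stuck task is in madvise(), splitting a
transparent huge page and then waiting in native_flush_tlb_others /
smp_call_function_many for the other CPUs to acknowledge the TLB-flush
IPI; the soft-lockup watchdog fires because that wait never completes.
When a node wedges like this, backtraces from every CPU help show where
the remaining vCPU threads are stuck. A minimal sketch using the
standard sysrq interface (nothing here is specific to this bug):

$ echo 1 | sudo tee /proc/sys/kernel/sysrq   # make sure sysrq is enabled
$ echo l | sudo tee /proc/sysrq-trigger      # dump backtraces for all active CPUs
$ dmesg | tail -n 100                        # the traces land in the kernel log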

** Summary changed:

- issues with KSM enabled for nested KVM VMs
+ soft lockup issues with nested KVM VMs running tempest

** No longer affects: qemu (Ubuntu)

** Description changed:

+ 
+ [Impact]
+ Users testing OpenStack on nested KVM hit soft lockups like the following:
+ [74180.076007] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:14590]
+ <snip>
+ [74180.076007] Call Trace:
+ [74180.076007]  [<ffffffff8105c7a0>] ? leave_mm+0x80/0x80
+ [74180.076007]  [<ffffffff810dbf75>] smp_call_function_single+0xe5/0x190
+ [74180.076007]  [<ffffffff8105c7a0>] ? leave_mm+0x80/0x80
+ [74180.076007]  [<ffffffffa00c4300>] ? rmap_write_protect+0x80/0x80 [kvm]
+ [74180.076007]  [<ffffffff810dc3a6>] smp_call_function_many+0x286/0x2d0
+ [74180.076007]  [<ffffffff8105c7a0>] ? leave_mm+0x80/0x80
+ [74180.076007]  [<ffffffff8105c8f7>] native_flush_tlb_others+0x37/0x40
+ [74180.076007]  [<ffffffff8105c9cb>] flush_tlb_mm_range+0x5b/0x230
+ [74180.076007]  [<ffffffff8105b80d>] pmdp_splitting_flush+0x3d/0x50
+ [74180.076007]  [<ffffffff811ac95b>] __split_huge_page+0xdb/0x720
+ [74180.076007]  [<ffffffff811ad008>] split_huge_page_to_list+0x68/0xd0
+ [74180.076007]  [<ffffffff811ad9a6>] __split_huge_page_pmd+0x136/0x330
+ [74180.076007]  [<ffffffff8117728d>] unmap_page_range+0x7dd/0x810
+ [74180.076007]  [<ffffffffa00a66b5>] ? kvm_mmu_notifier_invalidate_range_start+0x75/0x90 [kvm]
+ [74180.076007]  [<ffffffff81177341>] unmap_single_vma+0x81/0xf0
+ [74180.076007]  [<ffffffff811784ed>] zap_page_range+0xed/0x150
+ [74180.076007]  [<ffffffff8108ed74>] ? hrtimer_start_range_ns+0x14/0x20
+ [74180.076007]  [<ffffffff81174fbf>] SyS_madvise+0x3bf/0x850
+ [74180.076007]  [<ffffffff810db841>] ? SyS_futex+0x71/0x150
+ [74180.076007]  [<ffffffff8173186d>] system_call_fastpath+0x1a/0x1f
+ 
+ [Test Case]
+ - Deploy OpenStack on top of an OpenStack cloud (nested KVM)
+ - Run tempest against the L1 cloud
+ - Check the kernel logs of the L1 nova-compute nodes (see the example below)
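+ 
+ For example, on each L1 nova-compute node (or via "juju ssh nova-compute/0 '...'" on a juju-deployed cloud; the unit name is illustrative):
+ 
+ $ dmesg | grep -i 'soft lockup'
+ [74180.076007] BUG: soft lockup - CPU#1 stuck for 22s! [qemu-system-x86:14590]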
+ 
+ --
+ 
+ Original Description:
+ 
  Installing qemu-kvm in a VM enables KSM.
  
  I have encountered this problem on trusty:
  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty
  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  
  The way to see the behaviour:
  1) $ more /sys/kernel/mm/ksm/run
  0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
  1
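
  To test the hang with KSM out of the picture, KSM can be switched back
  off; a minimal sketch (KSM_ENABLED in /etc/default/qemu-kvm is, as far
  as I can tell, how trusty's qemu-kvm packaging persists the setting;
  check the file on your system):

  $ echo 0 | sudo tee /sys/kernel/mm/ksm/run                             # stop KSM immediately
  $ sudo sed -i 's/^KSM_ENABLED=1/KSM_ENABLED=0/' /etc/default/qemu-kvm  # keep it off after restarts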
  
  To see the soft lockups, deploy a cloud on a virtualised environment such as ctsstack and run tempest on it at least twice; the compute nodes of the virtualised deployment will eventually stop responding with:
  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  
  I am not sure whether the problem is that we are enabling KSM inside a
  VM or that nested KSM itself is misbehaving. Either way I can easily
  reproduce it; please contact me if you need further details.
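
  For anyone trying to reproduce this, it is worth confirming that the
  hosts really are using nested KVM and checking whether KSM is on at
  each level (kvm_intel is assumed here; use kvm_amd on AMD hardware):

  $ cat /sys/module/kvm_intel/parameters/nested   # Y means nested KVM is available
  $ more /sys/kernel/mm/ksm/run                   # 1 means KSM is running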

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  soft lockup issues with nested KVM VMs running tempest

Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions