[Bug 1413540] Re: Trusty soft lockup issues with nested KVM
This does not appear to be specific to OpenStack or Tempest. I've
reproduced it with Trusty on Trusty on Trusty, using vanilla qemu/kvm.
Simplified reproducer, with an existing MAAS cluster:
@L0 baremetal:
- Create a Trusty bare metal host from daily images.
- sudo apt-get update -y && sudo apt-get -y install uvtool
- sudo uvt-simplestreams-libvirt sync release=trusty arch=amd64
- sudo uvt-simplestreams-libvirt query
- ssh-keygen
- sudo uvt-kvm create --memory 2048 trusty-vm release=trusty
- sudo virsh shutdown trusty-vm
- # edit the /etc/libvirt/qemu/trusty-vm.xml to enable serial console dump to file:
<serial type='file'>
<source path='/tmp/trusty-vm-console.log'/>
<target port='0'/>
</serial>
<console type='file'>
<source path='/tmp/trusty-vm-console.log'/>
<target type='serial' port='0'/>
</console>
- sudo virsh define /etc/libvirt/qemu/trusty-vm.xml
- sudo virsh start trusty-vm
- # confirm console output:
- sudo tailf /tmp/trusty-vm-console.log
- # take note of the VM's IP:
- sudo uvt-kvm ip trusty-vm
- # ssh into the new vm.
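In case it is useful, here is a quick sanity check that the serial-console edit in the steps above actually took effect (just a sketch, reusing the names and paths from this reproducer):

sudo virsh dumpxml trusty-vm | grep -A2 "<serial"   # the file-backed serial stanza should show up
sudo ls -l /tmp/trusty-vm-console.log               # the log should exist and keep growing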
@L1 "trusty-vm":
- sudo apt-get update -y && sudo apt-get -y install uvtool
- sudo uvt-simplestreams-libvirt sync release=trusty arch=amd64
- sudo uvt-simplestreams-libvirt query
- ssh-keygen
- # change .122. to .123. in /etc/libvirt/qemu/networks/default.xml (see the sketch after this list)
- # make sure default.xml is static linked inside /etc/libvirt/qemu/networks
- sudo reboot # for good measure
- sudo uvt-kvm create --memory 768 trusty-nest release=trusty
- # take note of the nested VM's IP
- sudo uvt-kvm ip trusty-nest
- # ssh into the new vm.
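For the default network change above, this is roughly what I ran inside "trusty-vm", so that the nested default network does not collide with L0's 192.168.122.0/24. It assumes the stock default.xml that libvirt installs, where the 192.168.122.x addresses are the only thing that needs to change:

sudo sed -i 's/192\.168\.122\./192.168.123./g' /etc/libvirt/qemu/networks/default.xml
grep 192.168.123 /etc/libvirt/qemu/networks/default.xml   # confirm the edit before rebooting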
@L2 "trusty-nest":
- sudo apt-get update && sudo apt-get install stress
- stress -c 1 -i 1 -m 1 -d 1 -t 600
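For reference, per stress(1) that invocation runs four workers for 600 seconds; the memory churn from the -m worker is presumably what gives ksmd on the level below something to do:

# -c 1  one worker spinning on sqrt()               (CPU load)
# -i 1  one worker spinning on sync()               (I/O load)
# -m 1  one worker spinning on malloc()/free()      (memory churn, 256MB at a time by default)
# -d 1  one worker spinning on write()/unlink()     (disk load, 1GB files by default)
stress -c 1 -i 1 -m 1 -d 1 -t 600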
Now watch the "trusty-vm" console for:
[ 496.076004] BUG: soft lockup - CPU#0 stuck for 23s! [ksmd:36]
It happens to me within a couple of minutes. Then, both L1 and L2 become
unreachable indefinitely, with two cores on L0 stuck at 100%.
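A rough way to catch it from L0, if it helps (a sketch; the log path and VM name are the ones used above):

sudo tail -f /tmp/trusty-vm-console.log | grep --line-buffered "soft lockup" &
top -H    # once it triggers, two qemu-system-x86_64 threads sit at ~100% CPU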
--
https://bugs.launchpad.net/bugs/1413540
Title:
Trusty soft lockup issues with nested KVM
Status in linux package in Ubuntu:
Confirmed
Bug description:
[Impact]
Users running nested KVM to test OpenStack see soft lockups such as the following:
PID: 22262 TASK: ffff8804274bb000 CPU: 1 COMMAND: "qemu-system-x86"
#0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
#1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
#2 [ffff88043fd03e30] panic at ffffffff81719ff4
#3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
#4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
#5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
#6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
#7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
#8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
--- <IRQ stack> ---
#9 [ffff880426f0d958] apic_timer_interrupt at ffffffff817326dd
[exception RIP: generic_exec_single+130]
RIP: ffffffff810dbe62 RSP: ffff880426f0da00 RFLAGS: 00000202
RAX: 0000000000000002 RBX: ffff880426f0d9d0 RCX: 0000000000000001
RDX: ffffffff8180ad60 RSI: 0000000000000000 RDI: 0000000000000286
RBP: ffff880426f0da30 R8: ffffffff8180ad48 R9: ffff88042713bc68
R10: 00007fe7d1f2dbd0 R11: 0000000000000206 R12: ffff8804274bb000
R13: 0000000000000000 R14: ffff880407670280 R15: 0000000000000000
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff880426f0da38] smp_call_function_single at ffffffff810dbf75
#11 [ffff880426f0dab0] smp_call_function_many at ffffffff810dc3a6
#12 [ffff880426f0db10] native_flush_tlb_others at ffffffff8105c8f7
#13 [ffff880426f0db38] flush_tlb_mm_range at ffffffff8105c9cb
#14 [ffff880426f0db68] pmdp_splitting_flush at ffffffff8105b80d
#15 [ffff880426f0db88] __split_huge_page at ffffffff811ac90b
#16 [ffff880426f0dc20] split_huge_page_to_list at ffffffff811acfb8
#17 [ffff880426f0dc48] __split_huge_page_pmd at ffffffff811ad956
#18 [ffff880426f0dcc8] unmap_page_range at ffffffff8117728d
#19 [ffff880426f0dda0] unmap_single_vma at ffffffff81177341
#20 [ffff880426f0ddd8] zap_page_range at ffffffff811784cd
#21 [ffff880426f0de90] sys_madvise at ffffffff81174fbf
#22 [ffff880426f0df80] system_call_fastpath at ffffffff8173196d
RIP: 00007fe7ca2cc647 RSP: 00007fe7be9febf0 RFLAGS: 00000293
RAX: 000000000000001c RBX: ffffffff8173196d RCX: ffffffffffffffff
RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007fe7be1ff000
RBP: 0000000000000000 R8: 0000000000000000 R9: 00007fe7d1cd2738
R10: 00007fe7d1f2dbd0 R11: 0000000000000206 R12: 00007fe7be9ff700
R13: 00007fe7be9ff9c0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 000000000000001c CS: 0033 SS: 002b
[Test Case]
- Deploy OpenStack on OpenStack
- Run Tempest on the L1 cloud
- Check the kernel log of the L1 nova-compute nodes
(Although this may not necessarily be related to nested KVM)
Potentially related: https://lkml.org/lkml/2014/11/14/656
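For the kernel-log check above, something like this on each L1 nova-compute node is what to look for (a sketch; log locations may vary by setup):

dmesg | grep "soft lockup"
grep "soft lockup" /var/log/kern.log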
--
Original Description:
When installing qemu-kvm on a VM, KSM is enabled.
I have encountered this problem in trusty:
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
$ uname -a
Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The way to see the behaviour:
1) $ more /sys/kernel/mm/ksm/run
0
2) $ sudo apt-get install qemu-kvm
3) $ more /sys/kernel/mm/ksm/run
1
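For what it's worth, the toggle seems to come from the qemu-kvm packaging rather than the kernel; if I remember the Trusty packaging correctly (treat this as an assumption), the following shows where it is set:

# assumption: the qemu-kvm init script reads /etc/default/qemu-kvm and
# writes the value into /sys/kernel/mm/ksm/run at start-up
grep -i ksm /etc/default/qemu-kvm   # expect KSM_ENABLED=1
cat /sys/kernel/mm/ksm/run          # 1 once qemu-kvm is installed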
To see the soft lockups, deploy a cloud on a virtualised environment like ctsstack and run Tempest on it (at least twice); the compute nodes of the virtualised deployment will eventually stop responding with:
[24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
[24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
[24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
I am not sure whether the problem is that we are enabling KSM on a VM,
or that nested KSM is not behaving properly. Either way I can reproduce
it easily; please contact me if you need further details.
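As a data point for narrowing this down, KSM can be switched off at runtime on the affected hosts to see whether the lockups stop (a sketch; the persistent setting assumes the Ubuntu qemu-kvm packaging mentioned above):

echo 0 | sudo tee /sys/kernel/mm/ksm/run                              # stop ksmd now
sudo sed -i 's/KSM_ENABLED=1/KSM_ENABLED=0/' /etc/default/qemu-kvm    # keep it off after reboot (assumption)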