kernel-packages team mailing list archive
Message #109854
[Bug 1413540] Re: Trusty soft lockup issues with nested KVM
Hrmn... When I repeated the setup I seem to have triggered some kind of
lockup even while bringing up L2. Of course it is hard to say without the
details of Ryan's dump. However, mine seems to have backtraces in the log
which remind me a lot of an issue related to punching holes into ext4-based
qcow images. Chris had been working on something like this
before... He is on a sprint this week. Anyway, the stack trace from my log:
[ 1200.288031] INFO: task qemu-system-x86:4545 blocked for more than 120 seconds.
[ 1200.288712] Not tainted 3.13.0-46-generic #77-Ubuntu
[ 1200.289204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.289892] qemu-system-x86 D ffff88007fc134c0 0 4545 1 0x00000000
[ 1200.289895] ffff88007a9c5d28 0000000000000082 ffff88007bbd3000 ffff88007a9c5fd8
[ 1200.289897] 00000000000134c0 00000000000134c0 ffff88007bbd3000 ffff88007fc13d58
[ 1200.289898] ffff88007ffcdee8 0000000000000002 ffffffff8114eef0 ffff88007a9c5da0
[ 1200.289900] Call Trace:
[ 1200.289906] [<ffffffff8114eef0>] ? wait_on_page_read+0x60/0x60
[ 1200.289909] [<ffffffff817259fd>] io_schedule+0x9d/0x140
[ 1200.289910] [<ffffffff8114eefe>] sleep_on_page+0xe/0x20
[ 1200.289912] [<ffffffff81725e82>] __wait_on_bit+0x62/0x90
[ 1200.289914] [<ffffffff8114ecbf>] wait_on_page_bit+0x7f/0x90
[ 1200.289917] [<ffffffff810ab140>] ? autoremove_wake_function+0x40/0x40
[ 1200.289919] [<ffffffff8115c4a1>] ? pagevec_lookup_tag+0x21/0x30
[ 1200.289921] [<ffffffff8114edc9>] filemap_fdatawait_range+0xf9/0x190
[ 1200.289923] [<ffffffff8115066f>] filemap_write_and_wait_range+0x3f/0x70
[ 1200.289927] [<ffffffff8123bc4a>] ext4_sync_file+0xba/0x320
[ 1200.289930] [<ffffffff811ede21>] do_fsync+0x51/0x80
[ 1200.289931] [<ffffffff811ee0d3>] SyS_fdatasync+0x13/0x20
[ 1200.289933] [<ffffffff81731d7d>] system_call_fastpath+0x1a/0x1f
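For comparing this backtrace against other reports, the frames can be pulled out mechanically. The following is only a sketch: the function name `reliable_frames` and the regular expression are illustrative assumptions, not part of any kernel tooling. It assumes the usual trace layout shown above, where a `?` in front of a symbol marks a frame the unwinder could not confirm as part of the call chain.

```python
import re

# Matches frames such as "[<ffffffff817259fd>] io_schedule+0x9d/0x140".
# An optional "? " before the symbol marks an unreliable frame.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x"
)


def reliable_frames(trace_text):
    """Return the symbol names of all frames not marked with '?'."""
    symbols = []
    for line in trace_text.splitlines():
        match = FRAME.search(line)
        if match and not match["unreliable"]:
            symbols.append(match["sym"])
    return symbols


sample = """\
[ 1200.289906] [<ffffffff8114eef0>] ? wait_on_page_read+0x60/0x60
[ 1200.289909] [<ffffffff817259fd>] io_schedule+0x9d/0x140
[ 1200.289927] [<ffffffff8123bc4a>] ext4_sync_file+0xba/0x320
"""
print(" <- ".join(reliable_frames(sample)))
```

Filtering out the `?` frames leaves the chain from `io_schedule` up to `SyS_fdatasync`, which is the part worth matching against the ext4 hole-punching reports.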
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540
Title:
Trusty soft lockup issues with nested KVM
Status in linux package in Ubuntu:
Confirmed
Bug description:
[Impact]
Users who test OpenStack on nested KVM see soft lockups such as the following:
PID: 22262 TASK: ffff8804274bb000 CPU: 1 COMMAND: "qemu-system-x86"
#0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
#1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
#2 [ffff88043fd03e30] panic at ffffffff81719ff4
#3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
#4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
#5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
#6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
#7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
#8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
--- <IRQ stack> ---
#9 [ffff880426f0d958] apic_timer_interrupt at ffffffff817326dd
[exception RIP: generic_exec_single+130]
RIP: ffffffff810dbe62 RSP: ffff880426f0da00 RFLAGS: 00000202
RAX: 0000000000000002 RBX: ffff880426f0d9d0 RCX: 0000000000000001
RDX: ffffffff8180ad60 RSI: 0000000000000000 RDI: 0000000000000286
RBP: ffff880426f0da30 R8: ffffffff8180ad48 R9: ffff88042713bc68
R10: 00007fe7d1f2dbd0 R11: 0000000000000206 R12: ffff8804274bb000
R13: 0000000000000000 R14: ffff880407670280 R15: 0000000000000000
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff880426f0da38] smp_call_function_single at ffffffff810dbf75
#11 [ffff880426f0dab0] smp_call_function_many at ffffffff810dc3a6
#12 [ffff880426f0db10] native_flush_tlb_others at ffffffff8105c8f7
#13 [ffff880426f0db38] flush_tlb_mm_range at ffffffff8105c9cb
#14 [ffff880426f0db68] pmdp_splitting_flush at ffffffff8105b80d
#15 [ffff880426f0db88] __split_huge_page at ffffffff811ac90b
#16 [ffff880426f0dc20] split_huge_page_to_list at ffffffff811acfb8
#17 [ffff880426f0dc48] __split_huge_page_pmd at ffffffff811ad956
#18 [ffff880426f0dcc8] unmap_page_range at ffffffff8117728d
#19 [ffff880426f0dda0] unmap_single_vma at ffffffff81177341
#20 [ffff880426f0ddd8] zap_page_range at ffffffff811784cd
#21 [ffff880426f0de90] sys_madvise at ffffffff81174fbf
#22 [ffff880426f0df80] system_call_fastpath at ffffffff8173196d
RIP: 00007fe7ca2cc647 RSP: 00007fe7be9febf0 RFLAGS: 00000293
RAX: 000000000000001c RBX: ffffffff8173196d RCX: ffffffffffffffff
RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007fe7be1ff000
RBP: 0000000000000000 R8: 0000000000000000 R9: 00007fe7d1cd2738
R10: 00007fe7d1f2dbd0 R11: 0000000000000206 R12: 00007fe7be9ff700
R13: 00007fe7be9ff9c0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 000000000000001c CS: 0033 SS: 002b
[Test Case]
- Deploy OpenStack on OpenStack
- Run tempest on the L1 cloud
- Check the kernel log of the L1 nova-compute nodes
(Although this may not necessarily be related to nested KVM)
Potentially related: https://lkml.org/lkml/2014/11/14/656
--
Original Description:
When installing qemu-kvm on a VM, KSM is enabled.
I have encountered this problem in trusty:
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
$ uname -a
Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
The way to see the behaviour:
1) $ more /sys/kernel/mm/ksm/run
0
2) $ sudo apt-get install qemu-kvm
3) $ more /sys/kernel/mm/ksm/run
1
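The check in steps 1 and 3 can also be scripted, for example to run it across several compute nodes. This is a minimal sketch: the helper names `parse_ksm_run` and `ksm_state` are made up for illustration, and the meaning of the values (0 = stopped, 1 = running, 2 = stopped with all merged pages unmerged) follows the kernel's KSM documentation.

```python
from pathlib import Path

KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Value meanings as described in the kernel's KSM documentation.
KSM_STATES = {
    0: "stopped",
    1: "running",
    2: "stopped, all merged pages unmerged",
}


def parse_ksm_run(text):
    """Translate the contents of the KSM run file into a description."""
    value = int(text.strip())
    return KSM_STATES.get(value, f"unknown value {value}")


def ksm_state(path=KSM_RUN):
    """Read the KSM run flag; report 'unavailable' if the file is missing."""
    try:
        return parse_ksm_run(path.read_text())
    except FileNotFoundError:
        return "unavailable"


if __name__ == "__main__":
    print(f"KSM is {ksm_state()}")
```

Running it before and after installing qemu-kvm should show the same 0 to 1 transition as the `more` commands above.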
To see the soft lockups, deploy a cloud on a virtualised environment such as ctsstack and run tempest on it at least twice. The compute nodes of the virtualised deployment will eventually stop responding, logging:
[24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
[24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
[24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
[24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
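When checking the kernel log of several compute nodes, it can help to summarise these reports per process and CPU. The sketch below is one possible way to do that; the function name `count_soft_lockups` and the regular expression are assumptions for illustration, based only on the message format shown above.

```python
import re
from collections import Counter

# Matches "BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]".
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def count_soft_lockups(log_text):
    """Count soft-lockup reports per (command, pid, cpu)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            key = (match["comm"], int(match["pid"]), int(match["cpu"]))
            counts[key] += 1
    return counts


sample = """\
[24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
[24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
"""
for (comm, pid, cpu), n in count_soft_lockups(sample).items():
    print(f"{comm} (pid {pid}) on CPU#{cpu}: {n} reports")
```

In the log above, every report names the same qemu-system-x86 process on CPU#0, so the summary collapses to a single entry.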
I am not sure whether the problem is that we are enabling KSM on a VM
or that nested KSM is not behaving properly. Either way,
I can easily reproduce it; please contact me if you need further details.