kernel-packages team mailing list archive

[Bug 1379340] Re: qemu-kvm guest panic for AMD smp trusty guests

 

I think this patch fixes the issue:

https://lkml.org/lkml/2014/9/22/211

Looking at the stacktrace:

[    4.690909] divide error: 0000 [#1] SMP 
[    4.690909] Modules linked in: dm_crypt kvm_amd kvm serio_raw isofs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
[    4.690909] CPU: 0 PID: 663 Comm: cloud-init Not tainted 3.13.0-40-generic #69-Ubuntu
[    4.690909] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    4.690909] task: ffff88001f373000 ti: ffff88001460a000 task.ti: ffff88001460a000
[    4.690909] RIP: 0010:[<ffffffff8104ed58>]  [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[    4.690909] RSP: 0000:ffff88001fc03df0  EFLAGS: 00010046
[    4.690909] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
[    4.690909] RDX: ffffffff81eb1448 RSI: 0000000000000000 RDI: 0000000000000000
[    4.690909] RBP: ffff88001fc03e10 R08: ffffffff81eb1440 R09: ffff880016000000
[    4.690909] R10: 0000000000000006 R11: 561488f3089a6867 R12: ffffffff81fc66c0
[    4.690909] R13: 0000000000000802 R14: 0000000000000001 R15: 00000000000000c2
[    4.690909] FS:  00007fc269f46740(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[    4.690909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.690909] CR2: 00007fc2665de050 CR3: 000000001f50f000 CR4: 00000000000406f0
[    4.690909] Stack:
[    4.690909]  0000000000000046 0000000000000060 0000000000000046 0000000000000020
[    4.690909]  ffff88001fc03e20 ffffffff81718b53 ffff88001fc03e38 ffffffff817270da
[    4.690909]  ffffffff81fc66c0 ffff88001fc03e70 ffffffff8146de04 ffffffff81fc66c0
[    4.690909] Call Trace:
[    4.690909]  <IRQ> 
[    4.690909]  [<ffffffff81718b53>] __ticket_unlock_slowpath+0x24/0x34
[    4.690909]  [<ffffffff817270da>] _raw_spin_unlock_irqrestore+0x3a/0x40
[    4.690909]  [<ffffffff8146de04>] serial8250_handle_irq.part.14+0x84/0xb0
[    4.690909]  [<ffffffff8146de77>] serial8250_default_handle_irq+0x27/0x30
[    4.690909]  [<ffffffff8146ce73>] serial8250_interrupt+0x63/0xe0
[    4.690909]  [<ffffffff810bf97e>] handle_irq_event_percpu+0x3e/0x1d0
[    4.690909]  [<ffffffff810bfb4d>] handle_irq_event+0x3d/0x60
[    4.690909]  [<ffffffff810c25d7>] handle_edge_irq+0x77/0x130
[    4.690909]  [<ffffffff81015dbe>] handle_irq+0x1e/0x30
[    4.690909]  [<ffffffff8173205d>] do_IRQ+0x4d/0xc0
[    4.690909]  [<ffffffff8172772d>] common_interrupt+0x6d/0x6d
[    4.690909]  <EOI> 
[    4.690909] Code: 66 44 39 e8 75 bd 0f b6 35 36 27 e6 00 40 84 f6 75 2a 83 05 46 27 e6 00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9 
[    4.690909] RIP  [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100
[    4.690909]  RSP <ffff88001fc03df0>

The objdump output below shows that the divide error is raised on a
vmcall instruction (bytes 0f 01 c1, matching the <0f> 01 c1 marker in the
Code: line above). In addition, we build our kernels with
CONFIG_DEBUG_RODATA and PV locking.

static void kvm_kick_cpu(int cpu)
{
        int apicid;
        unsigned long flags = 0;

        apicid = per_cpu(x86_cpu_to_apicid, cpu);
ffffffff8104ed46:       48 c7 c0 8a b0 00 00    mov    $0xb08a,%rax

static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
                                  unsigned long p2)
{
        long ret;
        asm volatile(KVM_HYPERCALL
ffffffff8104ed4d:       31 db                   xor    %ebx,%ebx
        kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
ffffffff8104ed4f:       0f b7 0c 01             movzwl (%rcx,%rax,1),%ecx
ffffffff8104ed53:       b8 05 00 00 00          mov    $0x5,%eax
ffffffff8104ed58:       0f 01 c1                vmcall
ffffffff8104ed5b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
                        add_stats(RELEASED_SLOW_KICKED, 1);
                        kvm_kick_cpu(cpu);
                        break;
                }
        }
}
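
As a quick cross-check of the disassembly above, here is a minimal Python
sketch that pulls the faulting bytes out of an oops "Code:" line and reports
whether they start with a hypercall opcode. The helper names
(faulting_bytes, classify_hypercall) are made up for illustration. The only
facts relied on are that the <..> marker denotes the byte at RIP, that
0f 01 c1 encodes vmcall, and that 0f 01 d9 encodes vmmcall.

```python
# Hypothetical helpers for illustration; not part of any kernel tooling.

# Opcode bytes of the two x86 hypercall instructions.
HYPERCALL_OPCODES = {
    bytes.fromhex("0f01c1"): "vmcall",   # Intel VMX encoding
    bytes.fromhex("0f01d9"): "vmmcall",  # AMD SVM encoding
}


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker to the end of a 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # The marked byte is the first byte of the faulting instruction.
            return bytes.fromhex(tok[1:-1] + "".join(tokens[i + 1:]))
    raise ValueError("no <..> marker found in Code: line")


def classify_hypercall(code_line):
    """Return 'vmcall', 'vmmcall', or None for the faulting instruction."""
    tail = faulting_bytes(code_line)
    for opcode, name in HYPERCALL_OPCODES.items():
        if tail.startswith(opcode):
            return name
    return None


if __name__ == "__main__":
    # Tail of the Code: line from the oops quoted in this message.
    line = "[    4.690909] Code: b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b"
    print(classify_hypercall(line))
```

Fed the Code: line from the trace in this message, the sketch reports
vmcall, which agrees with the objdump listing.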

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1379340

Title:
  qemu-kvm guest panic for AMD smp trusty guests

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  New
Status in “linux” source package in Utopic:
  New

Bug description:
  We just upgraded the OpenStack compute hosts in our public cloud
  (using qemu-kvm via libvirt) from Precise to Trusty (14.04.1); they
  now run kernel 3.13.0-36-generic with qemu-kvm 2.0.0+dfsg-2ubuntu1.5.

  Following the upgrade, whenever we try to start an smp/multicore
  Trusty guest (existing or new), we run into this panic [1] inside the
  guest just towards the end of boot. This happens consistently for smp
  guests using the Trusty kernel (i.e., it also affects earlier Ubuntu
  releases using the HWE kernel from Trusty, but not their native kernels). I
  didn't have any other distro images to hand with 3.13.x kernels, but
  none of the others I tested were affected (in the 3.2 - 3.16 kernel
  range).

  Similar reports are scarce, but the one we did find
  pointed to a CPU feature as the trigger. We were running these hosts
  with libvirt cpu mode set to "host-passthrough" (so qemu starts with
  "-cpu host"), on AMD 6200 & 6300 Opteron hardware. Switching the guest
  domains to use cpu mode "host-model" instead works around the issue
  and is perfectly acceptable for most of our users.
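
The workaround above comes down to changing the mode attribute of the <cpu>
element in the libvirt domain XML. A minimal sketch in Python, assuming a
hand-written domain definition: the function name set_cpu_mode and the sample
XML are hypothetical, and in an OpenStack deployment the domain XML is
generated by Nova rather than edited like this.

```python
import xml.etree.ElementTree as ET


def set_cpu_mode(domain_xml, mode="host-model"):
    """Return domain XML with <cpu mode=...> set to the given mode."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        # No <cpu> element yet: add one so the mode is explicit.
        cpu = ET.SubElement(root, "cpu")
    cpu.set("mode", mode)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed domain definition for illustration.
    sample = (
        "<domain type='kvm'>"
        "<name>guest</name>"
        "<cpu mode='host-passthrough'/>"
        "</domain>"
    )
    print(set_cpu_mode(sample))
```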

  We also have various Intel compute hosts, and they don't seem to be
  affected.

  [1]
  [ 11.256924] divide error: 0000 [#1] SMP 
  [ 11.258133] Modules linked in: kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lp parport psmouse floppy 
  [ 11.260228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-36-generic #63-Ubuntu 
  [ 11.260228] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 
  [ 11.260228] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000 
  [ 11.260228] RIP: 0010:[<ffffffff8104ed58>] [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100 
  [ 11.260228] RSP: 0018:ffff88023fc03c98 EFLAGS: 00010046 
  [ 11.260228] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001 
  [ 11.260228] RDX: ffffffff81eaf408 RSI: 0000000000000000 RDI: 0000000000000000 
  [ 11.260228] RBP: ffff88023fc03cb8 R08: ffffffff81eaf400 R09: 00000000ffffffff 
  [ 11.260228] R10: ffff880037612cc0 R11: ffffea0002eb0a00 R12: ffff8800374a33c0 
  [ 11.260228] R13: 0000000000000020 R14: 0000000000000001 R15: 0000000000000286 
  [ 11.260228] FS: 00007f1e8b538740(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 
  [ 11.260228] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
  [ 11.260228] CR2: 00007f1e8ae09d50 CR3: 0000000001c0e000 CR4: 00000000000406f0 
  [ 11.260228] Stack: 
  [ 11.260228] 0000000000000286 0000000000000001 0000000000000001 00000000000000c3 
  [ 11.260228] ffff88023fc03cc8 ffffffff81717ed6 ffff88023fc03ce0 ffffffff8172641a 
  [ 11.260228] ffff8800374a33c0 ffff88023fc03d18 ffffffff810aaeb0 ffff88023295e000 
  [ 11.260228] Call Trace: 
  [ 11.260228] <IRQ> 
  [ 11.260228] [<ffffffff81717ed6>] __ticket_unlock_slowpath+0x24/0x34 
  [ 11.260228] [<ffffffff8172641a>] _raw_spin_unlock_irqrestore+0x3a/0x40 
  [ 11.260228] [<ffffffff810aaeb0>] __wake_up_sync_key+0x50/0x60 
  [ 11.260228] [<ffffffff8160ca5a>] sock_def_readable+0x3a/0x70 
  [ 11.260228] [<ffffffff816fda0a>] packet_rcv+0x2fa/0x430 
  [ 11.260228] [<ffffffff816228b0>] __netif_receive_skb_core+0x360/0x840 
  [ 11.260228] [<ffffffff81622da8>] __netif_receive_skb+0x18/0x60 
  [ 11.260228] [<ffffffff81622e13>] netif_receive_skb+0x23/0x90 
  [ 11.260228] [<ffffffff815288d4>] virtnet_poll+0x4d4/0x850 
  [ 11.260228] [<ffffffff81623192>] net_rx_action+0x152/0x250 
  [ 11.260228] [<ffffffff8106cbac>] __do_softirq+0xec/0x2c0 
  [ 11.260228] [<ffffffff8106d0f5>] irq_exit+0x105/0x110 
  [ 11.260228] [<ffffffff817312d6>] do_IRQ+0x56/0xc0 
  [ 11.260228] [<ffffffff81726a6d>] common_interrupt+0x6d/0x6d 
  [ 11.260228] <EOI> 
  [ 11.260228] [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10 
  [ 11.260228] [<ffffffff8101c62f>] default_idle+0x1f/0xc0 
  [ 11.260228] [<ffffffff8101cef6>] arch_cpu_idle+0x26/0x30 
  [ 11.260228] [<ffffffff810bed95>] cpu_startup_entry+0xc5/0x290 
  [ 11.260228] [<ffffffff8170ca77>] rest_init+0x77/0x80 
  [ 11.260228] [<ffffffff81d35f6b>] start_kernel+0x433/0x43e 
  [ 11.260228] [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c 
  [ 11.260228] [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120 
  [ 11.260228] [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c 
  [ 11.260228] [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152 
  [ 11.260228] Code: 66 44 39 e8 75 bd 0f b6 35 f6 06 e6 00 40 84 f6 75 2a 83 05 06 07 e6 00 01 48 c7 c0 6a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 <0f> 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9 
  [ 11.260228] RIP [<ffffffff8104ed58>] kvm_unlock_kick+0xa8/0x100 
  [ 11.260228] RSP <ffff88023fc03c98> 
  [ 11.260228] ---[ end trace f1c26ff24745b331 ]--- 
  [ 11.260228] Kernel panic - not syncing: Fatal exception in interrupt 
  [ 11.260228] Shutting down cpus with NMI
