kernel-packages team mailing list archive

[Bug 1596941] Re: KVM deadlock on KVM guest migration with latest QEMU (mitaka) from Xenial (or Mitaka Ubuntu Cloud Archive)

 

** Tags added: canonical-bootstack

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1596941

Title:
  KVM deadlock on KVM guest migration with latest QEMU (mitaka) from
  Xenial (or Mitaka Ubuntu Cloud Archive)

Status in linux package in Ubuntu:
  In Progress

Bug description:
  It was brought to my attention that qemu-kvm live migration (with full
  storage copy) on Trusty + Mitaka Ubuntu Cloud Archive was broken. While
  investigating, I ran into the following situation:

  crash> sys
        KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-86-generic
      DUMPFILE: ./201606241546/dump.201606241546  [PARTIAL DUMP]
          CPUS: 4
          DATE: Fri Jun 24 15:46:39 2016
        UPTIME: 00:06:00
  LOAD AVERAGE: 1.00, 0.60, 0.26
         TASKS: 146
      NODENAME: vmqemulivefail1
       RELEASE: 3.13.0-86-generic
       VERSION: #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016
       MACHINE: x86_64  (2494 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

  The full backtrace doesn't show anything useful since I've configured
  kernel.softlockup_panic.

  From the scheduled-out tasks (and from kern.log) I was able to see that,
  on more than one occasion, the qemu process was possibly deadlocked
  while handling asynchronous page faults:

  ## kernel 3.13

  # dump 1

  PID: 1604   TASK: ffff8800374be000  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff8800ba115e28] __schedule at ffffffff8172e379
   #1 [ffff8800ba115e90] schedule at ffffffff8172e859
   #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
      RIP: 00007fb4eff0a4b3  RSP: 00007fb4713facb0  RFLAGS: 00010206
      RAX: 00007fb4cb9cf000  RBX: 00007fb4f166d8f0  RCX: 0000000000000010
      RDX: 0000000000001fff  RSI: 00007fb4cb9deff8  RDI: 4000000000000000
      RBP: 0000000000000000   R8: 0000000000000000   R9: 00000002601b0000
      R10: 00fffffffffffe00  R11: 0000000000001fff  R12: 0000000000000008
      R13: 00007fb4713fad84  R14: 00007fb4f1665290  R15: 00007fb4713fad88
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 2

  PID: 1735   TASK: ffff8800b9bcb000  CPU: 2   COMMAND: "qemu-system-x86"
   #0 [ffff8802333c9e28] __schedule at ffffffff8172e379
   #1 [ffff8802333c9e90] schedule at ffffffff8172e859
   #2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
      RIP: 00007f631399d3b0  RSP: 00007f62912c7990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f6315f9e370  RCX: 00007f62ca714000
      RDX: 0000000032914020  RSI: 0000000000001000  RDI: 00007f62ca714000
      RBP: 00007f6315c66e40   R8: 00007f62912c7a40   R9: 00007f6315f9e3e0
      R10: 0000000000000000  R11: 0000000032914020  R12: 0000000032914020
      R13: 0000000000032914  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 3

  PID: 1617   TASK: ffff880232834800  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff880232a6de28] __schedule at ffffffff8172e379
   #1 [ffff880232a6de90] schedule at ffffffff8172e859
   #2 [ffff880232a6dea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff880232a6df38] do_async_page_fault at ffffffff81736090
   #4 [ffff880232a6df50] async_page_fault at ffffffff81732cd8
      RIP: 00007f8c39e8b3b0  RSP: 00007f8bb80c9990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f8c3aeba370  RCX: 00007f8bdea18000
      RDX: 0000000022c18020  RSI: 0000000000001000  RDI: 00007f8bdea18000
      RBP: 00007f8c3ab82e40   R8: 00007f8bb80c9a40   R9: 00007f8c3aeba498
      R10: 0000000000000000  R11: 0000000022c18020  R12: 0000000022c18020
      R13: 0000000000022c18  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  ## kernel 4.4

  # kern.log

  [  360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
  [  360.282984]       Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
  [  360.283581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  360.284439] qemu-system-x86 D ffff8800bb833e90     0  1592      1 0x00000000
  [  360.284443]  ffff8800bb833e90 ffff88023151c4c0 ffff8802345eb700 ffff8800bb834000
  [  360.284444]  0000000000000010 ffffffff81efe6d0 000055ac8fa05520 00007f88fc7f7d88
  [  360.284445]  ffff8800bb833ea8 ffffffff817ed5f5 ffff8800bb833ef0 ffff8800bb833f38
  [  360.284447] Call Trace:
  [  360.284472]  [<ffffffff817ed5f5>] schedule+0x35/0x80
  [  360.284481]  [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
  [  360.284487]  [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
  [  360.284494]  [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
  [  360.284495]  [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
  [  360.284498]  [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
  [  360.284500] Sending NMI to all CPUs:
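
  To check other kern.log files for the same signature, something like the
  following might help. It is only a rough sketch, not part of the original
  report: the function name find_async_pf_hangs and the regular expressions
  are assumptions, written against the hung-task message format shown above.
  It reports each hung task whose call trace has a reliable (not "?"-marked)
  kvm_async_pf_task_wait frame.

```python
import re

# Header line printed by the kernel's hung task detector (format as in the log above).
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)
# One call-trace frame, e.g. "[<ffffffff817ed5f5>] schedule+0x35/0x80".
# Frames marked with "? " are unreliable stack entries and are flagged as such.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)\+0x"
)


def find_async_pf_hangs(log_text, symbol="kvm_async_pf_task_wait"):
    """Return (comm, pid) for each hung-task report whose trace contains `symbol`."""
    hits = []
    current = None  # (comm, pid) of the report currently being read
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = (header.group("comm"), int(header.group("pid")))
            continue
        frame = FRAME_RE.search(line)
        if (
            frame
            and current is not None
            and not frame.group("unreliable")
            and frame.group("sym") == symbol
        ):
            hits.append(current)
            current = None  # count each report only once
    return hits


# A few lines copied from the kern.log excerpt above, used as sample input.
SAMPLE = """\
[  360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
[  360.284447] Call Trace:
[  360.284472]  [<ffffffff817ed5f5>] schedule+0x35/0x80
[  360.284481]  [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
[  360.284487]  [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
[  360.284495]  [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
"""

if __name__ == "__main__":
    for comm, pid in find_async_pf_hangs(SAMPLE):
        print(f"{comm} (pid {pid}) is blocked in kvm_async_pf_task_wait")
```

  The matching uses re.search on each line, so leading line numbers or
  syslog prefixes in front of the kernel timestamp should not get in the way.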

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1596941/+subscriptions

