kernel-packages team mailing list archive
Message #186422
[Bug 1596941] [NEW] KVM deadlock on KVM guest migration with latest QEMU (mitaka) from Xenial (or Mitaka Ubuntu Cloud Archive)
Public bug reported:
It was brought to my attention that qemu-kvm live migration (with full
storage copy) on Trusty + Mitaka Ubuntu Cloud Archive was broken. While
investigating, I ran into the following situation:
crash> sys
KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-86-generic
DUMPFILE: ./201606241546/dump.201606241546 [PARTIAL DUMP]
CPUS: 4
DATE: Fri Jun 24 15:46:39 2016
UPTIME: 00:06:00
LOAD AVERAGE: 1.00, 0.60, 0.26
TASKS: 146
NODENAME: vmqemulivefail1
RELEASE: 3.13.0-86-generic
VERSION: #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016
MACHINE: x86_64 (2494 Mhz)
MEMORY: 8 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
The full backtrace doesn't have anything useful since I've configured
kernel.softlockup_panic.
From the scheduled-out tasks (and from kern.log) I was able to see that
on more than one occasion the qemu process was possibly deadlocked while
handling asynchronous page faults:
## kernel 3.13
# dump 1
PID: 1604 TASK: ffff8800374be000 CPU: 3 COMMAND: "qemu-system-x86"
#0 [ffff8800ba115e28] __schedule at ffffffff8172e379
#1 [ffff8800ba115e90] schedule at ffffffff8172e859
#2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
#3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
#4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
RIP: 00007fb4eff0a4b3 RSP: 00007fb4713facb0 RFLAGS: 00010206
RAX: 00007fb4cb9cf000 RBX: 00007fb4f166d8f0 RCX: 0000000000000010
RDX: 0000000000001fff RSI: 00007fb4cb9deff8 RDI: 4000000000000000
RBP: 0000000000000000 R8: 0000000000000000 R9: 00000002601b0000
R10: 00fffffffffffe00 R11: 0000000000001fff R12: 0000000000000008
R13: 00007fb4713fad84 R14: 00007fb4f1665290 R15: 00007fb4713fad88
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
# dump 2
PID: 1735 TASK: ffff8800b9bcb000 CPU: 2 COMMAND: "qemu-system-x86"
#0 [ffff8802333c9e28] __schedule at ffffffff8172e379
#1 [ffff8802333c9e90] schedule at ffffffff8172e859
#2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
#3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
#4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
RIP: 00007f631399d3b0 RSP: 00007f62912c7990 RFLAGS: 00010206
RAX: 0000000000000000 RBX: 00007f6315f9e370 RCX: 00007f62ca714000
RDX: 0000000032914020 RSI: 0000000000001000 RDI: 00007f62ca714000
RBP: 00007f6315c66e40 R8: 00007f62912c7a40 R9: 00007f6315f9e3e0
R10: 0000000000000000 R11: 0000000032914020 R12: 0000000032914020
R13: 0000000000032914 R14: 00000000ffffffff R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
# dump 3
PID: 1617 TASK: ffff880232834800 CPU: 3 COMMAND: "qemu-system-x86"
#0 [ffff880232a6de28] __schedule at ffffffff8172e379
#1 [ffff880232a6de90] schedule at ffffffff8172e859
#2 [ffff880232a6dea0] kvm_async_pf_task_wait at ffffffff8105060f
#3 [ffff880232a6df38] do_async_page_fault at ffffffff81736090
#4 [ffff880232a6df50] async_page_fault at ffffffff81732cd8
RIP: 00007f8c39e8b3b0 RSP: 00007f8bb80c9990 RFLAGS: 00010206
RAX: 0000000000000000 RBX: 00007f8c3aeba370 RCX: 00007f8bdea18000
RDX: 0000000022c18020 RSI: 0000000000001000 RDI: 00007f8bdea18000
RBP: 00007f8c3ab82e40 R8: 00007f8bb80c9a40 R9: 00007f8c3aeba498
R10: 0000000000000000 R11: 0000000022c18020 R12: 0000000022c18020
R13: 0000000000022c18 R14: 00000000ffffffff R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
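To make the common pattern across the three dumps easier to check, here is a minimal Python sketch that parses `bt`-style output from crash and lists the PIDs whose stack contains a given symbol. The helper names `parse_backtraces` and `blocked_in` are hypothetical, and the regular expressions assume the exact layout shown in the dumps above; the sample is trimmed to the PID and frame lines of dumps 1 and 2.

```python
import re

# Matches the per-task header line printed by crash, as seen in the dumps above.
PID_RE = re.compile(
    r'PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"'
)
# Matches a stack frame line such as "#2 [ffff...] kvm_async_pf_task_wait at ffff...".
FRAME_RE = re.compile(r'#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')

SAMPLE = """\
# dump 1
PID: 1604 TASK: ffff8800374be000 CPU: 3 COMMAND: "qemu-system-x86"
#0 [ffff8800ba115e28] __schedule at ffffffff8172e379
#1 [ffff8800ba115e90] schedule at ffffffff8172e859
#2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
#3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
#4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
# dump 2
PID: 1735 TASK: ffff8800b9bcb000 CPU: 2 COMMAND: "qemu-system-x86"
#0 [ffff8802333c9e28] __schedule at ffffffff8172e379
#1 [ffff8802333c9e90] schedule at ffffffff8172e859
#2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
#3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
#4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
"""


def parse_backtraces(text):
    """Return one dict per task with its PID, command and frame symbols."""
    tasks = []
    current = None
    for line in text.splitlines():
        header = PID_RE.search(line)
        if header:
            current = {
                "pid": int(header.group(1)),
                "command": header.group(4),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group(2))
    return tasks


def blocked_in(tasks, symbol):
    """Return the PIDs of tasks whose backtrace contains the given symbol."""
    return [task["pid"] for task in tasks if symbol in task["frames"]]


if __name__ == "__main__":
    print(blocked_in(parse_backtraces(SAMPLE), "kvm_async_pf_task_wait"))
```

Section markers such as `# dump 1` are ignored because a frame line must have a digit directly after the `#`.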
## kernel 4.4
# kern.log
[ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
[ 360.282984] Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
[ 360.283581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.284439] qemu-system-x86 D ffff8800bb833e90 0 1592 1 0x00000000
[ 360.284443] ffff8800bb833e90 ffff88023151c4c0 ffff8802345eb700 ffff8800bb834000
[ 360.284444] 0000000000000010 ffffffff81efe6d0 000055ac8fa05520 00007f88fc7f7d88
[ 360.284445] ffff8800bb833ea8 ffffffff817ed5f5 ffff8800bb833ef0 ffff8800bb833f38
[ 360.284447] Call Trace:
[ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
[ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
[ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
[ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
[ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
[ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
[ 360.284500] Sending NMI to all CPUs:
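The same check can be done on kern.log. The following Python sketch extracts hung-task reports and the symbols in their call trace; the function name `parse_hung_tasks` is hypothetical and the patterns assume the message layout shown above. Frames marked with `?` are skipped, since the kernel prints that marker for stack entries it could not confirm as part of the call chain.

```python
import re

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_RE = re.compile(r'INFO: task (\S+):(\d+) blocked for more than (\d+) seconds')
# "[<addr>] symbol+0xoff/0xsize", optionally preceded by "? " for unreliable frames.
TRACE_RE = re.compile(r'\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+')

SAMPLE = """\
[ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
[ 360.282984] Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
[ 360.284447] Call Trace:
[ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
[ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
[ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
[ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
[ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
[ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
"""


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report with its reliable trace symbols."""
    reports = []
    current = None
    for line in log_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = {
                "comm": hung.group(1),
                "pid": int(hung.group(2)),
                "timeout": int(hung.group(3)),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = TRACE_RE.search(line)
        # Keep only frames without the "?" marker.
        if frame and current is not None and frame.group(1) is None:
            current["frames"].append(frame.group(2))
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], " <- ".join(report["frames"]))
```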
** Affects: linux (Ubuntu)
Importance: High
Assignee: Rafael David Tinoco (inaddy)
Status: In Progress
** Changed in: linux (Ubuntu)
Status: New => In Progress
** Changed in: linux (Ubuntu)
Importance: Undecided => High
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Rafael David Tinoco (inaddy)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1596941
Title:
KVM deadlock on KVM guest migration with latest QEMU (mitaka) from
Xenial (or Mitaka Ubuntu Cloud Archive)
Status in linux package in Ubuntu:
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1596941/+subscriptions