group.of.nepali.translators team mailing list archive
Message #08328
[Bug 1632045] Re: KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
https://lists.ubuntu.com/archives/kernel-team/2016-October/080362.html
** Also affects: linux (Ubuntu Yakkety)
Importance: High
Assignee: Canonical Kernel Team (canonical-kernel-team)
Status: Triaged
** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Yakkety)
Status: Triaged => In Progress
** Changed in: linux (Ubuntu Yakkety)
Assignee: Canonical Kernel Team (canonical-kernel-team) => Tim Gardner (timg-tpi)
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह (the Nepali language translators' group), which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1632045
Title:
KVM: PPC: Book3S HV: Migrate pinned pages out of CMA
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Xenial:
In Progress
Status in linux source package in Yakkety:
In Progress
Bug description:
---Problem Description---
https://github.com/open-power/supermicro-openpower/issues/59
SW/HW Configuration
PNOR image version: 5/3/2016
BMC image version: 0.25
CPLD Version: B2.81.01
Host OS version: Ubuntu 16.04 LTS
UbuntuKVM Guest OS version: Ubuntu 14.04.4 LTS
HTX version: 394
Processor: 00UL865 * 2
Memory: SK hynix 16GB 2Rx4 PC4-2133P * 16
Summary of Issue
Two UbuntuKVM guests are each configured with 8 processors, 64 GB of
memory, 1 disk of 128 GB, 1 network interface, and 1 GPU (passed
through from the Host OS's K80).
The two guests are each put into a Create/Destroy loop, with HTX
running on each of the guests (NOT HOST) in between its creation and
destruction. The mdt.bu profile is used, and the processors, memory,
and the GPU are put under load. The HTX session lasts 9 minutes.
While this is running, the amount of available (free) memory in the
Host OS slowly decreases, and this can continue to the point where
there is no free memory left for the Host OS to do anything, including
creating the two VM guests. It appears that after every cycle, a small
portion of the memory that was allocated to the VM guest is not
released back to the Host OS; eventually this adds up and consumes all
the available memory in the Host OS.
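One way to see this trend from the host (an illustrative sketch, not part of the original report; the output format is arbitrary) is to take a timestamped MemFree sample between create/destroy cycles and redirect it to a log file:

```shell
#!/bin/sh
# Print one timestamped sample of the host's free memory; run it
# between guest create/destroy cycles and append the output to a log.
printf '%s\t%s\n' "$(date +%s)" "$(awk '/^MemFree:/ {print $2 " kB"}' /proc/meminfo)"
```

Comparing the first and last samples after a few thousand cycles should make a leak of this kind obvious.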
At some point, the VM guest(s) might get disconnected and will display
the following error:
error: Disconnected from qemu:///system due to I/O error
error: One or more references were leaked after disconnect from
the hypervisor
Then, when the Host OS tries to start the VM guest again, the
following error shows up:
error: Failed to create domain from guest2_trusty.xml
error: internal error: early end of file from monitor, possible problem: Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem
The Host OS syslog, as seen HERE, also contains quite a few errors.
To list just a few:
May 13 20:27:44 191-136 kernel: [36827.151228] alloc_contig_range: [3fb800, 3fd8f8) PFNs busy
May 13 20:27:44 191-136 kernel: [36827.151291] alloc_contig_range: [3fb800, 3fd8fc) PFNs busy
May 13 20:27:44 191-136 libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc(): invalid next size: 0x000001000a780400 ***
May 13 20:27:44 191-136 libvirtd[19263]: ======= Backtrace: =========
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x8720c)[0x3fffaf6a720c]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x96f70)[0x3fffaf6b6f70]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(realloc+0x16c)[0x3fffaf6b87fc]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virReallocN+0x68)[0x3fffaf90ccc8]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0x8ef6c)[0x3fff9346ef6c]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/libvirt/connection-driver/libvirt_driver_qemu.so(+0xa826c)[0x3fff9348826c]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventPollRunOnce+0x8b4)[0x3fffaf9332b4]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virEventRunDefaultImpl+0x54)[0x3fffaf931334]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/lib/powerpc64le-linux-gnu/libvirt.so.0(virNetDaemonRun+0x1f0)[0x3fffafad2f70]
May 13 20:27:44 191-136 libvirtd[19263]: /usr/sbin/libvirtd(+0x15d74)[0x52e45d74]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(+0x2319c)[0x3fffaf64319c]
May 13 20:27:44 191-136 libvirtd[19263]: /lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0xb8)[0x3fffaf6433b8]
May 13 20:27:44 191-136 libvirtd[19263]: ======= Memory map: ========
May 13 20:27:44 191-136 libvirtd[19263]: 52e30000-52eb0000 r-xp 00000000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 52ec0000-52ed0000 r--p 00080000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 52ed0000-52ee0000 rw-p 00090000 08:02 65540510 /usr/sbin/libvirtd
May 13 20:27:44 191-136 libvirtd[19263]: 1000a730000-1000a830000 rw-p 00000000 00:00 0 [heap]
May 13 20:27:44 191-136 libvirtd[19263]: 3fff60000000-3fff60030000 rw-p 00000000 00:00 0
May 13 20:27:44 191-136 libvirtd[19263]: 3fff60030000-3fff64000000 ---p 00000000 00:00 0
May 13 20:50:33 191-136 kernel: [38196.502926] audit: type=1400 audit(1463197833.497:4025): apparmor="DENIED" operation="open" profile="libvirt-d3ade785-c1c1-4519-b123-9d28704c2ad4" name="/sys/devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.0/0003:03:00.0/devspec" pid=24887 comm="qemu-system-ppc" requested_mask="r" denied_mask="r" fsuid=110 ouid=0
May 13 20:50:33 191-136 virtlogd[3727]: End of file while reading data: Input/output error
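The AppArmor denial for the devspec read above is a separate, minor issue; a typical workaround (an illustrative fragment only — the exact path rule and profile name are assumptions based on the denied name in the log) is to allow the read in the libvirt QEMU abstraction:

```
# /etc/apparmor.d/abstractions/libvirt-qemu (illustrative addition)
/sys/devices/**/devspec r,

# Reload the libvirtd profile afterwards:
#   sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.libvirtd
```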
Notes
The Host OS's free memory also slowly decreases when HTX is NOT
executed at all on the guests between guest Create/Destroy cycles,
though at a much slower pace. VM guests can also still fail to be
created with the same error message, even when the Host OS still
has plenty of free memory left:
error: Failed to create domain from guest2_trusty.xml
error: internal error: early end of file from monitor, possible problem: Unexpected error in spapr_alloc_htab() at /build/qemu-c3ZrbA/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
2016-05-23T16:18:16.871549Z qemu-system-ppc64: Failed to allocate HTAB of requested size, try with smaller maxmem
However, this happened only once so far, and after it completed about 3924 Create/Destroy cycles.
The other guest that was running the same test concurrently did NOT have any issues and went on to 4,600+ cycles.
---uname output---
Host OS version: Ubuntu 16.04 LTS UbuntuKVM Guest OS version: Ubuntu 14.04.4 LTS
Machine Type = SMC
I do not see any actual evidence of all memory being used up here.
Two observations:
1. "Failed to allocate HTAB" happens because we run out of
_contiguous_ chunks of CMA memory, not just any RAM
2. libvirtd[19263]: *** Error in `/usr/sbin/libvirtd': realloc():
invalid next size: 0x000001000a780400 *** - this looks more like
memory corruption than insufficient memory
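A quick way to tell the two conditions apart on the host (a generic sketch, not from the original report) is to compare CmaFree against MemFree and check how many large contiguous blocks remain:

```shell
#!/bin/sh
# HTAB allocation needs contiguous CMA memory, so CmaFree can be
# exhausted or fragmented while MemFree still looks healthy.
grep -E '^(MemFree|CmaTotal|CmaFree):' /proc/meminfo
# In /proc/buddyinfo the right-hand columns count the largest free
# contiguous blocks per zone; zeros there indicate fragmentation.
cat /proc/buddyinfo
```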
I suggest collecting statistics using something like this shell
script:
#!/bin/sh
while true
do
    # <here you put guest start/stop>
    grep -e "\(CmaFree:\|MemFree:\)" /proc/meminfo | paste -d "\t" - - >> mymemorylog
done
and attaching the resulting mymemorylog to this bug. It would also be
interesting to know whether the issue can be reproduced without the
NVIDIA driver loaded in the guest, or even without passing the NVIDIA
GPU through to the guest. Meanwhile I am running my own tests to see
if I can reproduce this behavior.
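Once a mymemorylog has been collected with the script above, a first-versus-last comparison of CmaFree is enough to spot a non-recovering drop. A hypothetical post-processing sketch (the two input lines are fabricated illustration data in the script's tab-separated format):

```shell
#!/bin/sh
# CmaFree is the next-to-last field on each "MemFree: ... CmaFree: ..."
# line; report the first and last samples.
awk '
    { cma = $(NF - 1) }
    NR == 1 { first = cma }
    END { printf "first CmaFree: %s kB, last CmaFree: %s kB\n", first, cma }
' <<'EOF'
MemFree: 104857600 kB	CmaFree: 6553600 kB
MemFree: 104857600 kB	CmaFree: 1024 kB
EOF
```

A last value far below the first that persists after all guests are destroyed would be consistent with pinned CMA pages.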
Ok, located the problem; I will post a patch tomorrow to the public
lists.
Basically, when QEMU dies, it unpins its DMA pages when its memory
context is destroyed. That was expected to happen when the QEMU
process exits, but it can actually happen much later: if a kernel
thread was executed on the same context and took a reference to it,
the very last memory context release does not happen until that
thread is scheduled again.
== Comment: #15 - Leonardo Augusto Guimaraes Garcia <lagarcia@xxxxxxxxxx> - 2016-08-24 08:15:00 ==
(In reply to comment #14)
> On my host, I have 10 guests running. Sum of all 10 guests memory will come
> up to 69GB.
Ok... So, this is quite different from what is in the bug description.
In the bug description, I read:
"Two UbuntuKVM guests are each configured with 8 processors, 64 GB of
memory, 1 disk of 128 GB, 1 network interface, and 1 GPU (pass-
through'd from the Host OS's K80).
The two guests are each put into a Create/Destroy loop, with HTX
running on each of the guests (NOT HOST) in between its creation and
destruction. The mdt.bu profile is used, and the processors, memory,
and the GPU are put under load. The HTX session lasts 9 minutes."
What is the scenario being worked on this bug? I suggest you open a
new bug for your issue if needed and we continue to investigate the
original issue here.
>
> I am trying to bring up the 11th guest, which has 5 GB of memory, and it fails:
>
> root@lotkvm:~# virsh start --console lotg12
> error: Failed to start domain lotg12
> error: internal error: process exited while connecting to monitor:
> 5076802818bda30000000000003f2,format=raw,if=none,id=drive-virtio-disk0
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,
> id=virtio-disk0,bootindex=1 -drive
> file=/dev/disk/by-id/wwn-0x6005076802818bda30000000000003f4,format=raw,
> if=none,id=drive-virtio-disk1 -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,
> id=virtio-disk1 -netdev tap,fd=41,id=hostnet0 -device
> virtio-net,netdev=hostnet0,id=net0,mac=52:54:00:9b:53:77,bus=pci.0,addr=0x1,
> bootindex=2 -chardev pty,id=charserial0 -device
> spapr-vty,chardev=charserial0,reg=0x30000000 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 -msg timestamp=on
> 2016-08-24T12:00:50.375315Z qemu-system-ppc64: Failed to allocate KVM HPT of
> order 26 (try smaller maxmem?): Cannot allocate memory
This is not because you don't have available memory. This is because
you don't have CMA memory available. Please, take a look at LTC bug
145072 comment 5 and subsequent comments.
>
>
> I waited for an hour and retried guest start.. It fails still..
>
> Current memory on host :
> -----------
> root@lotkvm:~# free -g
>               total        used        free      shared  buff/cache   available
> Mem:            127          73           0           0          53          53
> Swap:            11           4           6
I think there are actually two separate problems here.
(A) Pages in the CMA zone are getting pinned and causing fragmentation
of the CMA zone, leading to the messages saying "qemu-system-ppc64:
Failed to allocate HTAB of requested size, try with smaller maxmem".
This happens because the guest is doing PCI passthrough with DDW
enabled and hence pins all its memory. If guest pages happen to be
allocated in the CMA zone, they get pinned there and then can't be
moved for a future HPT allocation.
Balbir was looking at the possibility of moving the pages out of the
CMA zone before pinning them, but this work was dependent on some
upstream refactoring which seems to be stalled.
(B) On VM destruction, the pages are not getting unpinned and freed in
a timely fashion. Alexey debugged this issue and has posted two
patches to fix the problem: "powerpc/iommu: Stop using @current in
mm_iommu_xxx" and "powerpc/mm/iommu: Put pages on process exit". These
patches touch two maintainers' areas (powerpc and vfio) and hence need
two maintainers' concurrence, and thus haven't gone anywhere yet.
(Of course, issue (B) exacerbates issue (A).)
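Until the migration and unpinning fixes land, one possible mitigation (an assumption about this setup, not something proposed in the thread) is to enlarge the KVM CMA reserve on the powerpc host via the kvm_cma_resv_ratio= boot parameter, which sets the percentage of host RAM reserved for the CMA pool (default 5):

```
# /etc/default/grub (illustrative fragment; doubles the default 5% reserve)
GRUB_CMDLINE_LINUX_DEFAULT="kvm_cma_resv_ratio=10"
# Apply with: sudo update-grub && sudo reboot
```

Note that a larger reserve only delays exhaustion while pages stay pinned; it does not substitute for the patches discussed above.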
Upon moving host and guests to the 4.8 kernel, almost all memory is
still getting used on the host.
Any updates here? Any patches we can expect soon? Please let us
know.
Thanks,
Manju
4.8 does not yet have the fix for the pinned page migrations. I am not sure of the status of https://patchwork.kernel.org/patch/9238861/ upstream. I checked to see if I could find it in any git tree, but could not. I suspect we need this fix in first.
> Balbir - Is this fixed in the latest 4.8 kernel out today?
My patch is in powerpc-next
https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=2e5bbb5461f138cac631fe21b4ad956feabfba22
It should hit 4.9 and we can backport it. I am also working on
improvements to the patch for the future. Not sure of the status of
aik's patch.
Balbir Singh.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632045/+subscriptions