kernel-packages team mailing list archive
Message #138536
[Bug 1346917] Re: Using KSM on NUMA capable machines can cause KVM guest performance and stability issues
This is NOT fixed by 3.13.0-33.58. It persists even with
3.13.0-65.106 (and 3.13.0-63.103).
I have around 10 VMs running but ONE in particular disconnects from the
network every hour or so.
I had this issue previously. It initially went away on Ubuntu 14.04
LTS but has come back recently; perhaps a kernel regression?
dmesg shows
[42524.196629] kvm: zapping shadow pages for mmio generation wraparound
[42538.140013] br0: port 2(vnet0) entered learning state
[42538.268017] br1: port 2(vnet1) entered learning state
[42553.180008] br0: topology change detected, propagating
[42553.180015] br0: port 2(vnet0) entered forwarding state
[42553.308008] br1: topology change detected, propagating
[42553.308014] br1: port 2(vnet1) entered forwarding state
(and NIC connection is gone)
It's not clear whether this is just coincidence or a pointer to
the issue.
This VM is unusual among my VMs because it is the only one with two NICs,
one connected to br1 and one to br0. All the others connect to just br0,
and they work OK.
Happy to try suggestions to track this down.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1346917
Title:
Using KSM on NUMA capable machines can cause KVM guest performance and
stability issues
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Trusty:
Fix Released
Bug description:
[Impact]
When using KVM with KSM enabled on NUMA machines, both Linux and Windows
guests can exhibit very poor performance and potential crashes. Disabling
KSM is a known workaround for this issue.
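The workaround can be applied at runtime through the KSM sysfs interface. The following is a minimal sketch, assuming root privileges and the standard /sys/kernel/mm/ksm/run control file (0 stops ksmd, 1 runs it, 2 stops it and unmerges all currently merged pages). The function name and the ksm_dir parameter are illustrative only and do not come from the bug report.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")  # standard KSM sysfs location


def disable_ksm(ksm_dir=KSM_DIR, unmerge=True):
    """Stop KSM; with unmerge=True also break up already-merged pages.

    Writes 2 (stop and unmerge) or 0 (stop only) to <ksm_dir>/run.
    Requires root when pointed at the real sysfs directory.
    """
    value = "2" if unmerge else "0"
    (Path(ksm_dir) / "run").write_text(value + "\n")
    return value
```

Calling disable_ksm() as root has the same effect as `echo 2 > /sys/kernel/mm/ksm/run`. The setting does not persist across reboots.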
[Fix]
The following patch fixes the issue in our testing:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=64a9a34e22896dad430e21a28ad8cb00a756fefc
This patch is present in v3.14-rc1 and onwards.
[Test Case]
General test case:
1) On a NUMA-capable machine, set up the machine as a KVM hypervisor
   - lscpu should show more than 1 NUMA node
2) Install 4 KVM VMs
3) Run the following in another terminal to confirm that pages_shared and pages_sharing are increasing
   - watch 'tail /sys/kernel/mm/ksm/*'
4) In another terminal, run a program that continually pings each node and alerts on high latencies
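The preconditions in steps 1 and 3 can also be checked from a script. This is a rough sketch, assuming the usual sysfs layout (/sys/devices/system/node/node<N> directories for NUMA nodes, and the pages_shared and pages_sharing files under /sys/kernel/mm/ksm). Both function names are made up for illustration.

```python
from pathlib import Path


def numa_node_count(node_dir="/sys/devices/system/node"):
    """Count NUMA nodes by listing the node<N> directories in sysfs."""
    return sum(1 for p in Path(node_dir).glob("node[0-9]*") if p.is_dir())


def ksm_counters(ksm_dir="/sys/kernel/mm/ksm"):
    """Read the two KSM counters named in step 3 as integers."""
    names = ("pages_shared", "pages_sharing")
    return {n: int((Path(ksm_dir) / n).read_text()) for n in names}
```

A machine qualifies for the test when numa_node_count() returns more than 1. Calling ksm_counters() a few seconds apart should show both values growing once the guests are running.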
What we've observed is that in Linux guests, the ping latencies can go
into the ~2 second range for a few pings, then return to the < 1 ms
range. (This is machine dependent.) In addition, when running this test
with Windows guests, we occasionally observe BSODs.
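The report does not name the ping program used in step 4. One possible sketch, assuming the Linux iputils ping command (-c for count, -W for reply timeout in seconds), is shown below. The 100 ms threshold, the 3 second timeout, and all function names are assumptions chosen so that the ~2 second spikes described above are recorded as slow replies rather than as losses.

```python
import re
import subprocess
import time

# Matches the "time=0.412 ms" field in a ping reply line.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(ping_output):
    """Return the round-trip time in ms from ping output, or None if absent."""
    m = RTT_RE.search(ping_output)
    return float(m.group(1)) if m else None


def ping_once(host, timeout_s=3):
    """Send one ICMP echo with the system ping; None means no reply."""
    proc = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        capture_output=True, text=True,
    )
    return parse_rtt_ms(proc.stdout) if proc.returncode == 0 else None


def is_alert(rtt_ms, threshold_ms=100.0):
    """Flag lost replies and replies slower than the threshold."""
    return rtt_ms is None or rtt_ms > threshold_ms


def monitor(hosts, interval_s=1.0, threshold_ms=100.0):
    """Ping each host in turn forever, printing a line for every alert."""
    while True:
        for host in hosts:
            rtt = ping_once(host)
            if is_alert(rtt, threshold_ms):
                print(f"{time.strftime('%H:%M:%S')} {host}: rtt={rtt} ms")
        time.sleep(interval_s)
```

For example, monitor(["192.0.2.11", "192.0.2.12"]) would watch two guests at those placeholder addresses until interrupted.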
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917/+subscriptions