kernel-packages team mailing list archive

[Bug 1346917] Re: Using KSM on NUMA capable machines can cause KVM guest performance and stability issues

This is NOT fixed by 3.13.0-33.58.  It still occurs with
3.13.0-65.106 (and 3.13.0-63.103).

I have around 10 VMs running but ONE in particular disconnects from the
network every hour or so.

I had this issue previously. It initially went away on Ubuntu 14.04
LTS but has come back recently; perhaps a kernel regression?

dmesg shows

[42524.196629] kvm: zapping shadow pages for mmio generation wraparound
[42538.140013] br0: port 2(vnet0) entered learning state
[42538.268017] br1: port 2(vnet1) entered learning state
[42553.180008] br0: topology change detected, propagating
[42553.180015] br0: port 2(vnet0) entered forwarding state
[42553.308008] br1: topology change detected, propagating
[42553.308014] br1: port 2(vnet1) entered forwarding state

(and the NIC connection is gone)

It's not clear whether this is just coincidence or a pointer to
the issue.
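
One way to check whether the two are related is to measure how long after each "zapping shadow pages" message the next bridge port transition appears, across several occurrences. The following is only a rough sketch: the function names and marker strings are illustrative assumptions, and the sample input is the dmesg excerpt above.

```python
import re

# Matches kernel log lines of the form "[42524.196629] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def gaps_after_zap(events,
                   zap_marker="zapping shadow pages",
                   port_marker="entered learning state"):
    """For each zap message, return the seconds until the next port transition."""
    gaps = []
    for i, (stamp, message) in enumerate(events):
        if zap_marker in message:
            for later_stamp, later_message in events[i + 1:]:
                if port_marker in later_message:
                    gaps.append(later_stamp - stamp)
                    break
    return gaps


# The excerpt quoted above, used as sample input.
SAMPLE = """\
[42524.196629] kvm: zapping shadow pages for mmio generation wraparound
[42538.140013] br0: port 2(vnet0) entered learning state
[42538.268017] br1: port 2(vnet1) entered learning state
[42553.180008] br0: topology change detected, propagating
[42553.180015] br0: port 2(vnet0) entered forwarding state
[42553.308008] br1: topology change detected, propagating
[42553.308014] br1: port 2(vnet1) entered forwarding state
"""

if __name__ == "__main__":
    for gap in gaps_after_zap(parse_dmesg(SAMPLE)):
        print("port re-entered learning state %.3f s after zap" % gap)
```

If the gap is similar every time the NIC drops, that would suggest the two events are linked; a single sample like the one above cannot show that.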

This VM is unusual among my VMs because it is the only one with two NIC
connections, one to br1 and one to br0.  All the others connect only to
br0, and they work OK.

Happy to try suggestions to track this down.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1346917

Title:
  Using KSM on NUMA capable machines can cause KVM guest performance and
  stability issues

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Released

Bug description:
  [Impact]

  When using KVM on NUMA machines, both Linux and Windows guests can
  exhibit very poor performance and potential crashes. Disabling KSM is
  a known workaround to fix this issue.
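
  For reference, KSM is controlled through the sysfs file
  /sys/kernel/mm/ksm/run: writing 0 stops ksmd but keeps pages that are
  already merged, 1 runs it, and 2 stops it and unmerges all pages. The
  sketch below is one possible way to apply the workaround and is not
  taken from the bug report; the function name is hypothetical, and
  writing to the real sysfs path requires root.

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"  # standard KSM sysfs directory


def set_ksm_run(value, ksm_dir=KSM_DIR):
    """Write 0, 1 or 2 to the KSM 'run' control file.

    0 = stop ksmd, keep merged pages
    1 = run ksmd
    2 = stop ksmd and unmerge all currently merged pages
    """
    if value not in (0, 1, 2):
        raise ValueError("run must be 0, 1 or 2")
    with open(os.path.join(ksm_dir, "run"), "w") as handle:
        handle.write(str(value))


def get_ksm_run(ksm_dir=KSM_DIR):
    """Return the current value of the KSM 'run' control file as an int."""
    with open(os.path.join(ksm_dir, "run")) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    # Needs root: stop KSM and unmerge everything it has merged so far.
    set_ksm_run(2)
    print("ksm run =", get_ksm_run())
```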

  [Fix]

  The following patch fixes the issue in our testing:
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=64a9a34e22896dad430e21a28ad8cb00a756fefc

  This patch is present in v3.14-rc1 and onwards.

  [Test Case]

  General test case:
  1) On a NUMA capable machine, set up the machine as a KVM hypervisor
    - lscpu should show more than 1 NUMA node
  2) Install 4 KVM VMs
  3) Run the following in another terminal to ensure that pages_shared and pages_sharing are increasing
   - watch 'tail /sys/kernel/mm/ksm/*'
  4) In another terminal run a program that continually pings each node and alerts on high latencies
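
  Steps 1 and 3 can also be checked from a script. The sketch below is
  only an illustration: it assumes the usual sysfs locations
  (/sys/devices/system/node for NUMA nodes, /sys/kernel/mm/ksm for the
  KSM counters), and the helper names are hypothetical.

```python
import glob
import os
import time


def count_numa_nodes(node_dir="/sys/devices/system/node"):
    """Count the nodeN entries the kernel exposes for NUMA nodes."""
    return len(glob.glob(os.path.join(node_dir, "node[0-9]*")))


def read_ksm_counters(ksm_dir="/sys/kernel/mm/ksm",
                      names=("pages_shared", "pages_sharing")):
    """Read the named KSM counters and return them as a dict of ints."""
    counters = {}
    for name in names:
        with open(os.path.join(ksm_dir, name)) as handle:
            counters[name] = int(handle.read().strip())
    return counters


def is_increasing(previous, current):
    """True if every counter in 'previous' has grown in 'current'."""
    return all(current[key] > previous[key] for key in previous)


if __name__ == "__main__":
    print("NUMA nodes:", count_numa_nodes())
    before = read_ksm_counters()
    time.sleep(10)  # arbitrary sampling interval
    after = read_ksm_counters()
    print(before, "->", after, "increasing:", is_increasing(before, after))
```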

  What we've observed is that in Linux guests, the ping latencies can go
  into the ~2 second range for a few pings, then return to the < 1 ms
  range. (This is machine dependent.) In addition, when running this
  test with Windows guests we occasionally observe BSODs.
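
  The bug report does not include the program used for step 4. The
  following is a minimal sketch of what such a monitor could look like,
  assuming the iputils ping command (with its -c and -W options) is
  available. The guest addresses, the 100 ms threshold, and the function
  names are placeholders, not values from the report.

```python
import re
import subprocess
import time

# Matches the per-reply latency in ping output, e.g. "time=0.045 ms".
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_latency_ms(ping_output):
    """Return the first reply latency in milliseconds, or None if no reply."""
    match = TIME_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=5):
    """Send one ICMP echo request to host and return the latency in ms."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        capture_output=True, text=True)
    return parse_latency_ms(result.stdout)


def check_hosts(hosts, threshold_ms=100.0, ping=ping_once):
    """Return (host, latency) pairs that timed out or exceeded the threshold."""
    alerts = []
    for host in hosts:
        latency = ping(host)
        if latency is None or latency > threshold_ms:
            alerts.append((host, latency))
    return alerts


if __name__ == "__main__":
    guests = ["192.0.2.11", "192.0.2.12"]  # placeholder guest addresses
    while True:
        for host, latency in check_hosts(guests):
            print("ALERT %s latency=%s ms" % (host, latency))
        time.sleep(1)
```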

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917/+subscriptions
