kernel-packages team mailing list archive

[Bug 1346917] [NEW] Using KSM on NUMA capable machines can cause KVM guest performance and stability issues

 

Public bug reported:

[Impact]

When KSM is enabled on a NUMA machine acting as a KVM host, both Linux
and Windows guests can perform very poorly and may crash. Disabling KSM
is a known workaround.
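
A minimal sketch of the workaround, assuming the standard KSM sysfs
interface under /sys/kernel/mm/ksm. Writing 0 to `run` stops ksmd but
keeps already merged pages; writing 2 stops ksmd and unmerges them. The
helper name `set_ksm_run` is hypothetical, and writing to the real file
requires root on the host.

```python
from pathlib import Path

# Standard sysfs location of the KSM controls on the hypervisor.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def set_ksm_run(value: int, ksm_dir: Path = KSM_DIR) -> int:
    """Write `value` to the KSM `run` file and return the value read back.

    0 = stop ksmd (merged pages stay merged)
    1 = run ksmd
    2 = stop ksmd and unmerge all currently merged pages
    """
    if value not in (0, 1, 2):
        raise ValueError("run must be 0, 1, or 2")
    run_file = ksm_dir / "run"
    run_file.write_text(f"{value}\n")
    return int(run_file.read_text().strip())
```

For example, `set_ksm_run(2)` on the host would stop KSM and unmerge
shared pages. Other services on the host may turn KSM back on later, so
it is worth re-reading the file after a while.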

[Fix]

The following patch fixes the issue in our testing:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=64a9a34e22896dad430e21a28ad8cb00a756fefc

This patch is present in v3.14-rc1 and onwards.

[Test Case]

General test case:
1) On a NUMA-capable machine, set up the machine as a KVM hypervisor
  - lscpu should show more than 1 NUMA node
2) Install 4 KVM VMs
3) Run the following in another terminal to ensure that pages_shared and pages_sharing are increasing
  - watch 'tail /sys/kernel/mm/ksm/*'
4) In another terminal, run a program that continually pings each guest and alerts on high latencies
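
The checks in steps 1 and 3 could be scripted roughly as follows. This
is only a sketch: the function names are hypothetical, the NUMA check
assumes lscpu prints a "NUMA node(s):" line, and the KSM check assumes
the pages_shared and pages_sharing files under /sys/kernel/mm/ksm.

```python
import re
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def numa_node_count(lscpu_output: str) -> int:
    """Extract the node count from the 'NUMA node(s):' line of lscpu output."""
    match = re.search(r"^NUMA node\(s\):\s*(\d+)", lscpu_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'NUMA node(s):' line found")
    return int(match.group(1))


def read_ksm_counters(ksm_dir: Path = KSM_DIR) -> dict:
    """Read the two KSM counters that step 3 watches."""
    names = ("pages_shared", "pages_sharing")
    return {name: int((ksm_dir / name).read_text().strip()) for name in names}


def counters_increased(before: dict, after: dict) -> bool:
    """True only if every counter in `before` is strictly larger in `after`."""
    return all(after[name] > before[name] for name in before)
```

One way to use it: take a `read_ksm_counters()` snapshot, wait a few
seconds while the guests run, take a second snapshot, and pass both to
`counters_increased()`. The lscpu text can be captured with
`subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.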

What we've observed is that in Linux guests, the ping latencies can go
into the ~2 second range for a few pings, then return to the < 1ms
range. (This is machine dependent.) In addition, Windows guests
occasionally BSOD during this test.
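
The report does not include the ping program used in step 4. A minimal
sketch of one is below, assuming the Linux iputils `ping` command (with
its `-c` and `-W` options and `time=... ms` reply lines). The 100 ms
threshold, the function names, and the choice to treat a lost reply as
an alert are illustrative assumptions, not values from our testing.

```python
import re
import subprocess
import time

# Matches the per-reply latency in ping output, e.g. "time=0.045 ms".
LATENCY_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_latency_ms(ping_output: str):
    """Return the first reply latency in milliseconds, or None if no reply."""
    match = LATENCY_RE.search(ping_output)
    return float(match.group(1)) if match else None


def is_alert(latency_ms, threshold_ms: float) -> bool:
    """A lost reply (None) or a latency above the threshold is an alert."""
    return latency_ms is None or latency_ms > threshold_ms


def ping_once(host: str, timeout_s: int = 5):
    """Send one echo request and return its latency in ms (None if lost)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        capture_output=True,
        text=True,
    )
    return parse_latency_ms(result.stdout)


def monitor(hosts, threshold_ms: float = 100.0, interval_s: float = 1.0):
    """Ping every host in a loop and print a line for each alert."""
    while True:
        for host in hosts:
            latency = ping_once(host)
            if is_alert(latency, threshold_ms):
                shown = "no reply" if latency is None else f"{latency:.1f} ms"
                print(f"{time.strftime('%H:%M:%S')} ALERT {host}: {shown}")
        time.sleep(interval_s)
```

Calling `monitor([...])` with the guests' addresses runs until
interrupted. With a threshold well above the normal < 1ms latency, the
~2 second spikes described above would each produce an alert line.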

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Fix Released

** Affects: linux (Ubuntu Trusty)
     Importance: High
     Assignee: Chris J Arges (arges)
         Status: In Progress

** Description changed:

  [Impact]
  
  When using KVM on NUMA machines, both Linux and Windows guests can
  exhibit very poor performance and potential crashes. Disabling KSM is a
  known workaround to fix this issue.
  
- [ Fix ]
+ [Fix]
  
  The following patch fixes the issue in our testing:
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=64a9a34e22896dad430e21a28ad8cb00a756fefc
  
  This patch is present in v3.14-rc1 and onwards.
  
  [Test Case]
  
  General test case:
  1) On a NUMA capable machine, setup the machine as a KVM hypervisor
-   - lscpu should show more than 1 NUMA node 
+   - lscpu should show more than 1 NUMA node
  2) Install 4 KVM VMs
  3) Run the following in another terminal to ensure that pages_shared and pages_sharing is increasing
-  - watch 'tail /sys/kernel/mm/ksm/*' 
+  - watch 'tail /sys/kernel/mm/ksm/*'
  4) In another terminal run a program that continually pings each node and alerts on high latencies
  
  What we've observed is that in Linux guests, the ping latencies can go
  into the ~2 second range for a few pings, then return back to the < 1ms
  range. (This is machine dependent.) In addition, using Windows guests,
  occasionally when running this test we observe that the guests BSOD
  during this test.

** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Trusty)
     Assignee: (unassigned) => Chris J Arges (arges)

** Changed in: linux (Ubuntu)
     Assignee: Chris J Arges (arges) => (unassigned)

** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Trusty)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: In Progress => Fix Released

** Changed in: linux (Ubuntu)
   Importance: High => Undecided

** Description changed:

  [Impact]
  
  When using KVM on NUMA machines, both Linux and Windows guests can
  exhibit very poor performance and potential crashes. Disabling KSM is a
  known workaround to fix this issue.
  
  [Fix]
  
  The following patch fixes the issue in our testing:
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=64a9a34e22896dad430e21a28ad8cb00a756fefc
  
  This patch is present in v3.14-rc1 and onwards.
  
  [Test Case]
  
  General test case:
  1) On a NUMA capable machine, setup the machine as a KVM hypervisor
    - lscpu should show more than 1 NUMA node
  2) Install 4 KVM VMs
  3) Run the following in another terminal to ensure that pages_shared and pages_sharing is increasing
   - watch 'tail /sys/kernel/mm/ksm/*'
  4) In another terminal run a program that continually pings each node and alerts on high latencies
  
  What we've observed is that in Linux guests, the ping latencies can go
  into the ~2 second range for a few pings, then return back to the < 1ms
- range. (This is machine dependent.) In addition, using Windows guests,
- occasionally when running this test we observe that the guests BSOD
- during this test.
+ range. (This is machine dependent.) In addition, occasionally when
+ running this test with Windows guests we observe BSODs during this test.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1346917

Title:
  Using KSM on NUMA capable machines can cause KVM guest performance and
  stability issues

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917/+subscriptions
