
yahoo-eng-team team mailing list archive

[Bug 1950186] [NEW] Nova doesn't account for hugepages when scheduling VMs

 

You have been subscribed to a public bug:

Description
===========

When hugepages are enabled on the host, it's possible to schedule VMs
that use more RAM than is actually available.

On the node whose memory usage is shown below, it was possible to
schedule 6 instances using a total of 140G of memory with a
non-hugepages-enabled flavor. The machine has 188G of memory in total,
of which 64G is reserved for hugepages. An additional ~4G is used for
housekeeping, the OpenStack control plane, etc., leaving roughly 120G
for such instances. This resulted in overcommitment of roughly 20G.
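
The arithmetic behind the ~20G figure can be spelled out as a short sketch. The variable names are illustrative only, and the values are the rounded figures quoted in this report, not exact measurements:

```python
# Rounded figures quoted in the report, in GB.
total_ram_gb = 188       # total memory on the compute node
hugepages_gb = 64        # reserved for hugepages
housekeeping_gb = 4      # approximate: housekeeping, control plane, etc.
scheduled_vms_gb = 140   # 6 instances on a non-hugepages flavor

# Memory that is really left for guests not backed by hugepages.
usable_gb = total_ram_gb - hugepages_gb - housekeeping_gb

# How far the scheduled instances exceed that amount.
overcommit_gb = scheduled_vms_gb - usable_gb

print(f"usable for non-hugepage guests: ~{usable_gb}G")
print(f"overcommitted by: ~{overcommit_gb}G")
```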

After running memory intensive operations on the VMs, some of them got
OOM killed.

$ cat /proc/meminfo  | egrep "^(Mem|Huge)" # on the compute node
MemTotal:       197784792 kB
MemFree:        115005288 kB
MemAvailable:   116745612 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        67108864 kB
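
As a rough illustration of how the hugepage pool could be separated from "normal" memory, the sketch below parses the output above. It is not how nova computes memory; `parse_meminfo` is a hypothetical helper written for this example, and the text is embedded as a string so the snippet runs anywhere:

```python
# /proc/meminfo excerpt copied from the compute node output above.
MEMINFO = """\
MemTotal:       197784792 kB
MemFree:        115005288 kB
MemAvailable:   116745612 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        67108864 kB
"""


def parse_meminfo(text):
    """Return a dict of field name -> integer value (units dropped)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


info = parse_meminfo(MEMINFO)

# MemTotal and Hugepagesize are reported in kB; convert to MB.
total_mb = info["MemTotal"] // 1024
hugepages_mb = info["HugePages_Total"] * info["Hugepagesize"] // 1024

# Memory that can back guests that do not use hugepages.
normal_mb = total_mb - hugepages_mb

print(f"total: {total_mb} MB")
print(f"hugepages: {hugepages_mb} MB")
print(f"non-hugepage: {normal_mb} MB")
```

Note that the computed total (193149 MB) matches the memory_mb value that nova reports for the hypervisor below, which suggests the hugepage pool is included in that figure.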

$ os hypervisor show compute1 -c memory_mb -c memory_mb_used -c free_ram_mb
+----------------+--------+
| Field          | Value  |
+----------------+--------+
| free_ram_mb    | 29309  |
| memory_mb      | 193149 |
| memory_mb_used | 163840 |
+----------------+--------+

$ os host show compute1
+----------+----------------------------------+-----+-----------+---------+
| Host     | Project                          | CPU | Memory MB | Disk GB |
+----------+----------------------------------+-----+-----------+---------+
| compute1 | (total)                          |   0 |    193149 |     893 |
| compute1 | (used_now)                       |  72 |    163840 |     460 |
| compute1 | (used_max)                       |  72 |    147456 |     460 |
| compute1 | some_project_id_was_here         |   2 |      4096 |      40 |
| compute1 | another_anonymized_id_here       |  70 |    143360 |     420 |
+----------+----------------------------------+-----+-----------+---------+

$ os resource provider inventory list uuid_of_compute1_node
+----------------+------------------+----------+----------+----------+-----------+--------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
+----------------+------------------+----------+----------+----------+-----------+--------+
| MEMORY_MB      |              1.0 |        1 |   193149 |    16384 |         1 | 193149 |
| DISK_GB        |              1.0 |        1 |      893 |        0 |         1 |    893 |
| PCPU           |              1.0 |        1 |       72 |        0 |         1 |     72 |
+----------------+------------------+----------+----------+----------+-----------+--------+
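
To show where the gap comes from, the following sketch applies the usual placement capacity formula, (total - reserved) * allocation_ratio, to the MEMORY_MB row above and compares it with the memory left outside the hugepage pool. The `Inventory` class is a hypothetical stand-in for this example, not a nova or placement type, and the hugepage size is taken from the /proc/meminfo output earlier in this report:

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Hypothetical stand-in for one placement inventory row."""
    total: int
    reserved: int
    allocation_ratio: float

    def capacity(self) -> int:
        # Amount placement will hand out for this resource class.
        return int((self.total - self.reserved) * self.allocation_ratio)


# MEMORY_MB row from the inventory listing above.
memory_mb = Inventory(total=193149, reserved=16384, allocation_ratio=1.0)

# 64 hugepages of 1 GiB each, per /proc/meminfo.
hugepages_mb = 64 * 1024

schedulable_mb = memory_mb.capacity()
non_hugepage_mb = memory_mb.total - hugepages_mb

# Memory that can be allocated to non-hugepage guests
# even though it does not exist outside the hugepage pool.
exposure_mb = schedulable_mb - non_hugepage_mb

print(f"schedulable per placement: {schedulable_mb} MB")
print(f"physically non-hugepage:   {non_hugepage_mb} MB")
print(f"potential overcommit:      {exposure_mb} MB")
```

Under these assumptions the inventory allows about 48G more than exists outside the hugepage pool, before any host overhead is taken into account.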

Steps to reproduce
==================

1. Reserve a large part of memory for hugepages on the hypervisor.
2. Create VMs using a flavor that uses a lot of memory that isn't backed by hugepages.
3. Start memory intensive operations on the VMs, e.g.:
stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d", $2 * 0.98;}' < /proc/meminfo)k --vm-keep -m 1

Expected result
===============

Nova should not allow overcommitment and should be able to differentiate
between hugepages and "normal" memory.

Actual result
=============
Overcommitment resulting in OOM kills.

Environment
===========
nova-api-metadata 2:21.2.1-0ubuntu1~cloud0
nova-common 2:21.2.1-0ubuntu1~cloud0
nova-compute 2:21.2.1-0ubuntu1~cloud0
nova-compute-kvm 2:21.2.1-0ubuntu1~cloud0
nova-compute-libvirt 2:21.2.1-0ubuntu1~cloud0
python3-nova 2:21.2.1-0ubuntu1~cloud0
python3-novaclient 2:17.0.0-0ubuntu1~cloud0

OS: Ubuntu 18.04.5 LTS
Hypervisor: libvirt + KVM

** Affects: nova
     Importance: Undecided
         Status: Confirmed


** Tags: sts
-- 
Nova doesn't account for hugepages when scheduling VMs
https://bugs.launchpad.net/bugs/1950186
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

