yahoo-eng-team team mailing list archive

[Bug 1950186] Re: Nova doesn't account for hugepages when scheduling VMs


This can be reproduced on Focal/ussuri:

############ Computes:
$ os resource provider list
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+
| uuid                                 | name                                                | generation | root_provider_uuid                   | parent_provider_uuid |
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+
| ca3fa736-7e60-4365-9cc8-7afc78b53005 | juju-98fb61-zaza-d6f2c7825043-9.project.serverstack |          5 | ca3fa736-7e60-4365-9cc8-7afc78b53005 | None                 |
| 0605bd29-71d5-40ed-ab8f-eceeaaac59b5 | juju-98fb61-zaza-d6f2c7825043-8.project.serverstack |          4 | 0605bd29-71d5-40ed-ab8f-eceeaaac59b5 | None                 |
+--------------------------------------+-----------------------------------------------------+------------+--------------------------------------+----------------------+


############ Mem Allocation ratio is 1:
$ openstack resource provider inventory list ca3fa736-7e60-4365-9cc8-7afc78b53005
+----------------+------------------+----------+----------+----------+-----------+-------+-------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total |  used |
+----------------+------------------+----------+----------+----------+-----------+-------+-------+
| VCPU           |             16.0 |        1 |        8 |        0 |         1 |     8 |     2 |
| MEMORY_MB      |              1.0 |        1 |    16008 |     2048 |         1 | 16008 | 13960 |
| DISK_GB        |              1.0 |        1 |       77 |        0 |         1 |    77 |    20 |
+----------------+------------------+----------+----------+----------+-----------+-------+-------+


$ openstack resource provider inventory list 0605bd29-71d5-40ed-ab8f-eceeaaac59b5
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------+------------------+----------+----------+----------+-----------+-------+------+
| VCPU           |             16.0 |        1 |        8 |        0 |         1 |     8 |    0 |
| MEMORY_MB      |              1.0 |        1 |    16008 |     2048 |         1 | 16008 |    0 |
| DISK_GB        |              1.0 |        1 |       77 |        0 |         1 |    77 |    0 |
+----------------+------------------+----------+----------+----------+-----------+-------+------+

######## Hugepages: 1000 * 2M
root@juju-98fb61-zaza-d6f2c7825043-9:~# cat /proc/meminfo | grep -i huge
AnonHugePages:    622592 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1000
HugePages_Free:     1000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2048000 kB


root@juju-98fb61-zaza-d6f2c7825043-9:~# free -mh
              total        used        free      shared  buff/cache   available
Mem:           15Gi       3.5Gi        11Gi       1.0Mi       713Mi        11Gi
Swap:            0B          0B          0B


######## Host reserved memory is 2G:
$ juju config nova-compute reserved-host-memory
2048


######## Available memory for general use (not hugepages)
I expect the memory available for VMs to be 16008 (total) - 2048 (reserved) - 2048 (hugepages) = 11912 MB.
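That expectation can be checked with a quick sketch (all values copied from the outputs above; the hugepage pool of 1000 x 2 MB pages is rounded to 2048 MB, as in the estimate):

```shell
# Values from the placement inventory, charm config and /proc/meminfo above.
total_mb=16008        # placement MEMORY_MB total
reserved_mb=2048      # reserved-host-memory
hugepages_mb=2048     # ~2G hugepage pool (1000 x 2 MB pages), rounded up

expected_available=$((total_mb - reserved_mb - hugepages_mb))
placement_limit=$((total_mb - reserved_mb))  # what placement actually enforces

echo "expected available: ${expected_available} MB"  # 11912
echo "placement limit:    ${placement_limit} MB"     # 13960
```

Note that the 13960 MB placement limit ignores the hugepage pool entirely, which is exactly why a flavor with ram=13960 still fits below.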


######## Flavor with 13960 MB of memory (greater than the expected 11912 MB available)
$ os flavor show 14g-mem
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| description                | None                                 |
| disk                       | 20                                   |
| id                         | 377de58b-7aa2-499d-9940-abf98aaa5a8a |
| name                       | 14g-mem                              |
| os-flavor-access:is_public | True                                 |
| properties                 |                                      |
| ram                        | 13960                                |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 2                                    |
+----------------------------+--------------------------------------+


###### A VM with flavor 14g-mem is scheduled successfully (expected: No valid host)
$ os server list -c ID -c Name -c Status -c "Flavor"
+--------------------------------------+---------+--------+---------+
| ID                                   | Name    | Status | Flavor  |
+--------------------------------------+---------+--------+---------+
| fd5c8bc9-22a6-4f5e-b745-026fa00e26ea | test-vm | ACTIVE | 14g-mem |
+--------------------------------------+---------+--------+---------+

** Project changed: nova => ubuntu

** Package changed: ubuntu => nova (Ubuntu)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1950186

Title:
  Nova doesn't account for hugepages when scheduling VMs

Status in nova package in Ubuntu:
  New

Bug description:
  Description
  ===========

  When hugepages are enabled on the host it's possible to schedule VMs
  using more RAM than available.

  On the node with memory usage presented below it was possible to
  schedule 6 instances using a total of 140G of memory and a non-
  hugepages-enabled flavor. The same machine has 188G of memory in
  total, of which 64G were reserved for hugepages. Additional ~4G were
  used for housekeeping, OpenStack control plane, etc. This resulted in
  overcommitment of roughly 20G.

  After running memory intensive operations on the VMs, some of them got
  OOM killed.

  $ cat /proc/meminfo  | egrep "^(Mem|Huge)" # on the compute node
  MemTotal:       197784792 kB
  MemFree:        115005288 kB
  MemAvailable:   116745612 kB
  HugePages_Total:      64
  HugePages_Free:       64
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:    1048576 kB
  Hugetlb:        67108864 kB

  $ os hypervisor show compute1 -c memory_mb -c memory_mb_used -c free_ram_mb
  +----------------+--------+
  | Field          | Value  |
  +----------------+--------+
  | free_ram_mb    | 29309  |
  | memory_mb      | 193149 |
  | memory_mb_used | 163840 |
  +----------------+--------+

  $ os host show compute1
  +----------+----------------------------------+-----+-----------+---------+
  | Host     | Project                          | CPU | Memory MB | Disk GB |
  +----------+----------------------------------+-----+-----------+---------+
  | compute1 | (total)                          |   0 |    193149 |     893 |
  | compute1 | (used_now)                       |  72 |    163840 |     460 |
  | compute1 | (used_max)                       |  72 |    147456 |     460 |
  | compute1 | some_project_id_was_here         |   2 |      4096 |      40 |
  | compute1 | another_anonymized_id_here       |  70 |    143360 |     420 |
  +----------+----------------------------------+-----+-----------+---------+

  $ os resource provider inventory list uuid_of_compute1_node
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
  +----------------+------------------+----------+----------+----------+-----------+--------+
  | MEMORY_MB      |              1.0 |        1 |   193149 |    16384 |         1 | 193149 |
  | DISK_GB        |              1.0 |        1 |      893 |        0 |         1 |    893 |
  | PCPU           |              1.0 |        1 |       72 |        0 |         1 |     72 |
  +----------------+------------------+----------+----------+----------+-----------+--------+
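  For what it's worth, the reported figures are internally consistent; a quick arithmetic sketch (all values copied from the outputs above) shows both the observed ~20G overcommit and how far placement would have allowed it to go:

```shell
# Values from /proc/meminfo, 'os host show' and the placement inventory above.
memtotal_kb=197784792       # MemTotal
hugetlb_kb=67108864         # Hugetlb: 64 x 1G hugepages
placement_total_mb=193149   # placement MEMORY_MB total
placement_reserved_mb=16384 # placement MEMORY_MB reserved
used_max_mb=147456          # 'used_max' from 'os host show'

non_hugepage_mb=$(( (memtotal_kb - hugetlb_kb) / 1024 ))         # usable outside hugepages
schedulable_mb=$(( placement_total_mb - placement_reserved_mb )) # what placement allows
overcommit_mb=$(( used_max_mb - non_hugepage_mb ))               # the observed overcommit

echo "non-hugepage memory: ${non_hugepage_mb} MB"   # 127613
echo "placement allows:    ${schedulable_mb} MB"    # 176765
echo "overcommitted by:    ${overcommit_mb} MB"     # 19843, i.e. roughly 20G
```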

  Steps to reproduce
  ==================

  1. Reserve a large part of memory for hugepages on the hypervisor.
  2. Create VMs with a flavor that requests a large amount of memory not backed by hugepages.
  3. Start memory intensive operations on the VMs, e.g.:
  stress-ng --vm-bytes $(awk '/MemAvailable/{printf "%d", $2 * 0.98;}' < /proc/meminfo)k --vm-keep -m 1

  Expected result
  ===============

  Nova should not allow overcommitment and should be able to
  differentiate between hugepages and "normal" memory.

  Actual result
  =============
  Overcommitment resulting in OOM kills.

  Environment
  ===========
  nova-api-metadata 2:21.2.1-0ubuntu1~cloud0
  nova-common 2:21.2.1-0ubuntu1~cloud0
  nova-compute 2:21.2.1-0ubuntu1~cloud0
  nova-compute-kvm 2:21.2.1-0ubuntu1~cloud0
  nova-compute-libvirt 2:21.2.1-0ubuntu1~cloud0
  python3-nova 2:21.2.1-0ubuntu1~cloud0
  python3-novaclient 2:17.0.0-0ubuntu1~cloud0

  OS: Ubuntu 18.04.5 LTS
  Hypervisor: libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1950186/+subscriptions


