yahoo-eng-team team mailing list archive

[Bug 1810977] Re: Oversubscription broken for instances with NUMA topologies

 

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/rocky
       Status: New => In Progress

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/rocky
     Assignee: (unassigned) => Stephen Finucane (stephenfinucane)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1810977

Title:
  Oversubscription broken for instances with NUMA topologies

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  In Progress

Bug description:
  As described in [1], the fix to [2] appears to have inadvertently
  broken oversubscription of memory for instances with a NUMA topology
  but no hugepages.

  Steps to reproduce:

  1. Create a flavor that consumes more than 50% of the available memory
  on your host(s) and specifies an explicit NUMA topology. For example,
  on my all-in-one deployment, where the host has 32GB RAM, we request a
  20GB instance:

     $ openstack flavor create --vcpus 2 --disk 0 --ram 20480 test.numa
     $ openstack flavor set test.numa --property hw:numa_nodes=2

  2. Boot an instance using this flavor:

     $ openstack server create --flavor test.numa \
         --image cirros-0.3.6-x86_64-disk --wait test

  3. Boot another instance using this flavor:

     $ openstack server create --flavor test.numa \
         --image cirros-0.3.6-x86_64-disk --wait test2

  Expected result:

  The second instance should boot.

  Actual result:

  The second instance fails to boot. We see the following error message
  in the logs.

    nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
    nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}
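
  The numbers in the second log line can be reproduced with a little
  arithmetic. The sketch below is illustrative only and is not nova's
  actual code: the function names are hypothetical, the even split of
  flavor RAM across guest NUMA cells assumes no explicit per-cell memory
  properties were set, and the allocation ratio of 1.5 is an assumed
  value for ram_allocation_ratio, not something taken from this
  deployment's configuration.

```python
# Illustrative sketch only; names are hypothetical, not nova's API.


def per_cell_memory_mb(flavor_ram_mb: int, numa_nodes: int) -> int:
    """Split flavor RAM evenly across the guest NUMA cells."""
    return flavor_ram_mb // numa_nodes


def fits_strict(required_mb: int, total_mb: int, used_mb: int) -> bool:
    """Compare the request against free memory only (no oversubscription).

    This matches the comparison reported in the log line above.
    """
    return required_mb <= total_mb - used_mb


def fits_oversubscribed(required_mb: int, total_mb: int, used_mb: int,
                        ram_allocation_ratio: float) -> bool:
    """Compare total usage against the host cell total scaled by the ratio."""
    return used_mb + required_mb <= total_mb * ram_allocation_ratio


# Values taken from the flavor and the log line above.
required = per_cell_memory_mb(20480, 2)  # 10240 MB per guest cell
total = 15916                            # host cell total from the log
used = total - 5676                      # total minus "available" = 10240

# ASSUMPTION: 1.5 stands in for the configured ram_allocation_ratio.
ASSUMED_RATIO = 1.5

print(fits_strict(required, total, used))                         # False
print(fits_oversubscribed(required, total, used, ASSUMED_RATIO))  # True
```

  Under these assumptions the strict check rejects the second instance
  (10240 > 5676), while a ratio-aware check would accept it
  (20480 <= 23874), which is the behaviour the report expects.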

  If we revert the patch [3] that addressed bug [2], the correct
  behaviour returns and the instance boots. Doing so, however, loses
  whatever benefits that change gave us.

  [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
  [2] https://bugs.launchpad.net/nova/+bug/1734204
  [3] https://review.openstack.org/#/c/532168

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1810977/+subscriptions

