
yahoo-eng-team team mailing list archive

[Bug 1810977] Re: Oversubscription broken for instances with NUMA topologies

 

Reviewed:  https://review.openstack.org/629281
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b24ad3780bc872d1a17907909cd6bcbea7e804b3
Submitter: Zuul
Branch:    master

commit b24ad3780bc872d1a17907909cd6bcbea7e804b3
Author: Stephen Finucane <sfinucan@xxxxxxxxxx>
Date:   Tue Jan 8 17:01:41 2019 +0000

    Fix overcommit for NUMA-based instances
    
    Change I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53 modified how we
    evaluated available memory for instances with a NUMA topology.
    Previously, we used a non-pagesize-aware check unless the user had
    explicitly requested a specific pagesize. This meant that, for instances
    without pagesize requests, nova counted hugepages as available memory
    when deciding whether a host had enough memory for the instance.
    
    The aforementioned change modified this so that all NUMA-based
    instances, whether they had hugepages or not, would use the
    pagesize-aware check. Unfortunately, the code it reused for this had
    previously been used only for hugepages. Because hugepages cannot be
    oversubscribed, that code did not take oversubscription into account: it
    compared against available memory on the host (i.e. memory not consumed
    by other instances) rather than total memory. This is correct for
    hugepages but not for small pages, which can be overcommitted.
    
    Given that overcommit is already handled elsewhere in the code, we
    simply modify the non-hugepage code path so that, for the smallest
    pagesize, it checks the request against total memory rather than
    available memory.
    
    Change-Id: I890b2c81cd49c1c601e9baee6a249709d0f6810e
    Signed-off-by: Stephen Finucane <sfinucan@xxxxxxxxxx>
    Closes-Bug: #1810977
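The difference between the two comparisons described in the commit message can be sketched in Python, the language nova is written in. This is a minimal illustration, not nova's code: `PagePool`, `pool_from_mib` and `can_fit_pagesize` are hypothetical names, and the `use_free` flag only models the "available vs. total memory" distinction the commit describes. It assumes the figures in the scheduler log quoted in the bug report are MiB per host NUMA cell.

```python
from dataclasses import dataclass

KIB_PER_MIB = 1024


@dataclass
class PagePool:
    """Hypothetical accounting for one page size on one host NUMA cell."""

    size_kb: int  # page size in KiB: 4 for small pages, 2048+ for hugepages
    total: int  # total pages of this size on the cell
    used: int  # pages already consumed by other instances

    @property
    def free(self) -> int:
        return self.total - self.used


def pool_from_mib(size_kb: int, total_mib: int, free_mib: int) -> PagePool:
    """Build a PagePool from MiB figures such as those in the scheduler log."""
    total = total_mib * KIB_PER_MIB // size_kb
    used = (total_mib - free_mib) * KIB_PER_MIB // size_kb
    return PagePool(size_kb, total, used)


def can_fit_pagesize(pool: PagePool, requested_kb: int, use_free: bool) -> bool:
    """Return True if requested_kb can be backed by pages from this pool.

    use_free=True compares against free pages, which is right for hugepages
    because they cannot be oversubscribed. use_free=False compares against
    all pages, which suits small pages; the overcommit limit is assumed to
    be enforced by a separate check.
    """
    if requested_kb % pool.size_kb:
        return False  # the request must be a whole number of pages
    needed = requested_kb // pool.size_kb
    capacity = pool.free if use_free else pool.total
    return needed <= capacity


# Log figures: 10240 MiB required, 5676 MiB available, 15916 MiB total.
small = pool_from_mib(4, total_mib=15916, free_mib=5676)
request_kb = 10240 * KIB_PER_MIB
print(can_fit_pagesize(small, request_kb, use_free=True))   # False: old check
print(can_fit_pagesize(small, request_kb, use_free=False))  # True: fixed check
```

With `use_free=True` the second 10240 MiB request is rejected exactly as in the log; comparing against total pages lets it through and leaves the oversubscription decision to the separate limit check.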


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1810977

Title:
  Oversubscription broken for instances with NUMA topologies

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  As described in [1], the fix to [2] appears to have inadvertently
  broken oversubscription of memory for instances with a NUMA topology
  but no hugepages.

  Steps to reproduce:

  1. Create a flavor that will consume more than 50% of the available memory
  on your host(s) and specify an explicit NUMA topology. For example, on my
  all-in-one deployment, where the host has 32GB of RAM, we request a 20GB
  instance:

     $ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
     $ openstack flavor set test.numa --property hw:numa_nodes=2

  2. Boot an instance using this flavor:

     $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

  3. Boot another instance using this flavor:

     $ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2
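The reproduction only works because two such instances need more memory per host NUMA cell than physically exists. A small sketch of that arithmetic, assuming nova's default behaviour of splitting flavor RAM evenly across guest NUMA nodes when no per-node memory properties are set; `split_memory_evenly` is a hypothetical helper, and the 15916 MB per-cell total is taken from the scheduler log in this report.

```python
from typing import List

# Per-cell total taken from the scheduler log in this report; the host's
# 32GB is assumed to be divided across two host NUMA cells.
HOST_CELL_TOTAL_MB = 15916


def split_memory_evenly(ram_mb: int, numa_nodes: int) -> List[int]:
    """Hypothetical helper: divide flavor RAM evenly across guest NUMA nodes."""
    if numa_nodes <= 0 or ram_mb % numa_nodes:
        raise ValueError("RAM must divide evenly across a positive node count")
    return [ram_mb // numa_nodes] * numa_nodes


per_node = split_memory_evenly(20480, 2)  # --ram 20480, hw:numa_nodes=2
print(per_node)  # [10240, 10240]

# Each instance places 10240 MB on each host cell; two instances need
# 20480 MB per cell, more than the cell physically has, so the second
# boot can only succeed if memory is oversubscribed.
print(2 * per_node[0] > HOST_CELL_TOTAL_MB)  # True
```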

  # Expected result:

  The second instance should boot.

  # Actual result:

  The second instance fails to boot. We see the following error message
  in the logs.

    nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
    nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}
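The figures in the second log line are consistent with the first instance holding 10240 MB of the cell (15916 - 5676 = 10240). The commit message says overcommit is enforced elsewhere; a minimal sketch of such a ratio-based limit, using a hypothetical function name and an assumed RAM allocation ratio of 1.5 (the report does not state the deployment's ratio), shows the second instance would have been within that limit.

```python
def within_ram_limit(total_mb: int, used_mb: int, requested_mb: int,
                     ram_ratio: float) -> bool:
    """Hypothetical overcommit check: memory in use after placing the
    instance must not exceed the cell's total memory times ram_ratio."""
    return used_mb + requested_mb <= total_mb * ram_ratio


# Figures from the log line above (assumed to be MB, per host NUMA cell).
REQUIRED, AVAILABLE, TOTAL = 10240, 5676, 15916
used = TOTAL - AVAILABLE  # 10240 MB, assumed held by the first instance

print(within_ram_limit(TOTAL, used, REQUIRED, ram_ratio=1.5))  # True
print(within_ram_limit(TOTAL, used, REQUIRED, ram_ratio=1.0))  # False
```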

  If we revert the patch that addressed bug [2] (see [3]), the correct
  behaviour is restored and the instance boots. Doing so, however, obviously
  loses whatever benefits that change gave us.

  [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
  [2] https://bugs.launchpad.net/nova/+bug/1734204
  [3] https://review.openstack.org/#/c/532168

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1810977/+subscriptions

