yahoo-eng-team team mailing list archive
Message #76655
[Bug 1810977] Re: Oversubscription broken for instances with NUMA topologies
Reviewed: https://review.openstack.org/629281
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b24ad3780bc872d1a17907909cd6bcbea7e804b3
Submitter: Zuul
Branch: master
commit b24ad3780bc872d1a17907909cd6bcbea7e804b3
Author: Stephen Finucane <sfinucan@xxxxxxxxxx>
Date: Tue Jan 8 17:01:41 2019 +0000
Fix overcommit for NUMA-based instances
Change I5f5c621f2f0fa1bc18ee9a97d17085107a5dee53 modified how we
evaluate available memory for instances with a NUMA topology.
Previously, we used a non-pagesize-aware check unless the user had
explicitly requested a specific pagesize. This meant that for instances
without pagesize requests, nova considered hugepages as available memory
when deciding whether a host had enough memory for the instance.
The aforementioned change made all NUMA-based instances, whether or not
they requested hugepages, use the pagesize-aware check. Unfortunately,
the code it reused for this had previously been used only for hugepages.
Because hugepages cannot be oversubscribed, that code did not take
oversubscription into account: it compared the request against available
memory on the host (i.e. memory not consumed by other instances) rather
than total memory. That is correct for hugepages but not for small
pages, which may be overcommitted.
Given that overcommit is already enforced elsewhere in the code, we
simply modify the non-hugepage code path to check memory of the lowest
pagesize against total memory rather than available memory.
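The two checks described above can be sketched as follows. This is a minimal illustration only: the function names, signatures, and the `ram_allocation_ratio` parameter are assumptions for the sketch, not nova's actual code path.

```python
def fits_hugepages(requested_mb, available_mb):
    # Hugepages cannot be oversubscribed: the request must fit within
    # memory not already consumed by other instances.
    return requested_mb <= available_mb

def fits_small_pages(requested_mb, total_mb, ram_allocation_ratio=1.0):
    # Small pages may be overcommitted, and overcommit limits are
    # enforced elsewhere, so here we only compare against total memory
    # (scaled by the hypothetical allocation ratio for illustration).
    return requested_mb <= total_mb * ram_allocation_ratio
```

With the numbers from the bug report below (10240 MB requested, 5676 MB available, 15916 MB total on the NUMA node), the hugepage-style check rejects the request while the small-page check accepts it.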
Change-Id: I890b2c81cd49c1c601e9baee6a249709d0f6810e
Signed-off-by: Stephen Finucane <sfinucan@xxxxxxxxxx>
Closes-Bug: #1810977
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1810977
Title:
Oversubscription broken for instances with NUMA topologies
Status in OpenStack Compute (nova):
Fix Released
Bug description:
As described in [1], the fix to [2] appears to have inadvertently
broken oversubscription of memory for instances with a NUMA topology
but no hugepages.
Steps to reproduce:
1. Create a flavor that will consume > 50% of the available memory on
your host(s) and specify an explicit NUMA topology. For example, on my
all-in-one deployment where the host has 32GB RAM, we will request a
20GB instance:

$ openstack flavor create --vcpu 2 --disk 0 --ram 20480 test.numa
$ openstack flavor set test.numa --property hw:numa_nodes=2
2. Boot an instance using this flavor:

$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test

3. Boot another instance using the same flavor:

$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2
Expected result:
The second instance should boot.
Actual result:
The second instance fails to boot, and we see the following error
messages in the logs:
nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] No specific pagesize requested for instance, selected pagesize: 4 {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1045}}
nova-scheduler[18295]: DEBUG nova.virt.hardware [None req-f7a6594b-8d25-424c-9c6e-8522f66ffd22 demo admin] Not enough available memory to schedule instance with pagesize 4. Required: 10240, available: 5676, total: 15916. {{(pid=18318) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1055}}
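The arithmetic in that log line shows why the pagesize-aware check rejects the instance even though overcommit should allow it. A rough illustration, using the numbers from the log above (the comparison is simplified, not nova's actual code):

```python
required_mb = 10240   # second instance's request on this NUMA node (from the log)
available_mb = 5676   # memory not consumed by other instances
total_mb = 15916      # total memory on the NUMA node

# Hugepage-style check against available memory: fails, so the boot is
# rejected even though small pages could be overcommitted.
print(required_mb <= available_mb)   # False

# Check against total memory: passes; overcommit limits are enforced
# elsewhere in nova.
print(required_mb <= total_mb)       # True
```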
If we revert the patch [3] that addressed the earlier bug, the correct
behaviour returns and the instance boots. With that revert, though, we
obviously lose whatever benefits that change gave us.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001459.html
[2] https://bugs.launchpad.net/nova/+bug/1734204
[3] https://review.openstack.org/#/c/532168
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1810977/+subscriptions