yahoo-eng-team team mailing list archive
Message #76624
[Bug 1811886] [NEW] Overcommit allowed for pinned instances when using hugepages
Public bug reported:
When working on a fix for bug 181097, it was noted that the check to
ensure pinned instances do not overcommit was not page-size aware. This
means that if an instance without hugepages boots on a host with a large
number of hugepages allocated, it may not get all of the memory
allocated to it. The solution seems to be to make the check page-size
aware. Test cases demonstrating the problem are provided below.
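A minimal sketch of the difference between the current check and a page-size-aware one, assuming the check simply compares memory already used plus the new request against what the NUMA cell can offer. The function `pinned_fits` and its parameters are hypothetical and are not nova's actual code; the numbers are NUMA cell 0 from the host information below and the 7168 MiB flavor used in the test cases.

```python
# Hypothetical sketch of the two checks discussed in this bug; the function
# name and parameters are illustrative only, not nova's implementation.


def pinned_fits(cell_total_kib, hugepage_kib, used_kib, request_kib,
                pagesize_aware):
    """Return True if a pinned, small-page instance fits in the NUMA cell."""
    if pagesize_aware:
        # Only memory not reserved as hugepages can back a small-page guest.
        available = cell_total_kib - hugepage_kib
    else:
        # Page-size-unaware: hugepage memory is wrongly counted as usable.
        available = cell_total_kib
    return used_kib + request_kib <= available


# NUMA cell 0: 16298528 KiB total, 4000 x 2048 KiB hugepages reserved.
CELL0_TOTAL_KIB = 16298528
CELL0_HUGEPAGE_KIB = 4000 * 2048
FLAVOR_KIB = 7168 * 1024  # flavor --ram is given in MiB

# Placing a second 7168 MiB instance while the first is already running:
print(pinned_fits(CELL0_TOTAL_KIB, CELL0_HUGEPAGE_KIB, FLAVOR_KIB,
                  FLAVOR_KIB, pagesize_aware=False))  # True
print(pinned_fits(CELL0_TOTAL_KIB, CELL0_HUGEPAGE_KIB, FLAVOR_KIB,
                  FLAVOR_KIB, pagesize_aware=True))   # False
```

The unaware variant admits the second instance because it counts the 8192000 KiB of hugepages as available, which is the overcommit this report describes.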
---
# Host information
The memory capacity (and some other stuff) for our node:
$ virsh capabilities | xmllint --xpath '/capabilities/host/topology/cells' -
<cells num="2">
<cell id="0">
<memory unit="KiB">16298528</memory>
<pages unit="KiB" size="4">3075208</pages>
<pages unit="KiB" size="2048">4000</pages>
<pages unit="KiB" size="1048576">0</pages>
...
</cell>
<cell id="1">
<memory unit="KiB">16512884</memory>
<pages unit="KiB" size="4">3128797</pages>
<pages unit="KiB" size="2048">4000</pages>
<pages unit="KiB" size="1048576">0</pages>
...
</cell>
</cells>
Clearly there are not 3075208 and 3128797 4k pages on NUMA nodes 0 and 1,
respectively, since, for NUMA node 0, (3075208 * 4) + (4000 * 2048) = 20492832 != 16298528.
We use [1] to resolve this. With it, we instead have 16298528 - (4000 * 2048) = 8106528 KiB
of memory (about 7.73 GiB) for NUMA cell 0, and something similar for cell 1.
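The small-page figure for a cell can be derived directly from the capabilities XML by subtracting every hugepage pool from the cell's total memory and ignoring the reported 4k page count. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming all values are in KiB as in the output above; the helper name `small_page_kib` is hypothetical.

```python
import xml.etree.ElementTree as ET

# NUMA cell 0 from the `virsh capabilities` output (other elements trimmed).
CELL_XML = """
<cell id="0">
  <memory unit="KiB">16298528</memory>
  <pages unit="KiB" size="4">3075208</pages>
  <pages unit="KiB" size="2048">4000</pages>
  <pages unit="KiB" size="1048576">0</pages>
</cell>
"""


def small_page_kib(cell):
    """Cell memory minus all memory reserved as hugepages, in KiB."""
    total = int(cell.findtext("memory"))
    hugepages = sum(
        int(pages.get("size")) * int(pages.text)
        for pages in cell.findall("pages")
        if int(pages.get("size")) > 4  # skip the 4 KiB small-page entry
    )
    return total - hugepages


cell = ET.fromstring(CELL_XML.strip())
kib = small_page_kib(cell)
print(kib, round(kib / 1024**2, 2))  # 8106528 7.73
```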
To make things easier, cell 1 is totally disabled by adding the
following to 'nova-cpu.conf':
[DEFAULT]
vcpu_pin_set = 0-5,12-17
[1] https://review.openstack.org/631038
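The `vcpu_pin_set` value is a list of host CPU ids and ranges. A minimal sketch of how such a value expands, assuming only comma-separated single ids and `a-b` ranges; `expand_cpu_spec` is a hypothetical helper, it does not handle any other syntax nova may accept, and the claim that these CPUs all sit on NUMA cell 0 comes from the report, not from anything the code checks.

```python
def expand_cpu_spec(spec):
    """Expand a spec such as '0-5,12-17' into a set of host CPU ids.

    Hypothetical helper: handles only single ids and inclusive 'a-b' ranges.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            cpus.update(range(start, end + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


pinned = expand_cpu_spec("0-5,12-17")
print(sorted(pinned))  # [0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
```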
For all test cases, I create the flavor and then try to create two servers
with that flavor.
# Test A, unpinned, implicit small pages, oversubscribed
This should work because we're not using a specific page size.
$ openstack flavor create --vcpu 2 --disk 0 --ram 7168 test.numa
$ openstack flavor set test.numa --property hw:numa_nodes=1
$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test1
$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2
Expect: SUCCESS
Actual: SUCCESS
# Test B, unpinned, explicit small pages, oversubscribed
This should fail because we are requesting a specific page size, even though
that size is small pages (4k).
$ openstack flavor create --vcpu 2 --disk 0 --ram 7168 test.numa
$ openstack flavor set test.numa --property hw:numa_nodes=1
$ openstack flavor set test.numa --property hw:mem_page_size=small
$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test1
$ openstack server create --flavor test.numa --image cirros-0.3.6-x86_64-disk --wait test2
Expect: FAILURE
Actual: FAILURE
# Test C, pinned, implicit small pages, oversubscribed
This should fail because we don't allow oversubscription with CPU
pinning.
$ openstack flavor create --vcpu 2 --disk 0 --ram 7168 test.pinned
$ openstack flavor set test.pinned --property hw:cpu_policy=dedicated
$ openstack server create --flavor test.pinned --image cirros-0.3.6-x86_64-disk --wait test1
$ openstack server create --flavor test.pinned --image cirros-0.3.6-x86_64-disk --wait test2
Expect: FAILURE
Actual: SUCCESS
Interestingly, creation only fails on the third VM. This is likely because the total
memory for that cell, 16298528 KiB, is sufficient for two instances
but not three, which suggests the check counts the memory reserved for hugepages.
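The "fails on the third VM" observation can be checked with plain arithmetic: count how many 7168 MiB instances fit against the cell's total memory versus against its small-page memory only. A minimal sketch, assuming the flavor's RAM is the only memory that matters (no guest overhead is modelled).

```python
CELL0_TOTAL_KIB = 16298528
CELL0_SMALL_PAGE_KIB = CELL0_TOTAL_KIB - 4000 * 2048  # 8106528 KiB
FLAVOR_KIB = 7168 * 1024  # --ram 7168 (MiB)

results = {}
for n in (1, 2, 3):
    needed = n * FLAVOR_KIB
    # (fits if hugepages are counted, fits if only small pages are counted)
    results[n] = (needed <= CELL0_TOTAL_KIB, needed <= CELL0_SMALL_PAGE_KIB)
    print(n, needed, *results[n])
# 1 7340032 True True
# 2 14680064 True False
# 3 22020096 False False
```

Counting total memory admits two instances and rejects the third, matching the observed behaviour; counting only small-page memory would reject the second.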
# Test D, pinned, explicit small pages, oversubscribed
This should fail because we don't allow oversubscription with CPU
pinning.
$ openstack flavor create --vcpu 2 --disk 0 --ram 7168 test.pinned
$ openstack flavor set test.pinned --property hw:cpu_policy=dedicated
$ openstack flavor set test.pinned --property hw:mem_page_size=small
$ openstack server create --flavor test.pinned --image cirros-0.3.6-x86_64-disk --wait test1
$ openstack server create --flavor test.pinned --image cirros-0.3.6-x86_64-disk --wait test2
Expect: FAILURE
Actual: FAILURE
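The Expect and Actual columns of the four tests can be reproduced by a small model of the reporter's reasoning: unpinned guests without an explicit page size may oversubscribe, while anything pinned or with an explicit page size is checked strictly, and the bug is that the pinned-without-page-size path checks against total memory. This is a hypothetical model for illustration only (`second_server_fits` and `TESTS` are invented names), not nova's scheduling logic.

```python
CELL0_TOTAL_KIB = 16298528
CELL0_SMALL_PAGE_KIB = CELL0_TOTAL_KIB - 4000 * 2048
TWO_SERVERS_KIB = 2 * 7168 * 1024


def second_server_fits(pinned, page_size, pagesize_aware):
    """Hypothetical model: does the second server pass the memory check?"""
    if not pinned and page_size is None:
        return True  # oversubscription allowed, no strict check applies
    if page_size is not None or pagesize_aware:
        limit = CELL0_SMALL_PAGE_KIB
    else:
        limit = CELL0_TOTAL_KIB  # page-size-unaware path taken in Test C
    return TWO_SERVERS_KIB <= limit


def label(ok):
    return "SUCCESS" if ok else "FAILURE"


# test name -> (pinned, hw:mem_page_size)
TESTS = {"A": (False, None), "B": (False, "small"),
         "C": (True, None), "D": (True, "small")}

outcomes = {}
for name, (pinned, page_size) in TESTS.items():
    expected = second_server_fits(pinned, page_size, pagesize_aware=True)
    actual = second_server_fits(pinned, page_size, pagesize_aware=False)
    outcomes[name] = (label(expected), label(actual))
    print(name, "expect", outcomes[name][0], "actual", outcomes[name][1])
```

Only Test C differs between the two columns, which is the case a page-size-aware check would fix.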
** Affects: nova
Importance: Undecided
Assignee: Stephen Finucane (stephenfinucane)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1811886