yahoo-eng-team mailing list archive - Message #77058
[Bug 1816454] [NEW] hw:mem_page_size is not respecting all documented values
Public bug reported:
Per the Rocky documentation for hugepages:
https://docs.openstack.org/nova/rocky/admin/huge-pages.html
2MB hugepages can be specified either as:
--property hw:mem_page_size=2Mb, or
--property hw:mem_page_size=2048
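For reference, the flavors were set up roughly as follows; the flavor names and sizing are just examples from my environment, not anything prescribed by the documentation:
$ # two flavors identical except for the page size notation
$ openstack flavor create --vcpus 2 --ram 2048 --disk 10 hp-2mb
$ openstack flavor set hp-2mb --property hw:numa_nodes='1' --property hw:mem_page_size=2Mb
$ openstack flavor create --vcpus 2 --ram 2048 --disk 10 hp-2048
$ openstack flavor set hp-2048 --property hw:numa_nodes='1' --property hw:mem_page_size=2048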
However, whenever I use the former notation (2Mb), the conductor fails with the misleading NUMA error shown further below, whereas with the latter notation (2048) allocation succeeds and the resulting instance is backed by 2MB hugepages on an x86_64 platform (verified by running `grep HugePages_Free /proc/meminfo` before and after stopping the created instance).
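For completeness, that before/after check is nothing more than reading the standard kernel hugepage counters on the compute node (the sysfs path below assumes the default 2048 kB hugepage pool):
$ grep HugePages_Free /proc/meminfo
$ # equivalently, the per-pool counter for 2M pages:
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages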
ERROR nova.scheduler.utils [req-de6920d5-829b-411c-acd7-1343f48824c9 cb2abbb91da54209a5ad93a845b4cc26 cb226ff7932d40b0a48ec129e162a2fb - default default] [instance: 5b53d1d4-6a16-4db9-ab52-b267551c6528] Error from last host: node1 (node FQDN-REDACTED):
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2106, in _build_and_run_instance
    with rt.instance_claim(context, instance, node, limits):
  File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 217, in instance_claim
    pci_requests, overhead=overhead, limits=limits)
  File "/usr/lib/python3/dist-packages/nova/compute/claims.py", line 95, in __init__
    self._claim_test(resources, limits)
  File "/usr/lib/python3/dist-packages/nova/compute/claims.py", line 162, in _claim_test
    "; ".join(reasons))
nova.exception.ComputeResourcesUnavailable: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1940, in _do_build_and_run_instance
    filter_properties, request_spec)
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2156, in _build_and_run_instance
    instance_uuid=instance.uuid, reason=e.format_message())
nova.exception.RescheduledException: Build of instance 5b53d1d4-6a16-4db9-ab52-b267551c6528 was re-scheduled: Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.
Additional info:
I am using Debian testing (buster) and all OpenStack packages included therein.
$ dpkg -l | grep nova
ii nova-common 2:18.1.0-2 all OpenStack Compute - common files
ii nova-compute 2:18.1.0-2 all OpenStack Compute - compute node
ii nova-compute-kvm 2:18.1.0-2 all OpenStack Compute - compute node (KVM)
ii python3-nova 2:18.1.0-2 all OpenStack Compute - libraries
ii python3-novaclient 2:11.0.0-2 all client library for OpenStack Compute API - 3.x
$ dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-20161027.b991c67-1 all PXE boot firmware - ROM images for qemu
ii qemu-block-extra:amd64 1:3.1+dfsg-2+b1 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm 1:3.1+dfsg-2+b1 amd64 QEMU Full virtualization on x86 hardware
ii qemu-system-common 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:3.1+dfsg-2 all QEMU full system emulation (data files)
ii qemu-system-gui 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (user interface and audio support)
ii qemu-system-x86 1:3.1+dfsg-2+b1 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 1:3.1+dfsg-2+b1 amd64 QEMU utilities
* I forced nova to allocate on the same hypervisor (node1) while checking for the issue (the exact commands are shown after this list). A flavor that requests hugepages with hw:mem_page_size=2048 can be allocated repeatedly; an otherwise identical flavor that differs only in using 2Mb instead of 2048 fails repeatedly.
* I am using libvirt+kvm. I don't think it matters, but I am using Ceph
as a storage backend and neutron in a very basic VLAN-based segmentation
configuration (no OVS or anything remotely fancy).
* I specified hw:numa_nodes='1' when creating the flavor, and all my hypervisors have only one NUMA node, so allocation should always succeed as long as there are free huge pages (which there are).
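The commands behind the first and last bullets look roughly like this; the image and network names are placeholders from my environment, and the zone:host form of --availability-zone assumes the default 'nova' zone:
$ # force the build onto node1 (admin-only zone:host syntax)
$ openstack server create --flavor hp-2mb --image debian-10 --network vlan-net \
    --availability-zone nova:node1 hp-test
$ # confirm the hypervisor exposes a single NUMA node
$ lscpu | grep -i 'numa node'
$ numactl --hardware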
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1816454