yahoo-eng-team mailing list archive, message #69947
[Bug 1738501] [NEW] Nova scheduler randomly fails to schedule CPU-pinned instance flavors with hugepages - failures increase as running instance count grows
Public bug reported:
Description
===========
Isolated to a single hypervisor.
The Nova scheduler randomly fails to schedule instances whose flavor
combines CPU pinning with hugepages. Failures become more frequent as
the number of running instances grows.
Steps to reproduce
==================
1) Hypervisor with two NUMA nodes, 2x Intel Gold 6126, 256GB RAM (128GB
in each NUMA node), 61440 x 2M hugepages in each node. The hypervisor
runs nothing other than OpenStack
2) Flavor specified with:
- 4 vCPUs
- 20480 MB RAM
- hw:cpu_policy dedicated
- hw:cpu_thread_policy require
- hw:mem_page_size 2MB
3) Try to schedule 12 instances of the mentioned flavor
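For anyone reproducing this, the flavor in step 2 could be created along
the following lines. This is only a sketch: the flavor name is a
hypothetical placeholder, no disk size is passed because the report does
not give one, and the script merely builds and prints an
`openstack flavor create` command line rather than running it.

```python
import shlex

# Values from step 2 of the report; FLAVOR_NAME is a hypothetical placeholder.
FLAVOR_NAME = "pinned-hugepage-4c-20g"
VCPUS = 4
RAM_MB = 20480
EXTRA_SPECS = {
    "hw:cpu_policy": "dedicated",
    "hw:cpu_thread_policy": "require",
    "hw:mem_page_size": "2MB",
}


def flavor_create_command(name, vcpus, ram_mb, extra_specs):
    """Return the argument list for an `openstack flavor create` call."""
    cmd = ["openstack", "flavor", "create",
           "--vcpus", str(vcpus), "--ram", str(ram_mb)]
    # One --property per extra spec, in a stable order.
    for key, value in sorted(extra_specs.items()):
        cmd += ["--property", f"{key}={value}"]
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(flavor_create_command(FLAVOR_NAME, VCPUS, RAM_MB, EXTRA_SPECS)))
```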
Expected result
===============
12 instances running on the hypervisor, neatly packed and using up all
hugepages.
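The figure of 12 follows from the hugepage numbers in the steps above.
A small sketch of the arithmetic, considering memory only (CPU capacity
is not modelled here):

```python
# Figures taken from the report.
PAGE_SIZE_MB = 2          # 2M hugepages
PAGES_PER_NODE = 61440    # hugepages configured per NUMA node
NUMA_NODES = 2
FLAVOR_RAM_MB = 20480


def instances_per_node(pages_per_node, page_size_mb, flavor_ram_mb):
    """How many instances fit in one NUMA node's hugepage pool."""
    pages_per_instance = flavor_ram_mb // page_size_mb
    return pages_per_node // pages_per_instance


per_node = instances_per_node(PAGES_PER_NODE, PAGE_SIZE_MB, FLAVOR_RAM_MB)
total = per_node * NUMA_NODES
leftover_pages = PAGES_PER_NODE - per_node * (FLAVOR_RAM_MB // PAGE_SIZE_MB)

if __name__ == "__main__":
    print(f"{per_node} instances per node, {total} in total, "
          f"{leftover_pages} hugepages left over per node")
```

Each instance needs 10240 pages, so 6 instances fit per node with no
pages left over, which gives 12 across both nodes.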
Actual result
=============
NUMA node 0 is full, while NUMA node 1 holds only about 2-3 instances.
The exact count varies from attempt to attempt.
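To confirm how unevenly the nodes are filled, the per-node hugepage
counters can be read from sysfs on the hypervisor. A minimal sketch,
assuming the standard Linux per-node layout under
/sys/devices/system/node; the function name and return shape are
illustrative only:

```python
from pathlib import Path

SYSFS_NODE_ROOT = Path("/sys/devices/system/node")


def hugepage_usage(root=SYSFS_NODE_ROOT, size_kb=2048):
    """Return {node_name: (total_pages, free_pages)} for one hugepage size."""
    usage = {}
    for node_dir in sorted(root.glob("node[0-9]*")):
        hp_dir = node_dir / "hugepages" / f"hugepages-{size_kb}kB"
        if not hp_dir.is_dir():
            continue  # this node has no pool of the requested size
        total = int((hp_dir / "nr_hugepages").read_text())
        free = int((hp_dir / "free_hugepages").read_text())
        usage[node_dir.name] = (total, free)
    return usage


if __name__ == "__main__":
    for node, (total, free) in hugepage_usage().items():
        print(f"{node}: {total - free} of {total} 2M hugepages in use")
```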
Workaround
==========
Leave all running instances as they are and keep scheduling more
instances until the desired number has been created successfully. (It
took me 32 create attempts to fill all 12 slots.)
The problem does not occur when hugepages are disabled in both the
flavor and the hypervisor.
Environment
===========
Running OpenStack Ocata, RDO packages on CentOS 7.4.
Linux 3.10.0-514.10.2.el7.x86_64
nova 15.0.7
Compute:
openstack-nova-compute-15.0.7-1.el7.noarch
Ctrl:
openstack-nova-conductor-15.0.7-1.el7.noarch
python2-novaclient-7.1.2-1.el7.noarch
python-nova-15.0.7-1.el7.noarch
openstack-nova-novncproxy-15.0.7-1.el7.noarch
openstack-nova-placement-api-15.0.7-1.el7.noarch
openstack-nova-common-15.0.7-1.el7.noarch
openstack-nova-api-15.0.7-1.el7.noarch
openstack-nova-scheduler-15.0.7-1.el7.noarch
openstack-nova-console-15.0.7-1.el7.noarch
Using Libvirt+KVM
libvirt 3.2.0-14.el7_4 (ev)
qemu 2.9.0-16.el7_4 (ev)
Storage is pure qcow2 on /var/lib/nova
Neutron with linuxbridge-agent for networking.
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1738501