yahoo-eng-team team mailing list archive

[Bug 1738501] [NEW] Nova scheduler randomly fails to schedule CPU-pinned instance-flavors with hugepages - fails increases as running instance count grows

 

Public bug reported:

Description
===========
The problem was isolated to a single hypervisor.
The Nova scheduler randomly fails to schedule instances of CPU-pinned flavors with hugepages; the failure rate increases as the number of running instances grows.

Steps to reproduce
==================

1) Hypervisor with two NUMA nodes, 2x Intel Gold 6126, 256GB RAM (128GB
in each NUMA node), 61440x2M hugepages in each node.  The hypervisor is
running nothing other than OpenStack.

2) Flavor specified with:
 - 4 vCPUs
 - 20480 MB RAM
 - hw:cpu_policy dedicated
 - hw:cpu_thread_policy require
 - hw:mem_page_size 2MB

3) Try to schedule 12 instances of the mentioned flavor
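
For reference, the flavor from step 2 could be created with the standard
`openstack flavor create` and `openstack flavor set --property` commands.
The sketch below only builds and prints those command lines. The flavor
name and the disk size are assumptions; the report does not state either.

```python
import shlex

FLAVOR_NAME = "pinned.hugepages"  # hypothetical name, not from the report
DISK_GB = 20                      # assumption: disk size is not given in the report

# Extra specs exactly as listed in step 2.
EXTRA_SPECS = {
    "hw:cpu_policy": "dedicated",
    "hw:cpu_thread_policy": "require",
    "hw:mem_page_size": "2MB",
}


def flavor_commands(name, vcpus, ram_mb, disk_gb, specs):
    """Return the two CLI invocations (as argument lists) that define the flavor."""
    create = [
        "openstack", "flavor", "create",
        "--vcpus", str(vcpus),
        "--ram", str(ram_mb),
        "--disk", str(disk_gb),
        name,
    ]
    set_props = ["openstack", "flavor", "set"]
    for key, value in specs.items():
        set_props += ["--property", f"{key}={value}"]
    set_props.append(name)
    return [create, set_props]


for cmd in flavor_commands(FLAVOR_NAME, 4, 20480, DISK_GB, EXTRA_SPECS):
    # Print shell-safe command lines instead of executing them.
    print(" ".join(shlex.quote(part) for part in cmd))
```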


Expected result
===============

12 instances running on hypervisor, neatly packed using up all
hugepages.
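
The expected count of 12 follows from the hugepage figures above. A
small sketch of the arithmetic, assuming each instance is placed
entirely within one NUMA node (the flavor does not request a multi-node
guest topology):

```python
# Figures taken from the report.
PAGE_SIZE_MB = 2         # hw:mem_page_size 2MB
PAGES_PER_NODE = 61440   # 2M hugepages configured in each NUMA node
NUMA_NODES = 2
FLAVOR_RAM_MB = 20480    # flavor RAM

# Hugepages one instance consumes.
pages_per_instance = FLAVOR_RAM_MB // PAGE_SIZE_MB

# Assumption: an instance never spans NUMA nodes, so capacity is per node.
instances_per_node = PAGES_PER_NODE // pages_per_instance
total_instances = instances_per_node * NUMA_NODES
leftover_pages_per_node = PAGES_PER_NODE - instances_per_node * pages_per_instance

print("pages per instance:", pages_per_instance)
print("instances per node:", instances_per_node)
print("total instances:", total_instances)
print("leftover pages per node:", leftover_pages_per_node)
```

This gives 10240 pages per instance, 6 instances per node, 12 instances
in total, and no hugepages left over on either node.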


Actual result
=============

NUMA node 0 is full, NUMA node 1 has 2-3 instances or so.  This varies
from attempt to attempt.
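
One way to observe the imbalance on the hypervisor is to read the
per-node hugepage counters that Linux exposes under
/sys/devices/system/node. The helper below is a diagnostic sketch and
not part of the original report; the function name and its parameters
are hypothetical.

```python
from pathlib import Path


def hugepage_usage(sysfs_root="/sys/devices/system/node", page_kb=2048):
    """Return {node_name: {"total", "free", "used"}} for one hugepage size."""
    usage = {}
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        hp_dir = node_dir / "hugepages" / f"hugepages-{page_kb}kB"
        if not hp_dir.is_dir():
            continue  # this page size is not available on the node
        total = int((hp_dir / "nr_hugepages").read_text())
        free = int((hp_dir / "free_hugepages").read_text())
        usage[node_dir.name] = {"total": total, "free": free, "used": total - free}
    return usage


if __name__ == "__main__":
    for node, counts in hugepage_usage().items():
        print(node, counts)
```

With the flavor above, each running instance accounts for 10240 used
pages (20480 MB / 2 MB), so the instance count per node can be read
directly from the "used" value.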


Workaround
==========

Leave all running instances as they are and keep scheduling more
instances until the desired number of instances has been successfully
created. (It took 32 create attempts to fill all 12 slots for me.)

The problem does not occur if hugepages are disabled in both the flavor
and the hypervisor.


Environment
===========
Running OpenStack Ocata, RDO packages on CentOS 7.4.
Linux 3.10.0-514.10.2.el7.x86_64
nova 15.0.7

Compute:
openstack-nova-compute-15.0.7-1.el7.noarch

Ctrl:
openstack-nova-conductor-15.0.7-1.el7.noarch
python2-novaclient-7.1.2-1.el7.noarch
python-nova-15.0.7-1.el7.noarch
openstack-nova-novncproxy-15.0.7-1.el7.noarch
openstack-nova-placement-api-15.0.7-1.el7.noarch
openstack-nova-common-15.0.7-1.el7.noarch
openstack-nova-api-15.0.7-1.el7.noarch
openstack-nova-scheduler-15.0.7-1.el7.noarch
openstack-nova-console-15.0.7-1.el7.noarch


Using Libvirt+KVM

libvirt 3.2.0-14.el7_4 (ev)
qemu 2.9.0-16.el7_4 (ev)


Storage is pure qcow2 on /var/lib/nova

Neutron with linuxbridge-agent for networking.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1738501
