yahoo-eng-team team mailing list archive

[Bug 1978372] [NEW] numa_fit_instance_to_host() algorithm is highly ineffective on higher number of NUMA nodes

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Igor Raits <1978372@xxxxxxxxxxxxxxxxxx>
Date: Sat, 11 Jun 2022 07:20:14 -0000
Reply-to: Bug 1978372 <1978372@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx
Public bug reported:

Description
===========
In the Nova scheduler, numa_fit_instance_to_host() takes ~5 minutes when it is executed for an instance with 8 NUMA nodes against a host object whose NUMA topology includes 16 NUMA nodes (3 cores × 2 threads each) and the first half of the host's NUMA nodes is already occupied.

This makes scheduling a 48-core flavor extremely slow.
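To get a feel for the size of the search, here is a minimal sketch. It assumes the fitting code tries ordered selections of host cells in the order `itertools.permutations` yields them; the report itself does not state this, so treat it as an assumption. The helper names are hypothetical and are not part of Nova. The brute-force function is run only on a scaled-down case, and a closed-form count is used for the sizes from the report.

```python
import itertools
import math


def visited_before_first_fit(host_cells, instance_cells, occupied):
    """Brute force: count ordered selections tried before one avoids `occupied`."""
    candidates = itertools.permutations(range(host_cells), instance_cells)
    for visited, perm in enumerate(candidates):
        if occupied.isdisjoint(perm):
            return visited
    return None  # no selection fits at all


def visited_closed_form(host_cells, instance_cells, n_occupied):
    """Same count without enumerating, when cells 0..n_occupied-1 are occupied.

    Requires instance_cells <= host_cells - n_occupied. At position i every
    occupied cell sorts before every remaining free cell, and each such choice
    is followed by P(host_cells - i - 1, instance_cells - i - 1) completions.
    """
    return sum(
        n_occupied * math.perm(host_cells - i - 1, instance_cells - i - 1)
        for i in range(instance_cells)
    )


# Scaled-down case (8 host cells, 4 instance cells) to cross-check the formula.
small_brute = visited_before_first_fit(8, 4, set(range(4)))
small_formula = visited_closed_form(8, 4, 4)

# Sizes from the report: 16 host cells, 8 instance cells, first 8 occupied.
total = math.perm(16, 8)
wasted = visited_closed_form(16, 8, 8)

print(small_brute, small_formula)
print(f"{wasted:,} of {total:,} ordered selections precede the first fit")
```

Under that assumption, roughly 278 million of about 519 million ordered selections would be rejected before the first one that uses only free cells, which is consistent in spirit with a runtime measured in minutes.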

Output of reproducer:
```
InstanceNUMATopology(cells=[InstanceNUMACell(8),InstanceNUMACell(9),InstanceNUMACell(10),InstanceNUMACell(11),InstanceNUMACell(12),InstanceNUMACell(13),InstanceNUMACell(14)],emulator_threads_policy=None,id=<?>,instance_uuid=<?>)

________________________________________________________
Executed in  269.13 secs    fish           external
   usr time  268.60 secs    0.00 micros  268.60 secs
   sys time    0.07 secs  595.00 micros    0.07 secs
```

Steps to reproduce
==================
1. Add a host with 16 NUMA nodes (3 cores × 2 threads each) to the OpenStack deployment.
2. Create a flavor with 48 CPUs that takes exactly half of the host:
openstack flavor create sh4a-c48r488e20 \
--ram $((488*1024)) \
--vcpus 48 \
--ephemeral 20 \
--disk 20 \
--swap 0 \
--property 'hw:mem_page_size=1GB' \
--property 'hw:cpu_policy=dedicated' \
--property 'hw:cpu_thread_policy=prefer' \
--property 'hw:cpu_max_sockets=8' \
--property 'hw:cpu_sockets=8' \
--property 'hw:numa_mempolicy=strict' \
--property 'hw:numa_nodes=8' \
--property 'hw:numa_cpus.0=0,1,2,3,4,5' \
--property 'hw:numa_cpus.1=6,7,8,9,10,11' \
--property 'hw:numa_cpus.2=12,13,14,15,16,17' \
--property 'hw:numa_cpus.3=18,19,20,21,22,23' \
--property 'hw:numa_cpus.4=24,25,26,27,28,29' \
--property 'hw:numa_cpus.5=30,31,32,33,34,35' \
--property 'hw:numa_cpus.6=36,37,38,39,40,41' \
--property 'hw:numa_cpus.7=42,43,44,45,46,47' \
--property 'hw:numa_mem.0=62464' \
--property 'hw:numa_mem.1=62464' \
--property 'hw:numa_mem.2=62464' \
--property 'hw:numa_mem.3=62464' \
--property 'hw:numa_mem.4=62464' \
--property 'hw:numa_mem.5=62464' \
--property 'hw:numa_mem.6=62464' \
--property 'hw:numa_mem.7=62464' \
--property 'hw:cpu_threads=2' \
--property 'hw:cpu_max_threads=2'
3. Create an instance with this flavor so that it lands on that host. The command is omitted because it differs between installations.
4. Wait for the first instance to spawn (this part is fast because it takes the first 8 NUMA nodes).
5. Create a second instance with the same flavor.

…

Wait 5+ minutes until nova-scheduler is done with its work.
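The per-node properties in step 2 follow a regular pattern, so they can be generated rather than typed by hand. The sketch below only rebuilds the `hw:numa_*` values shown in the command and checks that the per-node memory adds up to the `--ram` value; the function name `numa_properties` is hypothetical.

```python
NUMA_NODES = 8
CPUS_PER_NODE = 6          # 3 cores x 2 threads, as in the report
MEM_PER_NODE_MB = 62464    # hw:numa_mem.N from the flavor
RAM_MB = 488 * 1024        # --ram from the flavor


def numa_properties(nodes, cpus_per_node, mem_per_node_mb):
    """Build the hw:numa_* flavor properties for evenly sized guest NUMA nodes."""
    props = {"hw:numa_nodes": str(nodes)}
    for node in range(nodes):
        first = node * cpus_per_node
        cpus = range(first, first + cpus_per_node)
        props[f"hw:numa_cpus.{node}"] = ",".join(map(str, cpus))
        props[f"hw:numa_mem.{node}"] = str(mem_per_node_mb)
    return props


props = numa_properties(NUMA_NODES, CPUS_PER_NODE, MEM_PER_NODE_MB)

# The per-node memory must add up to the flavor's total RAM.
assert NUMA_NODES * MEM_PER_NODE_MB == RAM_MB

# Print the properties in the same form as the command in step 2.
for key, value in props.items():
    print(f"--property '{key}={value}' \\")
```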

Expected result
===============
NUMA nodes are selected within 10-15 seconds.

Actual result
=============
The algorithm is so slow that it takes about 5 minutes to schedule the instance.
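As a rough illustration of why the gap between expected and actual results could be so large, the toy model below compares a search over all host cells with one that first drops cells that cannot host anything. This is a simplified sketch, not Nova's code and not necessarily how the problem should be fixed: `first_fit` and its boolean `host_free` list are hypothetical, and real fitting also has to check CPUs, memory, and page sizes per cell.

```python
import itertools


def first_fit(host_free, instance_cells, prefilter):
    """Return (placement, candidates_visited) for a toy NUMA fit.

    host_free[i] is True when host cell i can take one instance cell.
    With prefilter=True, unusable host cells are dropped before permuting.
    """
    cells = range(len(host_free))
    if prefilter:
        cells = [c for c in cells if host_free[c]]
    visited = 0
    for perm in itertools.permutations(cells, instance_cells):
        visited += 1
        if all(host_free[c] for c in perm):
            return perm, visited
    return None, visited


# Scaled-down version of the scenario: 8 host cells, the first 4 occupied,
# and an instance that needs 4 cells.
host_free = [False] * 4 + [True] * 4

naive_placement, naive_visited = first_fit(host_free, 4, prefilter=False)
pruned_placement, pruned_visited = first_fit(host_free, 4, prefilter=True)

print(naive_placement, naive_visited)
print(pruned_placement, pruned_visited)
```

In this toy case both searches return the same placement, but the unfiltered one visits 985 candidates while the filtered one visits a single candidate.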

Environment
===========
1. OpenStack Nova 23.2.0-1.el8. NOTE: I am able to reproduce this on the master branch with a 20-line reproducer.
commit 4939318649650b60dd07d161b80909e70d0e093e (HEAD -> master, upstream/master)
Merge: c6e0f4f551 4c339c10e3
Author: Zuul <zuul@xxxxxxxxxxxxxxxxxx>
Date:   Tue May 17 00:01:41 2022 +0000

    Merge "Drop lower-constraints.txt and its testing"

2. Libvirt + KVM (although it is not relevant here)
libvirt-8.0.0-6.module_el8.7.0+1140+ff0772f9.x86_64
qemu-kvm-6.2.0-12.module_el8.7.0+1140+ff0772f9.x86_64

3. LVM storage (not relevant either)
lvm2-2.03.14-3.el8.x86_64

4. Neutron with L2 (not relevant)

Logs & Configs
==============
See the reproducer and try it with the DEBUG lines uncommented (I will attach it here too).

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "reproducer-simplified.py"
   https://bugs.launchpad.net/bugs/1978372/+attachment/5596697/+files/t.py

https://bugs.launchpad.net/bugs/1978372

