yahoo-eng-team team mailing list archive
Message #89084
[Bug 1978372] [NEW] numa_fit_instance_to_host() algorithm is highly ineffective on higher number of NUMA nodes
Public bug reported:
Description
===========
When the Nova scheduler runs numa_fit_instance_to_host() for an instance with 8 NUMA nodes against a host whose NUMA topology has 16 NUMA nodes (3 cores × 2 threads each), the call takes about 5 minutes if the first half of the host's NUMA nodes is already occupied.
This makes scheduling a 48-core flavor extremely slow.
Output of reproducer:
```
InstanceNUMATopology(cells=[InstanceNUMACell(8),InstanceNUMACell(9),InstanceNUMACell(10),InstanceNUMACell(11),InstanceNUMACell(12),InstanceNUMACell(13),InstanceNUMACell(14)],emulator_threads_policy=None,id=<?>,instance_uuid=<?>)
________________________________________________________
Executed in  269.13 secs    fish           external
   usr time  268.60 secs    0.00 micros  268.60 secs
   sys time    0.07 secs  595.00 micros    0.07 secs
```
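A run time of several minutes is what one would expect from a search that enumerates ordered selections of host cells. The sketch below is a back-of-the-envelope model only. It assumes (this is not stated in the report) that the fitting loop walks `itertools.permutations()` over the host cells in order and rejects every candidate that touches an occupied cell. The function names are hypothetical and are not part of Nova.

```python
import itertools
import math


def count_rejected_bruteforce(num_host_cells, num_occupied, num_instance_cells):
    """Count permutations visited before the first one that uses only free cells.

    Assumption: host cells 0..num_occupied-1 are full, the rest are free,
    and candidates are visited in itertools.permutations() order.
    """
    free = set(range(num_occupied, num_host_cells))
    rejected = 0
    for perm in itertools.permutations(range(num_host_cells), num_instance_cells):
        if all(cell in free for cell in perm):
            return rejected
        rejected += 1
    return rejected


def count_rejected_closed_form(num_host_cells, num_occupied, num_instance_cells):
    """Same count without enumerating anything.

    Only valid when there are at least num_instance_cells free host cells.
    A rejected candidate shares a prefix of length i with the first fitting
    permutation, then has an occupied cell at position i, then anything.
    """
    total = 0
    for i in range(num_instance_cells):
        remaining_cells = num_host_cells - i - 1
        remaining_slots = num_instance_cells - i - 1
        total += num_occupied * math.perm(remaining_cells, remaining_slots)
    return total


# Small case: cheap enough to enumerate, used to check the closed form.
print(count_rejected_bruteforce(6, 3, 3), count_rejected_closed_form(6, 3, 3))

# Size from the report: 16 host cells, first 8 occupied, 8 instance cells.
print("ordered selections in total:", math.perm(16, 8))
print("rejected before first fit:  ", count_rejected_closed_form(16, 8, 8))
```

Under this model, hundreds of millions of candidates are discarded before the first one that lands entirely on the free half of the host, which is the right order of magnitude for minutes of pure-Python looping.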
Steps to reproduce
==================
1. Add a host with 16 NUMA nodes (3 cores × 2 threads each) to the OpenStack deployment.
2. Create a flavor with 48 vCPUs that takes exactly half of the host:
    openstack flavor create sh4a-c48r488e20 \
      --ram $((488*1024)) \
      --vcpus 48 \
      --ephemeral 20 \
      --disk 20 \
      --swap 0 \
      --property 'hw:mem_page_size=1GB' \
      --property 'hw:cpu_policy=dedicated' \
      --property 'hw:cpu_thread_policy=prefer' \
      --property 'hw:cpu_max_sockets=8' \
      --property 'hw:cpu_sockets=8' \
      --property 'hw:numa_mempolicy=strict' \
      --property 'hw:numa_nodes=8' \
      --property 'hw:numa_cpus.0=0,1,2,3,4,5' \
      --property 'hw:numa_cpus.1=6,7,8,9,10,11' \
      --property 'hw:numa_cpus.2=12,13,14,15,16,17' \
      --property 'hw:numa_cpus.3=18,19,20,21,22,23' \
      --property 'hw:numa_cpus.4=24,25,26,27,28,29' \
      --property 'hw:numa_cpus.5=30,31,32,33,34,35' \
      --property 'hw:numa_cpus.6=36,37,38,39,40,41' \
      --property 'hw:numa_cpus.7=42,43,44,45,46,47' \
      --property 'hw:numa_mem.0=62464' \
      --property 'hw:numa_mem.1=62464' \
      --property 'hw:numa_mem.2=62464' \
      --property 'hw:numa_mem.3=62464' \
      --property 'hw:numa_mem.4=62464' \
      --property 'hw:numa_mem.5=62464' \
      --property 'hw:numa_mem.6=62464' \
      --property 'hw:numa_mem.7=62464' \
      --property 'hw:cpu_threads=2' \
      --property 'hw:cpu_max_threads=2'
3. Create an instance with this flavor so that it lands on that host. The command is omitted because it differs between installations.
4. Wait for the first instance to spawn (this part is fast, as it takes the first 8 NUMA nodes).
5. Create a second instance with the same flavor.
…
Wait 5+ minutes until nova-scheduler is done with its work.
Expected result
===============
NUMA nodes are selected within 10-15 seconds.
Actual result
=============
The algorithm is so slow that it takes 5 minutes to schedule the instance.
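To illustrate why the gap between the expected and actual results can be so large, here is a toy model, not Nova code and not a description of any fix. It assumes a permutation-based search in which each host cell is reduced to a single "free CPUs" number, and it compares that search with and without first discarding host cells that cannot hold any instance cell. `first_fit` and its parameters are hypothetical names.

```python
import itertools


def first_fit(host_free_cpus, instance_cell_cpus, prefilter):
    """Return (assignment, permutations_tried) for a toy cell-fitting search.

    host_free_cpus[i] is the number of free CPUs on host cell i.
    instance_cell_cpus[j] is the number of CPUs instance cell j needs.
    With prefilter=True, host cells too small for even the smallest
    instance cell are dropped before any permutation is generated.
    """
    host_ids = list(range(len(host_free_cpus)))
    if prefilter:
        smallest = min(instance_cell_cpus)
        host_ids = [h for h in host_ids if host_free_cpus[h] >= smallest]

    tried = 0
    for perm in itertools.permutations(host_ids, len(instance_cell_cpus)):
        tried += 1
        if all(host_free_cpus[h] >= need
               for h, need in zip(perm, instance_cell_cpus)):
            return perm, tried
    return None, tried


# Scaled-down version of the scenario: 6 host cells, first half full,
# instance with 3 cells of 6 CPUs each.
hosts = [0, 0, 0, 6, 6, 6]
instance = [6, 6, 6]
print(first_fit(hosts, instance, prefilter=False))
print(first_fit(hosts, instance, prefilter=True))
```

In this model the pre-filtered search succeeds on its first candidate, while the unfiltered one has to work through every ordering that involves a full cell first. Whether a comparable shortcut applies to numa_fit_instance_to_host() depends on details of the real implementation that this report does not show.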
Environment
===========
1. OpenStack Nova 23.2.0-1.el8. NOTE: I can also reproduce this on the master branch with a 20-line reproducer.
commit 4939318649650b60dd07d161b80909e70d0e093e (HEAD -> master, upstream/master)
Merge: c6e0f4f551 4c339c10e3
Author: Zuul <zuul@xxxxxxxxxxxxxxxxxx>
Date: Tue May 17 00:01:41 2022 +0000
Merge "Drop lower-constraints.txt and its testing"
2. Libvirt + KVM (although it is not relevant here)
libvirt-8.0.0-6.module_el8.7.0+1140+ff0772f9.x86_64
qemu-kvm-6.2.0-12.module_el8.7.0+1140+ff0772f9.x86_64
3. LVM storage (not relevant either)
lvm2-2.03.14-3.el8.x86_64
4. Neutron with L2 (not relevant)
Logs & Configs
==============
Run the reproducer, and try it with the DEBUG lines uncommented (I will attach it here too).
** Affects: nova
Importance: Undecided
Status: New
** Attachment added: "reproducer-simplified.py"
https://bugs.launchpad.net/bugs/1978372/+attachment/5596697/+files/t.py
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1978372
Title:
numa_fit_instance_to_host() algorithm is highly ineffective on higher
number of NUMA nodes
Status in OpenStack Compute (nova):
New