yahoo-eng-team team mailing list archive
Message #89487
[Bug 1978372] Re: numa_fit_instance_to_host() algorithm is highly ineffective on higher number of NUMA nodes
Reviewed: https://review.opendev.org/c/openstack/nova/+/845896
Committed: https://opendev.org/openstack/nova/commit/099a6f63af7805440d91976ba0ea03bc6c278280
Submitter: "Zuul (22348)"
Branch: master
commit 099a6f63af7805440d91976ba0ea03bc6c278280
Author: Balazs Gibizer <gibi@xxxxxxxxxx>
Date: Wed Jun 15 09:28:27 2022 +0200
Optimize numa_fit_instance_to_host
The numa_fit_instance_to_host algorithm tries all the possible
host cell permutations to fit the instance cells. So in the worst case
it makes n! / (n-k)! _numa_fit_instance_cell calls
(n=len(host_cells), k=len(instance_cells)) to find out whether the
instance can be fit to the host. With a 16 NUMA node host and an 8 NUMA
node guest this means about 500 million calls to
_numa_fit_instance_cell, which takes excessive time.
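The worst-case figure quoted above can be checked with a few lines of standard-library Python. This is only a sanity check of the arithmetic; the variable names are illustrative and are not taken from nova.

```python
import math

n_host_cells = 16      # n = len(host_cells)
n_instance_cells = 8   # k = len(instance_cells)

# Number of ordered selections of k host cells out of n: n! / (n - k)!
worst_case_permutations = math.perm(n_host_cells, n_instance_cells)
print(worst_case_permutations)  # 518918400
```

The result is a little over 500 million, which matches the rounded number in the commit message.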
However, going through these permutations means trying to fit many
of the same host_cell, instance_cell pairs repeatedly.
E.g.
host_cells=[H1, H2, H3]
instance_cells=[G1, G2]
Produces pairings:
* H1 <- G1 and H2 <- G2
* H1 <- G1 and H3 <- G2
...
Here G1 is checked to fit H1 twice. But if it does not fit the first
time then we know that it will not fit the second time either. So we
can cache the result of the first check and use that cache for the later
permutations.
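To make the repetition concrete, the following sketch enumerates the pairings for the three-host, two-guest example and counts how often each pair shows up. It assumes every pair of every permutation is examined (no early exit) and uses plain strings in place of nova's cell objects.

```python
import itertools
from collections import Counter

host_cells = ["H1", "H2", "H3"]
instance_cells = ["G1", "G2"]

# Count how many permutations contain each (host_cell, instance_cell) pair.
pair_counts = Counter()
for host_perm in itertools.permutations(host_cells, len(instance_cells)):
    for pair in zip(host_perm, instance_cells):
        pair_counts[pair] += 1

total_checks = sum(pair_counts.values())
distinct_pairs = len(pair_counts)
print(total_checks, distinct_pairs)  # 12 6
print(pair_counts[("H1", "G1")])     # 2
```

Half of the checks in this tiny case are repeats, and the share of repeats grows quickly with the number of cells.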
This patch adds two caches to the algo. A fit_cache to hold
host_cell.id, instance_cell.id pairs that we know fit, and a
no_fit_cache for those pairs that we already know do not fit.
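The idea of the two caches can be sketched as follows. This is a simplified illustration, not the code from the patch: `fit_with_caches` and `cell_fits` are hypothetical names, cells are represented by bare integer IDs, and the per-cell check is assumed to be a pure yes/no function of the pair, whereas the real _numa_fit_instance_cell does resource accounting on cell objects.

```python
import itertools


def fit_with_caches(host_cell_ids, instance_cell_ids, cell_fits):
    """Return the first host-cell permutation that fits all instance cells.

    cell_fits(host_id, instance_id) is a hypothetical stand-in for the
    per-cell fit check. Returns None if no permutation fits.
    """
    fit_cache = set()      # (host_id, instance_id) pairs known to fit
    no_fit_cache = set()   # (host_id, instance_id) pairs known not to fit

    for host_perm in itertools.permutations(
            host_cell_ids, len(instance_cell_ids)):
        for pair in zip(host_perm, instance_cell_ids):
            if pair in no_fit_cache:
                break          # known failure, skip this permutation
            if pair in fit_cache:
                continue       # known success, no need to re-check
            if cell_fits(*pair):
                fit_cache.add(pair)
            else:
                no_fit_cache.add(pair)
                break
        else:
            # The inner loop did not break: every pair fits.
            return host_perm
    return None


# Toy usage: host cells 0 and 1 are "occupied", only 2 and 3 can take a guest cell.
calls = []


def toy_cell_fits(host_id, instance_id):
    calls.append((host_id, instance_id))
    return host_id >= 2


result = fit_with_caches([0, 1, 2, 3], [0, 1], toy_cell_fits)
print(result, len(calls))  # (2, 3) 6
```

In the toy run the expensive check is called 6 times and never twice for the same pair, even though 9 permutations are visited before a fit is found.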
This change significantly boosts the performance of the algorithm. The
reproduction provided in bug 1978372 took 6 minutes to run on my local
machine without the optimization. With the optimization it ran in
3 seconds.
This change increases the memory usage of the algorithm with the two
caches. Those caches are sets of integer two-tuples. And the total size
of the cache is at most the total number of possible host_cell,
instance_cell pairs, which is len(host_cells) * len(instance_cells). So
for the above example (16 host, 8 instance NUMA) it is at most 128 pairs
of integers in the cache. That will not cause a significant memory
increase.
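The bound on the cache size can be confirmed by building every possible pair for the 16 by 8 case. This sketch assumes cells are identified by small integers, as the commit message describes; the variable names are illustrative.

```python
import itertools

host_cell_ids = range(16)
instance_cell_ids = range(8)

# A pair is stored at most once, in one of the two caches, so the combined
# cache size cannot exceed the number of distinct pairs.
all_pairs = set(itertools.product(host_cell_ids, instance_cell_ids))
max_cache_entries = len(all_pairs)
print(max_cache_entries)  # 128
```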
Closes-Bug: #1978372
Change-Id: Ibcf27d741429a239d13f0404348c61e2668b4ce4
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1978372
Title:
numa_fit_instance_to_host() algorithm is highly ineffective on higher
number of NUMA nodes
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
When the Nova scheduler runs numa_fit_instance_to_host() for an instance with 8 NUMA nodes against a host whose NUMA topology has 16 NUMA nodes (3 cores × 2 threads each), it takes ~5 minutes if the first half of the host's NUMA nodes are occupied.
This makes scheduling a 48-core flavor extremely slow.
Output of reproducer:
```
InstanceNUMATopology(cells=[InstanceNUMACell(8),InstanceNUMACell(9),InstanceNUMACell(10),InstanceNUMACell(11),InstanceNUMACell(12),InstanceNUMACell(13),InstanceNUMACell(14)],emulator_threads_policy=None,id=<?>,instance_uuid=<?>)
________________________________________________________
Executed in  269.13 secs    fish           external
   usr time  268.60 secs    0.00 micros  268.60 secs
   sys time    0.07 secs  595.00 micros    0.07 secs
```
Steps to reproduce
==================
1. Add host with 16 NUMA nodes (3 cores × 2 threads each) to the OpenStack
2. Create a flavor for 48 CPUs that would take half of the host exactly
openstack flavor create sh4a-c48r488e20 \
--ram $((488*1024)) \
--vcpus 48 \
--ephemeral 20 \
--disk 20 \
--swap 0 \
--property 'hw:mem_page_size=1GB' \
--property 'hw:cpu_policy=dedicated' \
--property 'hw:cpu_thread_policy=prefer' \
--property 'hw:cpu_max_sockets=8' \
--property 'hw:cpu_sockets=8' \
--property 'hw:numa_mempolicy=strict' \
--property 'hw:numa_nodes=8' \
--property 'hw:numa_cpus.0=0,1,2,3,4,5' \
--property 'hw:numa_cpus.1=6,7,8,9,10,11' \
--property 'hw:numa_cpus.2=12,13,14,15,16,17' \
--property 'hw:numa_cpus.3=18,19,20,21,22,23' \
--property 'hw:numa_cpus.4=24,25,26,27,28,29' \
--property 'hw:numa_cpus.5=30,31,32,33,34,35' \
--property 'hw:numa_cpus.6=36,37,38,39,40,41' \
--property 'hw:numa_cpus.7=42,43,44,45,46,47' \
--property 'hw:numa_mem.0=62464' \
--property 'hw:numa_mem.1=62464' \
--property 'hw:numa_mem.2=62464' \
--property 'hw:numa_mem.3=62464' \
--property 'hw:numa_mem.4=62464' \
--property 'hw:numa_mem.5=62464' \
--property 'hw:numa_mem.6=62464' \
--property 'hw:numa_mem.7=62464' \
--property 'hw:cpu_threads=2' \
--property 'hw:cpu_max_threads=2'
3. Create an instance with this flavor so that it lands on that host (the command is omitted because it differs between installations).
4. Wait for the first instance to spawn (this part is fast, as it takes the first 8 NUMA nodes).
5. Create a second instance with the same flavor.
…
Wait 5+ minutes until nova-scheduler is done with its work.
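The per-node properties in step 2 follow a regular pattern: 48 vCPUs and 488 GiB of RAM split evenly across 8 guest NUMA nodes. The sketch below regenerates the `hw:numa_*` arguments from those three numbers. The helper name `numa_properties` is hypothetical and the script only prints text; it does not call the OpenStack client.

```python
def numa_properties(numa_nodes, vcpus, ram_mb):
    """Build hw:numa_* flavor properties for an even split across nodes."""
    cpus_per_node = vcpus // numa_nodes
    mem_per_node = ram_mb // numa_nodes

    props = [f"hw:numa_nodes={numa_nodes}"]
    for node in range(numa_nodes):
        first = node * cpus_per_node
        cpu_list = ",".join(
            str(cpu) for cpu in range(first, first + cpus_per_node))
        props.append(f"hw:numa_cpus.{node}={cpu_list}")
    for node in range(numa_nodes):
        props.append(f"hw:numa_mem.{node}={mem_per_node}")
    return props


props = numa_properties(numa_nodes=8, vcpus=48, ram_mb=488 * 1024)
for prop in props:
    print(f"--property '{prop}' \\")
```

With these inputs each guest NUMA node gets 6 vCPUs and 62464 MB, which matches the values in the flavor command.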
Expected result
===============
NUMA nodes selected within 10-15 seconds.
Actual result
=============
The algorithm is so slow that it takes 5 minutes to schedule the instance.
Environment
===========
1. OpenStack Nova 23.2.0-1.el8. NOTE: I am able to reproduce this on the master branch with a 20-line reproducer.
commit 4939318649650b60dd07d161b80909e70d0e093e (HEAD -> master, upstream/master)
Merge: c6e0f4f551 4c339c10e3
Author: Zuul <zuul@xxxxxxxxxxxxxxxxxx>
Date: Tue May 17 00:01:41 2022 +0000
Merge "Drop lower-constraints.txt and its testing"
2. Libvirt + KVM (although it is not relevant here)
libvirt-8.0.0-6.module_el8.7.0+1140+ff0772f9.x86_64
qemu-kvm-6.2.0-12.module_el8.7.0+1140+ff0772f9.x86_64
3. LVM storage (not relevant either)
lvm2-2.03.14-3.el8.x86_64
4. Neutron with L2 (not relevant)
Logs & Configs
==============
Check the reproducer and try it with the DEBUG lines uncommented (will attach it here too).