yahoo-eng-team team mailing list archive
Message #83903
[Bug 1887377] Re: nova does not load balance assignment of resources on a host based on availability of pci devices, hugepages or pcpus.
While I totally understand the use case, I think this is a new feature
for performance reasons and not a bug. Closing it as Wishlist, but of
course you can work on it if you wish ;)
** Changed in: nova
Importance: Undecided => Wishlist
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1887377
Title:
nova does not load balance assignment of resources on a host based on
availability of pci devices, hugepages or pcpus.
Status in OpenStack Compute (nova):
Invalid
Bug description:
Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a very long
time. Since their introduction the advice has always been to create flavors that mimic your
typical hardware topology, i.e. if all your compute hosts have 2 NUMA nodes then you should create
flavors that request 2 NUMA nodes. For a long time operators have ignored this advice
and continued to create single NUMA node flavors, citing that after 5+ years of hardware vendors
working with VNF vendors to make their products NUMA aware, VNFs often still do not optimize
properly for a multi NUMA environment.
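As an illustration of that guidance, a flavor mirroring a 2 NUMA node host could carry
extra specs along these lines (a minimal sketch; the values are examples, not a
recommendation, and would be applied with whatever tooling the operator normally uses):

    # Flavor extra specs mirroring a typical 2-NUMA-node host (illustrative values).
    extra_specs = {
        "hw:numa_nodes": "2",          # spread the guest across 2 NUMA nodes
        "hw:cpu_policy": "dedicated",  # pin vCPUs to pCPUs
        "hw:mem_page_size": "1GB",     # back guest RAM with 1G hugepages
    }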
As a result many operators still deploy single NUMA VMs, although that
is becoming less common over time. When you deploy a VM with a single
NUMA node today we more or less iterate over the host NUMA nodes in
order and assign the VM to the first NUMA node where it fits. On a
host without any PCI devices whitelisted for openstack management this
behavior results in NUMA nodes being filled linearly from NUMA 0 to
NUMA n. That means if a host had 100G of hugepages on both NUMA node 0
and 1 and you scheduled 101 1G single NUMA VMs to the host, 100 VMs
would spawn on NUMA 0 and 1 VM would spawn on NUMA node 1.
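A minimal sketch of that first-fit effect (not nova's actual fitting code; the
names and sizes are illustrative):

    # Minimal sketch of the current first-fit behaviour described above.
    def first_fit(host_numa_free_gb, vm_gb):
        """Return the index of the first NUMA node with enough free hugepages."""
        for node, free in enumerate(host_numa_free_gb):
            if free >= vm_gb:
                return node
        return None  # VM does not fit on this host

    free = [100, 100]          # 100G of 1G hugepages on each of 2 NUMA nodes
    placements = []
    for _ in range(101):       # schedule 101 single-NUMA 1G VMs
        node = first_fit(free, 1)
        placements.append(node)
        free[node] -= 1
    # placements.count(0) == 100 and placements.count(1) == 1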
That means the first 100 VMs would all contend for CPU resources
on the first NUMA node while the last VM had all of the second NUMA
node to itself.
The correct behavior would be for nova to round-robin assign the VMs,
attempting to keep the resource availability balanced. This will
maximise performance for individual VMs while pessimising the
scheduling of large VMs on a host.
To this end a new NUMA balancing config option (unset, pack or spread)
should be added and we should sort NUMA nodes in descending (spread) or
ascending (pack) order based on pMEM, pCPUs, mempages and PCI devices in
that sequence.
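A rough sketch of the ordering that option could drive (the NumaNode fields here are
hypothetical stand-ins, not nova's real objects; the sort key follows the sequence
pMEM, pCPUs, mempages, PCI devices given above):

    # Hypothetical sketch of pack/spread ordering of host NUMA nodes.
    from dataclasses import dataclass

    @dataclass
    class NumaNode:
        free_pmem_gb: int
        free_pcpus: int
        free_mempages: int
        free_pci_devs: int

    def order_nodes(nodes, policy):
        """Order candidate NUMA nodes per the proposed policy.

        'spread' prefers the emptiest node (descending free resources),
        'pack' prefers the fullest node that still fits (ascending),
        and any other value keeps today's in-order first-fit behaviour.
        """
        if policy not in ("pack", "spread"):
            return list(nodes)
        key = lambda n: (n.free_pmem_gb, n.free_pcpus,
                         n.free_mempages, n.free_pci_devs)
        return sorted(nodes, key=key, reverse=(policy == "spread"))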
In a future release, when NUMA is modelled in placement, this sorting will need to
be done in a weigher that sorts the allocation candidates based on the
same pack/spread criteria.
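A hedged sketch of what such a weigher could do, written as a plain scoring
function rather than against any real placement or scheduler API (all names here
are assumptions, reusing the NumaNode-style records from the previous sketch):

    # Hypothetical weigher-style ordering of allocation candidates.
    def weigh_candidates(candidates, policy, free_resources):
        """Sort candidates so the ones best matching the pack/spread policy
        come first, using the same criteria as the per-host NUMA node sort."""
        def score(candidate):
            node = free_resources[candidate]  # NumaNode-like record per candidate
            return (node.free_pmem_gb, node.free_pcpus,
                    node.free_mempages, node.free_pci_devs)
        return sorted(candidates, key=score, reverse=(policy == "spread"))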
I am filing this as a bug, not a feature, as this will have a
significant impact on existing deployments that either expected
https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/reserve-numa-with-pci.html
to implement this logic already, or that do not follow our existing guidance on
creating flavors that align to the host topology.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1887377/+subscriptions