Message #55822
[Bug 1619050] [NEW] Nova scheduler is too greedy when loading instances of hypervisors
Public bug reported:
Nova scheduler has the ability to track instances per hypervisor:
https://github.com/openstack/nova/commit/82cc056fb7e1b081a733797ed27550343cbaf44c
There is a bug in the logic used to load the instance info for each
hypervisor. The line at [1], in the "_add_instance_info" function, calls
"objects.InstanceList.get_by_host(context, host_name)" to load
instances. This means ALL instances managed by that single nova-compute
host are loaded into memory.
This logic is faulty because the "get_all_host_states" function loops
over all compute nodes (hypervisors) and calls "_add_instance_info" for
each of them. The "_add_instance_info" function should therefore load
instances for a specific host AND hypervisor.
"objects.InstanceList.get_by_host_and_node(context, host_name,
compute.hypervisor_hostname)" should be used instead to limit the scope
of loaded instances to the specific host/hypervisor tuple.
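The difference between the two lookups can be sketched with a toy
in-memory model. This is not Nova's code: Instance, get_by_host and
get_by_host_and_node below are simplified stand-ins for the
objects.InstanceList methods named above, and the host and node names
are made up for illustration.

```python
from collections import namedtuple

# Toy stand-in for an instance row; NOT Nova's Instance object.
Instance = namedtuple("Instance", ["uuid", "host", "node"])

# Hypothetical data: one nova-compute host managing three Ironic nodes,
# with one instance per node.
INSTANCES = [
    Instance("uuid-%d" % i, "compute-1", "node-%d" % i) for i in range(3)
]


def get_by_host(host):
    """Stand-in for InstanceList.get_by_host: filters on host only."""
    return [inst for inst in INSTANCES if inst.host == host]


def get_by_host_and_node(host, node):
    """Stand-in for InstanceList.get_by_host_and_node: filters on both."""
    return [
        inst for inst in INSTANCES
        if inst.host == host and inst.node == node
    ]


def rows_loaded(per_node_lookup):
    """Mimic a loop over all compute nodes and count the rows loaded."""
    compute_nodes = [("compute-1", "node-%d" % i) for i in range(3)]
    return sum(
        len(per_node_lookup(host, node)) for host, node in compute_nodes
    )


# Host-only lookup repeated for every node vs. host+node lookup.
greedy = rows_loaded(lambda host, node: get_by_host(host))
scoped = rows_loaded(get_by_host_and_node)
print(greedy, scoped)  # 9 3
```

With three nodes the host-only lookup loads 9 rows in total, while the
host/node lookup loads 3, one per node.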
If you run Nova with Ironic, or in any setup where a single nova-compute
host can manage a LOT of hypervisors, this can load a LOT of data into
memory and cause an out-of-memory error or serious performance
degradation. For example, if you have 2000 hypervisors (Ironic nodes)
managed by one nova-compute host, each running one instance, the
"_add_instance_info" function will load 2000 instances (instead of 1)
for each hypervisor visited in get_all_host_states, so the overall
process loads 2000^2 (4,000,000) rows from the database.
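The arithmetic behind that figure can be checked with a short
calculation. The helper name and its parameters are hypothetical, and it
assumes every node is managed by the same nova-compute host and carries
the same number of instances.

```python
def total_rows_loaded(num_nodes, instances_per_node=1, scoped=False):
    """Rows loaded by one pass over all compute nodes (illustrative only).

    scoped=False models a host-only lookup repeated for each node;
    scoped=True models a lookup limited to the host/node pair.
    """
    instances_on_host = num_nodes * instances_per_node
    per_iteration = instances_per_node if scoped else instances_on_host
    return num_nodes * per_iteration


print(total_rows_loaded(2000))               # 4000000
print(total_rows_loaded(2000, scoped=True))  # 2000
```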
[1]
https://github.com/openstack/nova/blob/dd44096a04a85319481943c1b2bb2471e752b0b3/nova/scheduler/host_manager.py#L631
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1619050
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1619050/+subscriptions