yahoo-eng-team team mailing list archive

[Bug 1619050] [NEW] Nova scheduler is too greedy when loading instances of hypervisors

 

Public bug reported:

Nova scheduler has the ability to track instances per hypervisor:
https://github.com/openstack/nova/commit/82cc056fb7e1b081a733797ed27550343cbaf44c

There is a bug in the logic used to load the instance info per
hypervisor. This specific line [1], found in the "_add_instance_info"
function, uses "objects.InstanceList.get_by_host(context, host_name)" to
load instances. This means ALL instances managed by that single
nova-compute host will be loaded in memory.

This logic is faulty because the "get_all_host_states" function loops
over all compute nodes (hypervisors) and calls "_add_instance_info" for
each of them. This means the "_add_instance_info" function should be
loading instances for a specific host AND hypervisor.
"objects.InstanceList.get_by_host_and_node(context, host_name,
compute.hypervisor_hostname)" should be used instead to limit the scope
of loaded instances to the specific host/hypervisor tuple.
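
To illustrate the difference in scope, here is a minimal standalone
sketch. It does not use Nova itself: the "Instance" record and both
lookup helpers are hypothetical stand-ins that only mimic the filtering
behaviour of the two InstanceList methods named above.

```python
from collections import namedtuple

# Hypothetical stand-in for an instance record; not Nova's real object.
Instance = namedtuple("Instance", ["uuid", "host", "node"])


def get_by_host(instances, host):
    """Mimic a host-wide lookup: every instance on the nova-compute host."""
    return [i for i in instances if i.host == host]


def get_by_host_and_node(instances, host, node):
    """Mimic a lookup scoped to one host/hypervisor tuple."""
    return [i for i in instances if i.host == host and i.node == node]


# One nova-compute host managing three hypervisors, one instance each.
instances = [
    Instance("uuid-%d" % n, "compute-1", "node-%d" % n) for n in range(3)
]

per_host = get_by_host(instances, "compute-1")
per_node = get_by_host_and_node(instances, "compute-1", "node-0")
print(len(per_host), len(per_node))
```

The host-wide lookup returns every instance on "compute-1" no matter
which hypervisor is being processed, while the scoped lookup returns
only the instance that belongs to "node-0".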

If you run Nova with Ironic, or in any setup where a single
nova-compute host can manage a LOT of hypervisors, this means you could
load a LOT of data in memory, causing an out-of-memory error or serious
performance degradation. For example, if you have 2000 hypervisors
(Ironic nodes), the "_add_instance_info" function will load 2000
instances (instead of 1) for each hypervisor found in
"get_all_host_states", so the overall process ends up loading 2000^2
(4,000,000) rows from the database.
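
The arithmetic behind that example can be checked with a short sketch.
The "rows_loaded" helper is hypothetical, and it assumes that all
hypervisors sit behind one nova-compute host, that each runs the same
number of instances, and that the lookup runs once per hypervisor.

```python
def rows_loaded(num_nodes, instances_per_node=1, scoped=False):
    """Count rows fetched when the lookup runs once per compute node.

    scoped=False models a host-wide lookup (all instances on the host);
    scoped=True models a lookup limited to one host/hypervisor tuple.
    """
    total_instances = num_nodes * instances_per_node
    per_call = instances_per_node if scoped else total_instances
    return num_nodes * per_call


print(rows_loaded(2000))               # host-wide lookup on every pass
print(rows_loaded(2000, scoped=True))  # host/hypervisor-scoped lookup
```

Under these assumptions the host-wide lookup grows quadratically with
the number of hypervisors, while the scoped lookup grows linearly.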

[1]
https://github.com/openstack/nova/blob/dd44096a04a85319481943c1b2bb2471e752b0b3/nova/scheduler/host_manager.py#L631

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1619050

Title:
  Nova scheduler is too greedy when loading instances of hypervisors

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1619050/+subscriptions

