yahoo-eng-team team mailing list archive

[Bug 1590607] [NEW] incorrect handling of host numa cell usage with instances having no numa topology

Public bug reported:

I think there is a problem in host NUMA node resource tracking when
there is an instance with no NUMA topology on the same compute node as
instances with a NUMA topology.

It's triggered while running the resource audit, which ultimately calls
hardware.get_host_numa_usage_from_instance() and assigns the result to
self.compute_node.numa_topology.

The problem occurs if you have a number of instances with a NUMA
topology, and then an instance with no NUMA topology. When running
numa_usage_from_instances() for the instance with no NUMA topology we
cache the host cell's current "memory_usage" and "cpu_usage" values.
However, because instance.cells is empty we don't enter the loop over
the instance cells. Since the two lines in this commit are indented too
far, they are never executed, and we end up appending a host cell with
"cpu_usage" and "memory_usage" of zero. This results in a host
numa_topology cell with incorrect "cpu_usage" and "memory_usage" values,
though I think the overall host cpu/memory usage is still correct.

The fix is to reduce the indentation of the two lines in question so
that they run even when the instance has no NUMA topology. This writes
the original host cell usage information back to the host cell.
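
To illustrate the indentation problem described above, here is a
minimal, self-contained sketch. It is not the nova code: the classes
(HostCell, InstanceCell, Instance) and the two functions (usage_buggy,
usage_fixed) are hypothetical stand-ins, and the exact placement of the
write-back lines in nova is assumed from the description in this report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstanceCell:
    # Hypothetical stand-in for an instance NUMA cell.
    id: int
    cpus: int
    memory: int


@dataclass
class Instance:
    # An instance with no NUMA topology is modelled as having no cells.
    cells: List[InstanceCell] = field(default_factory=list)


@dataclass
class HostCell:
    # Hypothetical stand-in for a host NUMA cell.
    id: int
    cpu_usage: int = 0
    memory_usage: int = 0


def usage_buggy(host_cells, instances):
    new_cells = []
    for hostcell in host_cells:
        # Cache the current usage of the host cell.
        cpu_usage = hostcell.cpu_usage
        memory_usage = hostcell.memory_usage
        newcell = HostCell(id=hostcell.id)  # usage starts at zero
        for instance in instances:
            for icell in instance.cells:
                if icell.id == hostcell.id:
                    cpu_usage += icell.cpus
                    memory_usage += icell.memory
                # Indented too far: skipped when instance.cells is empty.
                newcell.cpu_usage = max(0, cpu_usage)
                newcell.memory_usage = max(0, memory_usage)
        new_cells.append(newcell)
    return new_cells


def usage_fixed(host_cells, instances):
    new_cells = []
    for hostcell in host_cells:
        cpu_usage = hostcell.cpu_usage
        memory_usage = hostcell.memory_usage
        newcell = HostCell(id=hostcell.id)
        for instance in instances:
            for icell in instance.cells:
                if icell.id == hostcell.id:
                    cpu_usage += icell.cpus
                    memory_usage += icell.memory
            # One level out: runs even for an instance with no cells.
            newcell.cpu_usage = max(0, cpu_usage)
            newcell.memory_usage = max(0, memory_usage)
        new_cells.append(newcell)
    return new_cells


# A host cell already in use, audited against an instance with no cells.
host = [HostCell(id=0, cpu_usage=4, memory_usage=2048)]
no_numa = Instance()
buggy = usage_buggy(host, [no_numa])
fixed = usage_fixed(host, [no_numa])
```

In this sketch the buggy variant returns a cell with zero usage, while
the fixed variant carries the cached host cell usage through unchanged.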

** Affects: nova
     Importance: Undecided
     Assignee: Chris Friesen (cbf123)
         Status: New


** Tags: compute scheduler

** Changed in: nova
     Assignee: (unassigned) => Chris Friesen (cbf123)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1590607

Title:
  incorrect handling of host numa cell usage with instances having no
  numa topology

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1590607/+subscriptions