yahoo-eng-team team mailing list archive

[Bug 1784705] [NEW] ResourceTracker.stats can leak across multiple ironic nodes

Public bug reported:

A single nova-compute service host can manage multiple ironic nodes,
which results in multiple ComputeNode records per compute service host,
and ironic instances map 1:1 to those compute nodes.

Before change https://review.openstack.org/#/c/398473/ in Ocata, the
ComputeManager managed multiple ResourceTracker instances, one per
compute node (so one per ironic instance managed by that host). As a
result of that change, the ComputeManager manages a single
ResourceTracker instance, and the ResourceTracker's compute node entry
was changed to a dict so that the single RT can track multiple compute
nodes (one per ironic instance).
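
As a rough sketch of that structural change (class and attribute names
below are simplified for illustration, not the actual nova code), the
post-change shape looks something like this:

    class ResourceTracker(object):
        """Post-change shape: one tracker for all of the host's nodes."""

        def __init__(self, host):
            self.host = host
            # Keyed by nodename; one entry per ironic node managed by
            # this compute service host.
            self.compute_nodes = {}
            # A single stats object lives on the tracker itself, so it
            # ends up shared across every compute node (see below).
            self.stats = {}

    class ComputeManager(object):
        """Before the change: one ResourceTracker per compute node.
        After the change: exactly one tracker for the whole host."""

        def __init__(self, host):
            self.rt = ResourceTracker(host)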

The problem is that the ResourceTracker.stats variable was left
"shared" across all of the compute nodes managed by the single RT. This
can cause problems in the ResourceTracker._update_usage_from_instance()
method, which updates the stats and then assigns them to a compute node
record, so stats information for one node can leak into, and accumulate
on, a different node's record.
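
A minimal, self-contained illustration of the failure mode (purely
hypothetical code, not the real _update_usage_from_instance()): because
the same mutable stats object is attached to every node's record, keys
written while processing one node remain visible on the others:

    # One stats dict shared by the tracker across both nodes.
    shared_stats = {}
    compute_nodes = {'node-a': {}, 'node-b': {}}

    def update_usage_from_instance(nodename, instance_stats):
        shared_stats.update(instance_stats)              # accumulates across nodes
        compute_nodes[nodename]['stats'] = shared_stats  # same object every time

    update_usage_from_instance('node-a', {'gpu': 'true'})
    update_usage_from_instance('node-b', {'fpga': 'true'})

    # Both records point at the same dict, so node-a also claims 'fpga'
    # and node-b also claims 'gpu':
    print(compute_nodes['node-a']['stats'])  # {'gpu': 'true', 'fpga': 'true'}
    print(compute_nodes['node-b']['stats'])  # {'gpu': 'true', 'fpga': 'true'}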

The compute node stats are used by the ComputeCapabilitiesFilter in the
scheduler, so compute node B could end up reporting node capabilities
which only apply to node A.
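
As a rough sketch of why that matters for scheduling (the matching below
is simplified and hypothetical, not the actual filter code): the filter
compares flavor extra_specs against a node's reported stats, so a node
whose stats picked up another node's keys can wrongly pass the filter.

    def capabilities_filter_passes(flavor_extra_specs, node_stats):
        # Simplified 'capabilities:<stat>' matching against node stats.
        for key, wanted in flavor_extra_specs.items():
            if not key.startswith('capabilities:'):
                continue
            stat = key.split(':', 1)[1]
            if node_stats.get(stat) != wanted:
                return False
        return True

    # Node B never had a GPU, but stats leaked from node A make it match:
    leaked_stats_on_b = {'gpu': 'true', 'fpga': 'true'}
    print(capabilities_filter_passes({'capabilities:gpu': 'true'},
                                     leaked_stats_on_b))  # True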

This was discovered during code review of this change:

https://review.openstack.org/#/c/576099/2/nova/compute/resource_tracker.py@1130

** Affects: nova
     Importance: High
         Status: Triaged

** Affects: nova/ocata
     Importance: Undecided
         Status: New

** Affects: nova/pike
     Importance: Undecided
         Status: New

** Affects: nova/queens
     Importance: Undecided
         Status: New


** Tags: compute ironic resource-tracker

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/ocata
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784705

Title:
  ResourceTracker.stats can leak across multiple ironic nodes

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) ocata series:
  New
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784705/+subscriptions

