yahoo-eng-team team mailing list archive
Message #74077
[Bug 1784705] Re: ResourceTracker.stats can leak across multiple ironic nodes
Reviewed: https://review.openstack.org/587636
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b5b7d86bb04f92d21cf954cd6b3463c9fcc637e6
Submitter: Zuul
Branch: master
commit b5b7d86bb04f92d21cf954cd6b3463c9fcc637e6
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Tue Jul 31 17:26:47 2018 -0400
Make ResourceTracker.stats node-specific
As of change I6827137f35c0cb4f9fc4c6f753d9a035326ed01b in
Ocata, the ResourceTracker manages multiple compute nodes
via its "compute_nodes" variable, but the "stats" variable
was still being shared across all nodes, which leads to
leaking stats across nodes in an ironic deployment where
a single nova-compute service host is managing multiple
ironic instances (nodes).
This change makes ResourceTracker.stats node-specific
which fixes the ironic leak but also allows us to remove
the stats deepcopy while iterating over instances which
should improve performance for single-node deployments with
potentially a large number of instances, i.e. vCenter.
Change-Id: I0b9e5b711878fa47ba90e43c0b41437b57cf8ef6
Closes-Bug: #1784705
Closes-Bug: #1777422
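The shape of the fix can be sketched as follows. This is an illustrative simplification, not the actual nova code; the class layout and method names here are hypothetical. Keying stats by node name means an update for one ironic node can no longer bleed into another node's record:

```python
from collections import defaultdict


class ResourceTracker:
    """Sketch only (hypothetical, not nova's real implementation):
    stats are kept per node instead of in one dict shared by all nodes."""

    def __init__(self):
        self.compute_nodes = {}         # nodename -> compute node record
        # Before the fix, a single stats object was shared by every node.
        # After it, each node gets its own stats mapping.
        self.stats = defaultdict(dict)  # nodename -> {stat_name: value}

    def update_usage(self, nodename, key, value):
        # Only the named node's stats are touched, so values cannot
        # leak into a different ironic node's record.
        self.stats[nodename][key] = value
        return self.stats[nodename]
```

With per-node stats there is also no need to deep-copy a shared dict while iterating over instances, which is where the single-node performance gain mentioned in the commit message comes from.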
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784705
Title:
ResourceTracker.stats can leak across multiple ironic nodes
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
In Progress
Status in OpenStack Compute (nova) pike series:
In Progress
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
A single nova-compute service host can manage multiple ironic nodes,
which creates multiple ComputeNode records per compute service host,
and ironic instances are 1:1 with each compute node.
Before change https://review.openstack.org/#/c/398473/ in Ocata, the
ComputeManager maintained multiple ResourceTracker instances, one
per compute node (so one per ironic instance managed by that host). As
a result of that change, the ComputeManager manages a single
ResourceTracker instance, and the ResourceTracker's compute node entry
was changed to a dict so the RT can manage multiple compute nodes
(one per ironic instance).
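That post-Ocata structure can be sketched like this (hypothetical names; the real tracker carries much more state):

```python
class ResourceTracker:
    """Sketch of the single-tracker model: one RT per compute service
    host, holding a dict of compute node records (one per ironic node)."""

    def __init__(self, host):
        self.host = host
        self.compute_nodes = {}  # nodename -> compute node record

    def add_node(self, nodename, record):
        self.compute_nodes[nodename] = record


# One nova-compute host managing two ironic nodes:
rt = ResourceTracker('ironic-host-1')
rt.add_node('node-a', {'hypervisor_type': 'ironic'})
rt.add_node('node-b', {'hypervisor_type': 'ironic'})
```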
The problem is that the ResourceTracker.stats variable was left
"shared" across all compute nodes managed by the single RT. That
causes problems in the
ResourceTracker._update_usage_from_instance() method, which updates the
stats and then assigns them to a compute node record, so stats for one
node can leak into, or accumulate in, the record for a different node.
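A minimal demonstration of the leak, assuming a single stats dict shared across nodes (the function and stat names here are hypothetical stand-ins for the real nova code):

```python
# One stats dict shared by every node the tracker manages -- the bug.
shared_stats = {}


def _update_usage_from_instance(nodename, compute_nodes, instance_stats):
    # Updates accumulate in the SHARED dict across all nodes...
    shared_stats.update(instance_stats)
    # ...and the whole accumulated dict is then assigned to this node.
    compute_nodes[nodename]['stats'] = dict(shared_stats)


compute_nodes = {'node-a': {}, 'node-b': {}}
_update_usage_from_instance('node-a', compute_nodes, {'cap:gpu': 'true'})
_update_usage_from_instance('node-b', compute_nodes, {'cap:ssd': 'true'})
# node-b's record now also carries node-a's 'cap:gpu' stat.
```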
The compute node stats are used by the ComputeCapabilitiesFilter in
the scheduler, so compute node B could end up reporting capabilities
that only apply to node A.
This was discovered during code review of this change:
https://review.openstack.org/#/c/576099/2/nova/compute/resource_tracker.py@1130
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784705/+subscriptions