yahoo-eng-team team mailing list archive
Message #79223
[Bug 1816086] Re: Resource Tracker performance with Ironic driver
Reviewed: https://review.opendev.org/637225
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8c797450cbff5194fb6791cd0a07fa060dc8af72
Submitter: Zuul
Branch: master
commit 8c797450cbff5194fb6791cd0a07fa060dc8af72
Author: Eric Fried <openstack@xxxxxxxx>
Date: Fri Feb 15 10:54:36 2019 -0600
Perf: Use dicts for ProviderTree roots
ProviderTree used to keep track of root providers in a list. Since we
don't yet have sharing providers, this would always be a list of one for
non-ironic deployments, or N for ironic deployments of N nodes.
To find a provider (by name or UUID), we would iterate over this list,
an O(N) operation. For large ironic deployments, this added up fast -
see the referenced bug.
With this change, we store roots in two dicts: one keyed by UUID, one
keyed by name. To find a provider, we first check these dicts. If the
provider we're looking for is a root, this is now O(1). If it's a
child, the lookup is still O(N), because we iterate over all the roots
looking for a matching descendant; but ironic deployments don't have
child providers (yet), so that case shouldn't apply. For non-ironic
deployments it's unchanged: O(M), where M is the number of descendants,
which should be very small for the time being.
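The dict-based lookup described above can be sketched as follows. This is an illustrative simplification, not the actual Nova code: the class and attribute names (Provider, roots_by_uuid, roots_by_name) are assumptions, and the O(N) fallback search for child providers is omitted.

```python
class Provider:
    """Minimal stand-in for a resource provider node (hypothetical)."""
    def __init__(self, name, uuid):
        self.name = name
        self.uuid = uuid


class ProviderTree:
    """Sketch of the optimization: roots kept in two dicts instead of a list."""
    def __init__(self):
        # One dict keyed by UUID, one keyed by name, as in the commit message.
        self.roots_by_uuid = {}
        self.roots_by_name = {}

    def add_root(self, provider):
        self.roots_by_uuid[provider.uuid] = provider
        self.roots_by_name[provider.name] = provider

    def find(self, name_or_uuid):
        # O(1) when the provider is a root. The real code would fall back
        # to an O(N) search over descendants, which is omitted here.
        if name_or_uuid in self.roots_by_uuid:
            return self.roots_by_uuid[name_or_uuid]
        return self.roots_by_name.get(name_or_uuid)
```

With a list of roots, the same find() would scan every root per lookup; with the two dicts, a root lookup is a single hash probe regardless of how many ironic nodes the deployment has.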
Test note: Existing tests in nova.tests.unit.compute.test_provider_tree
thoroughly cover all the affected code paths. There was one usage of
ProviderTree.roots that was untested and broken (even before this
change) which is now fixed.
Change-Id: Ibf430a8bc2a2af9353b8cdf875f8506377a1c9c2
Closes-Bug: #1816086
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816086
Title:
Resource Tracker performance with Ironic driver
Status in OpenStack Compute (nova):
Fix Released
Bug description:
The problem occurs in Rocky.
The resource tracker builds the resource provider tree, and the tree is updated twice in "_update_available_resource":
once via "_init_compute_node" and once in "_update_available_resource" itself.
The problem is that the RP tree contains all of the ironic RPs, and the
whole tree is flushed to placement (twice, as described above) as the
periodic task iterates over each ironic RP.
In our case, with 1700 ironic nodes, the periodic task takes:
1700 x (2 x 7s) = ~6h
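The arithmetic behind that estimate works out as below; the 7-second per-flush figure is the one reported above.

```python
nodes = 1700              # ironic nodes in this deployment
flushes_per_node = 2      # tree flushed twice per _update_available_resource
seconds_per_flush = 7     # observed cost of one flush to placement

total_seconds = nodes * flushes_per_node * seconds_per_flush
hours = total_seconds / 3600
print(hours)  # ~6.6 hours, i.e. the "~6h" quoted above
```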
+++
Mitigations:
- Shard nova-compute: run several nova-computes dedicated to ironic.
Most current deployments use only one nova-compute to avoid resource shuffling/recreation between nova-computes.
Several nova-computes would be needed to accommodate the load.
- Why do we need to flush the full resource provider tree to placement, rather than only the RP being considered?
As a workaround, we are doing this now!
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816086/+subscriptions