yahoo-eng-team team mailing list archive

[Bug 1816086] [NEW] Resource Tracker performance with Ironic driver

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Belmiro Moreira <1816086@xxxxxxxxxxxxxxxxxx>
Date: Fri, 15 Feb 2019 15:37:53 -0000
Reply-to: Bug 1816086 <1816086@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

This problem is present in Rocky.

The resource tracker builds the resource provider (RP) tree, and the tree is updated twice in "_update_available_resource":
once in "_init_compute_node" and once in "_update_available_resource" itself.

The problem is that the RP tree contains all the Ironic RPs, and the
entire tree is flushed to Placement (twice, as described above) each
time the periodic task iterates over an Ironic RP.
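
A minimal sketch that counts the work done by one periodic-task run. It assumes that, in steady state, every Ironic node already has an RP in the tree and that each flush sends the whole tree. ProviderTree, flush_to_placement and periodic_task are hypothetical stand-ins, not nova code, and the {"total": 1} inventory is illustrative only:

```python
# Hypothetical model: none of these names are nova APIs.


class ProviderTree:
    """Toy stand-in for the resource tracker's provider tree."""

    def __init__(self):
        self.providers = {}  # RP uuid -> inventory (illustrative)

    def update(self, rp_uuid, inventory):
        self.providers[rp_uuid] = inventory


def flush_to_placement(tree, stats):
    """Pretend to push the WHOLE tree to Placement on every call."""
    stats["flushes"] += 1
    stats["providers_sent"] += len(tree.providers)


def periodic_task(node_uuids):
    """One run of the periodic task with all Ironic RPs already in the tree."""
    tree = ProviderTree()
    for uuid in node_uuids:
        tree.update(uuid, {"total": 1})
    stats = {"flushes": 0, "providers_sent": 0}
    for uuid in node_uuids:
        # First update and flush (the _init_compute_node step).
        tree.update(uuid, {"total": 1})
        flush_to_placement(tree, stats)
        # Second update and flush (later in _update_available_resource).
        tree.update(uuid, {"total": 1})
        flush_to_placement(tree, stats)
    return stats


stats = periodic_task([f"node-{i}" for i in range(1700)])
print(stats)  # {'flushes': 3400, 'providers_sent': 5780000}
```

The number of flushes grows linearly with the node count, but each flush carries every RP, so the data sent grows quadratically (2 x N x N).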

In our case, with 1700 Ironic nodes, the periodic task takes:
1700 x (2 x 7 s) = 23,800 s, i.e. ~6.6 h
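
The same arithmetic as a quick check, using only the figures given above (1700 nodes, 2 flushes per node, ~7 s per flush):

```python
# Figures from the report: 1700 nodes, 2 flushes per node, ~7 s per flush.
NODES = 1700
FLUSHES_PER_NODE = 2
SECONDS_PER_FLUSH = 7

total_seconds = NODES * FLUSHES_PER_NODE * SECONDS_PER_FLUSH
total_hours = total_seconds / 3600
print(f"{total_seconds} s = {total_hours:.1f} h")  # 23800 s = 6.6 h
```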

+++

Mitigations:
- Shard nova-compute: have several nova-compute services dedicated to Ironic.
Most current deployments use only one nova-compute, to avoid resources being shuffled/recreated between nova-computes, but several nova-computes will be needed to handle the load.

- Why do we need to flush the full resource provider tree to Placement, and not only the RP that is being considered?
As a workaround, we are now flushing only the RP being considered.
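
A rough comparison of the two mitigations, counting RP records sent to Placement per periodic-task run. It assumes each sharded nova-compute's tree holds only the nodes that compute manages, and that every flush sends either the whole tree (current behaviour) or only the current RP (workaround). full_tree_cost, single_rp_cost and shard are hypothetical helpers; the CRC32 split is only illustrative and is not meant to reflect how nova distributes Ironic nodes across computes:

```python
import zlib

# Hypothetical cost model: counts RP records sent to Placement per
# periodic-task run. None of these names are nova APIs.


def full_tree_cost(n_nodes):
    """Current behaviour: 2 flushes per node, each sending all n_nodes RPs."""
    return 2 * n_nodes * n_nodes


def single_rp_cost(n_nodes):
    """Workaround: 2 flushes per node, each sending only that node's RP."""
    return 2 * n_nodes


def shard(node_uuids, n_computes):
    """Illustrative CRC32-based split of Ironic nodes across nova-computes."""
    shards = [[] for _ in range(n_computes)]
    for uuid in node_uuids:
        shards[zlib.crc32(uuid.encode()) % n_computes].append(uuid)
    return shards


nodes = [f"node-{i}" for i in range(1700)]
shards = shard(nodes, 4)
baseline = full_tree_cost(len(nodes))  # single nova-compute
sharded_worst = max(full_tree_cost(len(s)) for s in shards)  # busiest shard
per_rp = single_rp_cost(len(nodes))  # per-RP flush, single nova-compute
print(baseline, sharded_worst, per_rp)
```

With k roughly balanced shards, each compute's cost drops by about k squared, but it is still quadratic in that compute's own node count; flushing only the RP being considered makes the cost linear in the number of nodes.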

** Affects: nova
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816086

Title:
Resource Tracker performance with Ironic driver

Status in OpenStack Compute (nova):
New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816086/+subscriptions

Follow ups

[Bug 1816086] Re: Resource Tracker performance with Ironic driver
From: Matt Riedemann, 2019-07-10
[Bug 1816086] Re: Resource Tracker performance with Ironic driver
From: OpenStack Infra, 2019-07-10