
yahoo-eng-team team mailing list archive

[Bug 2043036] [NEW] [ironic] list_instances/list_instance_uuid does not respect conductor_group/partition_key

 

Public bug reported:

The methods on the Ironic driver, list_instances and list_instance_uuids
are not currently respecting the conductor_group option:
https://opendev.org/openstack/nova/src/branch/master/nova/conf/ironic.py#L71.

This leads to significant performance degradation, as querying Ironic
for all nodes (/v1/nodes) instead of all nodes managed by the compute
(/v1/nodes?conductor_group=blah) is a significantly more expensive API
call.
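
To make the difference between the two requests concrete, here is a
minimal sketch of how the filtered and unfiltered paths differ. The
helper name nodes_path is hypothetical and is not part of Nova or
Ironic; only the /v1/nodes path and the conductor_group query parameter
come from the report above.

```python
from urllib.parse import urlencode


def nodes_path(conductor_group=None):
    """Build the node-listing path (hypothetical helper, for illustration).

    With no conductor group, this is the expensive "all nodes" call.
    With one, the listing is restricted to that conductor group.
    """
    base = "/v1/nodes"
    if not conductor_group:
        return base
    return base + "?" + urlencode({"conductor_group": conductor_group})


print(nodes_path())        # unfiltered listing of every node
print(nodes_path("blah"))  # listing restricted to conductor group "blah"
```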

In addition, this can lead to unexpected behavior for operators, such as
a compute serving conductor group "A" taking an action to resolve an
issue that would normally be resolved by a compute serving conductor
group "B".


While troubleshooting this error, we dug into what this data is used for. It is used for two things:
- Reconciling deleted instances as a periodic job
- Ensuring no instances exist on a newly-started compute host


Both tasks either tolerate stale data or would be unaffected by using the Ironic driver's existing node cache. Therefore, a suggested fix is:

Revise list_instances and list_instance_uuids to reuse the node cache,
reducing the number of API calls made to Ironic and ensuring that all
/v1/nodes calls use the same code path in the Ironic driver. It's the
belief of JayF, TheJulia, and Johnthetubaguy (on a video call right now)
that using stale data, without refreshing the cache, should be safe for
these use cases. (Even if we decide to refresh the cache, we should use
this code path anyway.)
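
A rough sketch of the shape of this fix follows. It is not Nova's
actual driver code: Node, FakeIronicClient, CachedDriver and their
methods are all hypothetical stand-ins, assumed only for illustration.
The point is that the instance listing is answered from a cache that
was populated by a single conductor_group-filtered node listing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical, simplified stand-in for an Ironic node record.
    uuid: str
    conductor_group: str
    instance_uuid: Optional[str] = None


class FakeIronicClient:
    """Hypothetical client that counts node-listing calls."""

    def __init__(self, nodes: List[Node]):
        self._nodes = list(nodes)
        self.calls = 0

    def list_nodes(self, conductor_group: Optional[str] = None) -> List[Node]:
        self.calls += 1
        return [
            n for n in self._nodes
            if conductor_group is None or n.conductor_group == conductor_group
        ]


class CachedDriver:
    """Hypothetical driver: instance listings are served from the node cache."""

    def __init__(self, client: FakeIronicClient, conductor_group: str):
        self.client = client
        self.conductor_group = conductor_group
        self.node_cache: Dict[str, Node] = {}

    def refresh_cache(self) -> None:
        # The single code path that lists nodes; it always applies the filter.
        nodes = self.client.list_nodes(conductor_group=self.conductor_group)
        self.node_cache = {n.uuid: n for n in nodes}

    def list_instance_uuids(self) -> List[str]:
        # Populate the cache only if it is empty; otherwise accept stale data.
        if not self.node_cache:
            self.refresh_cache()
        return [
            n.instance_uuid for n in self.node_cache.values() if n.instance_uuid
        ]


client = FakeIronicClient([
    Node("node-1", "A", "inst-1"),
    Node("node-2", "A"),
    Node("node-3", "B", "inst-3"),
])
driver = CachedDriver(client, conductor_group="A")
print(driver.list_instance_uuids())  # only instances on group "A" nodes
print(driver.list_instance_uuids())  # answered from the cache, no new call
print(client.calls)
```

In this sketch the second listing does not reach the client at all,
and instances on conductor group "B" nodes are never visible to the
compute serving group "A".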

** Affects: ironic
     Importance: Medium
         Status: Confirmed

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: ironic

** Also affects: ironic
   Importance: Undecided
       Status: New

** Changed in: ironic
       Status: New => Confirmed

** Changed in: ironic
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2043036

Title:
  [ironic] list_instances/list_instance_uuid does not respect
  conductor_group/partition_key

Status in Ironic:
  Confirmed
Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/2043036/+subscriptions


