
yahoo-eng-team team mailing list archive

[Bug 1811726] Re: Deleting compute service only deletes "first" ironic node resource provider from placement

 

Reviewed:  https://review.opendev.org/657016
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=650fe118d128f09f78552b82abc114bb4b84930e
Submitter: Zuul
Branch:    master

commit 650fe118d128f09f78552b82abc114bb4b84930e
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri May 3 15:23:57 2019 -0400

    Delete resource providers for all nodes when deleting compute service
    
    Change I7b8622b178d5043ed1556d7bdceaf60f47e5ac80 started deleting the
    compute node resource provider associated with a compute node when
    deleting a nova-compute service. However, it would only delete the
    provider for the first compute node associated with the service, which
    means that for an ironic compute service managing multiple nodes, the
    remaining resource providers were not cleaned up in placement. This
    fixes the issue by iterating over all the compute nodes and cleaning
    up their providers. Note that this could potentially be a lot of
    nodes, but there is no good alternative to iterating over them and
    cleaning them up one at a time.
    
    Note that this is best-effort. The
    SchedulerReportClient.delete_resource_provider method ignores
    ResourceProviderInUse errors, and it does not account for stale
    allocations on the host, namely allocations from evacuated instances
    (or from incomplete migrations, though baremetal instances cannot be
    migrated today). As a result, we could still delete the compute
    service and orphan those in-use providers. That, however, is no worse
    than before this change, when we did not try to clean up all
    providers. The issue described above is tracked as bug 1829479 and
    will be dealt with separately.
    
    Change-Id: I9e852e25ea89f32bf19cdaeb1f5dac8f749f5dbc
    Closes-Bug: #1811726
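
The loop the commit message describes, together with its best-effort caveat, can be sketched in Python roughly as follows. This is a minimal model under stated assumptions, not nova's code: FakeReportClient, its delete_resource_provider(node_uuid) method and delete_service_providers are hypothetical stand-ins (nova's real SchedulerReportClient.delete_resource_provider has a different signature), and returning False stands in for swallowing ResourceProviderInUse.

```python
# Hypothetical sketch: FakeReportClient and delete_service_providers are
# illustrative stand-ins, not nova's SchedulerReportClient API.
from dataclasses import dataclass, field


@dataclass
class FakeReportClient:
    # Provider UUIDs currently registered in (simulated) placement.
    providers: set = field(default_factory=set)
    # Providers that still have allocations, e.g. from evacuated instances.
    in_use: set = field(default_factory=set)

    def delete_resource_provider(self, node_uuid: str) -> bool:
        """Best-effort delete: an in-use provider is skipped, not reported."""
        if node_uuid in self.in_use:
            # Models ignoring ResourceProviderInUse: the provider stays
            # behind and is orphaned once the service record is gone.
            return False
        self.providers.discard(node_uuid)
        return True


def delete_service_providers(client: FakeReportClient, node_uuids: list) -> None:
    """Clean up the provider of every compute node owned by the service."""
    # One node at a time; an ironic service may manage many nodes.
    for node_uuid in node_uuids:
        client.delete_resource_provider(node_uuid)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    client = FakeReportClient(providers=set(nodes), in_use={"node-c"})
    delete_service_providers(client, nodes)
    print(client.providers)  # {'node-c'}
```

In this model every node's provider is attempted, but "node-c" survives because it is still in use, which is the orphaning case the commit defers to bug 1829479.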


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1811726

Title:
  Deleting compute service only deletes "first" ironic node resource
  provider from placement

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

Bug description:
  NB: This comes from code inspection, not observed behavior.

  When the compute service is deleted, we attempt to delete from
  placement the resource provider for the compute node associated with
  the service [1].

  But ironic deployments can have multiple compute nodes. In this case,
  the compute node associated with the service is the "first" such node
  [2].

  What happens then is the compute node records are deleted, leaving the
  remaining N-1 nodes' provider records orphaned. Those get cleaned up
  (I think?) by update_available_resource when the service is recreated
  [3].

  So we're deleting and recreating the ironic node resource providers,
  but in a weird order. We should probably either fix the code at [1] to
  delete all of them, or none of them (and let the orphan handling code
  do all the work).

  [1] https://github.com/openstack/nova/blob/da98f4ba4554139b3901103aa0d26876b11e1d9a/nova/api/openstack/compute/services.py#L244-L247
  [2] https://github.com/openstack/nova/blob/da98f4ba4554139b3901103aa0d26876b11e1d9a/nova/objects/service.py#L308-L311
  [3] https://github.com/openstack/nova/blob/da98f4ba4554139b3901103aa0d26876b11e1d9a/nova/compute/manager.py#L7757-L7771
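
To make the N-1 arithmetic above concrete, here is a hypothetical Python model (the providers_left helper is illustrative, not nova code) in which each ironic node owns one resource provider and list order stands in for whichever node the service lookup treats as "first".

```python
# Hypothetical model of the behavior described above: each ironic node
# has its own resource provider, keyed here by node UUID.


def providers_left(node_uuids: list, delete_all: bool) -> set:
    """Return the provider UUIDs still in placement after service delete."""
    remaining = set(node_uuids)
    # Pre-fix behavior: only the "first" node's provider is targeted.
    targets = node_uuids if delete_all else node_uuids[:1]
    for uuid in targets:
        remaining.discard(uuid)
    return remaining


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3", "node-4"]
    print(len(providers_left(nodes, delete_all=False)))  # 3, i.e. N-1 orphans
    print(len(providers_left(nodes, delete_all=True)))   # 0
```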

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1811726/+subscriptions

