
yahoo-eng-team team mailing list archive

[Bug 1829479] Re: The allocation table has residual records when instance is evacuated and the source physical node is removed

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/778696
Committed: https://opendev.org/openstack/nova/commit/e5a34fffdf97fcda7d0abfdc9e23485479ca2c4f
Submitter: "Zuul (22348)"
Branch:    master

commit e5a34fffdf97fcda7d0abfdc9e23485479ca2c4f
Author: Takashi Kajinami <tkajinam@xxxxxxxxxx>
Date:   Thu Mar 4 22:27:25 2021 +0900

    Clean up allocations left by evacuation when deleting service
    
    When a compute node goes down and all instances on the compute node
    are evacuated, allocation records for these instances are still left
    on the source compute node until the nova-compute service is started
    again on the node. However, if a compute node is completely broken, it
    is not possible to start the service again.
    In this situation, deleting the nova-compute service for the compute
    node doesn't delete its resource provider record, and even if a user
    tries to delete the resource provider, the delete request is rejected
    because allocations are still left on that node.
    
    This change ensures that remaining allocations left by successful
    evacuations are cleared when deleting a nova-compute service, so that
    no resource provider record is left behind even if a compute node
    can't be recovered. Migration records are still left in 'done' status
    to trigger clean-up tasks in case the compute node is recovered later.
    
    Closes-Bug: #1829479
    Change-Id: I3ce6f6275bfe09d43718c3a491b3991a804027bd
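
The behaviour described in the commit message can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not nova's actual code: the `Migration` and `Placement` classes, the `delete_compute_service` function and the node and instance names are all hypothetical and exist purely for illustration. It shows allocations left by 'done' evacuations being removed before the resource provider is deleted, while the migration records keep their 'done' status.

```python
from dataclasses import dataclass, field


@dataclass
class Migration:
    # Hypothetical stand-in for a migration record.
    instance_uuid: str
    source_node: str
    migration_type: str  # e.g. "evacuation"
    status: str          # e.g. "done"


@dataclass
class Placement:
    # Hypothetical model: provider name -> {consumer uuid -> resources}.
    allocations: dict = field(default_factory=dict)

    def delete_provider(self, node):
        # Mirror the rejection described in the commit message: a provider
        # that still has allocations cannot be deleted.
        if self.allocations.get(node):
            raise RuntimeError(f"provider {node} still has allocations")
        self.allocations.pop(node, None)


def delete_compute_service(node, migrations, placement):
    """Drop allocations left by finished evacuations, then the provider."""
    for m in migrations:
        if (m.source_node == node
                and m.migration_type == "evacuation"
                and m.status == "done"):
            placement.allocations.get(node, {}).pop(m.instance_uuid, None)
            # The migration record itself is deliberately left untouched.
    placement.delete_provider(node)


# Without the clean-up, deleting the provider is rejected.
stuck = Placement({"node1": {"inst-a": {"VCPU": 2}}})
try:
    stuck.delete_provider("node1")
    rejected = False
except RuntimeError:
    rejected = True

# With the clean-up, the provider record can be removed.
placement = Placement({"node1": {"inst-a": {"VCPU": 2}}})
migrations = [Migration("inst-a", "node1", "evacuation", "done")]
delete_compute_service("node1", migrations, placement)
```

In this model, keeping the migration in 'done' status is what would still allow a recovered node to run its own clean-up later, as the commit message notes.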


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1829479

Title:
  The allocation table has residual records when instance is evacuated
  and the source physical node is removed

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  When the compute service on a node is down due to a failure, we choose
  to evacuate the instances located on it. After a successful evacuation,
  the relevant records in the allocation table are not cleared; they are
  only cleared when the compute service of the source node is restored.


  Unfortunately, if the failed node is down because of an unrecoverable
  failure and the compute service on it will never be restored, residual
  records remain in the allocation table.


  Furthermore, if we try to delete the down compute service, the record
  associated with this service is not deleted from the resource_provider
  table, because of the residual records in the allocation table.


  Perhaps, after a successful evacuation, we should also clear the
  allocation table, rather than only doing so once the source node's
  service is restored.


  Steps to reproduce
  ==================

  1. Take down a compute service.

  2. Evacuate the instances on it.

  3. Delete the compute service with the command: nova service-delete uuid


  Expected result
  ===============

  The compute service is deleted successfully, and resource_provider has
  no relevant record.


  Actual result
  =============

  The compute service is deleted successfully, but resource_provider
  still has the relevant record.
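
One way to confirm the actual result is to look at which consumers still hold allocations against the leftover resource provider. The sketch below assumes the allocations have already been fetched as a JSON body from the placement API's per-provider allocations listing. The body shape is written from memory and should be treated as an assumption. The function name `residual_consumers` and the sample values are hypothetical.

```python
def residual_consumers(body):
    """Return the sorted consumer UUIDs that still hold allocations.

    `body` is assumed to map consumer UUIDs to their resources under an
    "allocations" key; a missing key is treated as no allocations.
    """
    return sorted(body.get("allocations", {}))


# Hypothetical sample body for a provider with one leftover allocation.
sample = {
    "allocations": {
        "hypothetical-instance-uuid": {
            "resources": {"VCPU": 2, "MEMORY_MB": 4096},
        },
    },
    "resource_provider_generation": 7,
}

leftover = residual_consumers(sample)
```

A non-empty result for the source node's provider after step 3 would match the residual records described in this report.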

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1829479/+subscriptions


