yahoo-eng-team team mailing list archive

[Bug 1679750] Re: Allocations are not cleaned up in placement for instance 'local delete' case

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1679750@xxxxxxxxxxxxxxxxxx>
Date: Fri, 20 Apr 2018 17:06:02 -0000
Reply-to: Bug 1679750 <1679750@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/560706
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ea9d0af31395fbe1686fa681cd91226ee580796e
Submitter: Zuul
Branch:    master

commit ea9d0af31395fbe1686fa681cd91226ee580796e
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Wed Apr 11 21:24:43 2018 -0400

    Delete allocations from API if nova-compute is down
    
    When performing a "local delete" of an instance, we
    need to delete the allocations that the instance has
    against any resource providers in Placement.
    
    It should be noted that without this change, restarting
    the nova-compute service will delete the allocations
    for its compute node (assuming the compute node UUID
    is the same as before the instance was deleted). That
    is shown in the existing functional test modified here.
    
    The more important reason for this change is that in
    order to fix bug 1756179, we need to make sure the
    resource provider allocations for a given compute node
    are gone by the time the compute service is deleted.
    
    This adds a new functional test and a release note for
    the new behavior and the need to configure nova-api to
    talk to placement. If placement is not configured, the
    call is harmless thanks to the @safe_connect decorator
    used in SchedulerReportClient.
    
    Closes-Bug: #1679750
    Related-Bug: #1756179
    
    Change-Id: If507e23f0b7e5fa417041c3870d77786498f741d
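
The graceful-degradation behavior the commit message attributes to
@safe_connect can be illustrated roughly as follows. This is only a
sketch of the general pattern, not nova's actual decorator: the names
safe_connect_sketch, PlacementUnavailable and FakeReportClient are
hypothetical, and the real decorator is assumed to catch client
endpoint, auth and connection errors rather than this stand-in exception.

```python
import functools
import logging

LOG = logging.getLogger(__name__)


class PlacementUnavailable(Exception):
    """Hypothetical stand-in for endpoint/auth/connection failures."""


def safe_connect_sketch(func):
    """Log and swallow connection-style errors instead of raising them."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except PlacementUnavailable as exc:
            LOG.warning("Placement unavailable, skipping %s: %s",
                        func.__name__, exc)
            return None
    return wrapper


class FakeReportClient:
    """Hypothetical client; records which consumers were cleaned up."""

    def __init__(self, configured):
        self.configured = configured
        self.deleted = []

    @safe_connect_sketch
    def delete_allocation_for_instance(self, instance_uuid):
        if not self.configured:
            raise PlacementUnavailable("no placement endpoint configured")
        self.deleted.append(instance_uuid)
        return True
```

With this shape, an API service that has no placement configuration
logs a warning and carries on with the delete, while a configured one
removes the allocations.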


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1679750

Title:
  Allocations are not cleaned up in placement for instance 'local
  delete' case

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed

Bug description:
  This is semi-related to bug 1661312 for evacuate.

  This is the case:

  1. Create an instance on host A successfully. There are allocation
  records in the placement API for the instance (consumer for the
  allocation records) and host A (resource provider).

  2. Host A goes down.

  3. Delete the instance. This triggers the local delete flow in the
  compute API where we can't RPC cast to the compute to delete the
  instance because the nova-compute service is down. So we do the delete
  in the database from the compute API (local to compute API, hence
  local delete).

  The problem is that in step 3 we don't remove the allocations for the
  instance from the host A resource provider during the local delete flow.

  Maybe this doesn't matter while host A is down, since the scheduler
  can't schedule to it anyway. But if host A comes back up, it will have
  allocations tied to it for deleted instances.

  On init_host in the compute service we call _complete_partial_deletion,
  but that only handles instances that have a vm_state of 'deleted' yet
  aren't actually deleted in the database. I don't think that's going to
  cover this case, because the local delete code in the compute API calls
  instance.destroy(), which deletes the instance from the database (it
  sets instances.deleted to a non-zero value in the DB, so it's "soft"
  deleted).
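
  The scenario above can be modeled with a small in-memory sketch. It is
  not nova or placement code: FakePlacement, local_delete and the
  clean_allocations flag are hypothetical names used only to show how
  allocations stay counted against host A when the local delete skips
  the cleanup.

```python
class FakePlacement:
    """Hypothetical in-memory stand-in for the placement service."""

    def __init__(self):
        # consumer (instance UUID) -> {resource provider: {class: amount}}
        self.allocations = {}

    def put_allocations(self, consumer, provider, resources):
        self.allocations.setdefault(consumer, {})[provider] = dict(resources)

    def delete_allocations(self, consumer):
        # Removing a consumer that has no allocations is not an error.
        self.allocations.pop(consumer, None)

    def usage(self, provider):
        """Sum everything still allocated against one resource provider."""
        totals = {}
        for by_provider in self.allocations.values():
            for resource_class, amount in by_provider.get(provider, {}).items():
                totals[resource_class] = totals.get(resource_class, 0) + amount
        return totals


def local_delete(instance, placement, clean_allocations):
    """Delete in the API because the compute service cannot be reached."""
    instance["deleted"] = True  # models the soft delete in the database
    if clean_allocations:
        placement.delete_allocations(instance["uuid"])


placement = FakePlacement()
instance = {"uuid": "inst-1", "deleted": False}
placement.put_allocations("inst-1", "host-a", {"VCPU": 2, "MEMORY_MB": 2048})

# Behavior described in the bug: the instance is gone, the usage is not.
local_delete(instance, placement, clean_allocations=False)
leaked = placement.usage("host-a")

# Behavior with cleanup during the local delete.
local_delete(instance, placement, clean_allocations=True)
remaining = placement.usage("host-a")
```

  In this model, leaked still reports the instance's resources against
  host-a, while remaining is empty once the allocations are removed as
  part of the local delete.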

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1679750/+subscriptions

References

[Bug 1679750] [NEW] Allocations are not cleaned up in placement for instance 'local delete' case
From: Matt Riedemann, 2017-04-04