yahoo-eng-team team mailing list archive

[Bug 1679750] [NEW] Allocations are not cleaned up in placement for instance 'local delete' case

 

Public bug reported:

This is semi-related to bug 1661312 for evacuate.

This is the case:

1. Create an instance on host A successfully. There are allocation
records in the placement API for the instance (consumer for the
allocation records) and host A (resource provider).

2. Host A goes down.

3. Delete the instance. This triggers the local delete flow in the
compute API: we can't RPC cast to the compute service to delete the
instance because the nova-compute service is down, so we do the delete
in the database from the compute API (local to the compute API, hence
"local delete").
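The branch in step 3 can be sketched as a small model. This is a minimal, hypothetical illustration rather than nova's actual compute API code: Instance, delete_instance, and the compute_service_up flag are made-up names, and the soft delete is reduced to setting deleted to a non-zero value.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical stand-in for an instances table row.
    uuid: str
    host: str
    deleted: int = 0  # 0 = live row; non-zero = soft-deleted row


def delete_instance(instance: Instance, compute_service_up: bool) -> str:
    """Hypothetical model of the delete branch in the compute API.

    If nova-compute on the instance's host is up, the API RPC casts to it
    and the compute service tears the guest down. Otherwise the API falls
    back to a "local delete" done directly against the database.
    """
    if compute_service_up:
        return "rpc_cast"
    # Local delete: soft-delete the row from the API side. Note that
    # nothing here touches placement allocations.
    instance.deleted = 1
    return "local_delete"
```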

The problem is that in step 3 we don't remove the instance's allocations
against the host A resource provider during the local delete flow.
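The gap can be reproduced with a toy in-memory model. This is a sketch under stated assumptions, not nova or placement code: FakePlacement, local_delete, and the clean_up_allocations flag are hypothetical, and the nested dict only approximates placement's allocation records (consumer = instance, resource provider = host A). With the flag off, it shows the reported behavior; with it on, it shows the missing cleanup step, which against the real service would roughly correspond to placement's DELETE /allocations/{consumer_uuid}.

```python
from collections import defaultdict


class FakePlacement:
    """Hypothetical in-memory stand-in for placement allocations.

    Stored as {consumer_uuid: {provider_uuid: {resource_class: amount}}}.
    """

    def __init__(self):
        self.allocations = {}

    def put_allocations(self, consumer, provider, resources):
        self.allocations.setdefault(consumer, {})[provider] = dict(resources)

    def delete_allocations(self, consumer):
        # Intended to mirror DELETE /allocations/{consumer_uuid}: drop
        # every allocation held by this consumer, on any provider.
        self.allocations.pop(consumer, None)

    def usage_on(self, provider):
        # Sum the resources all consumers hold against one provider.
        usage = defaultdict(int)
        for per_provider in self.allocations.values():
            for rc, amount in per_provider.get(provider, {}).items():
                usage[rc] += amount
        return dict(usage)


def local_delete(db, placement, instance_uuid, clean_up_allocations):
    """Hypothetical local delete: soft-delete the row, optionally clean up."""
    db[instance_uuid]["deleted"] = 1  # soft delete, like instance.destroy()
    if clean_up_allocations:
        placement.delete_allocations(instance_uuid)
```

With clean_up_allocations=False the instance row is soft-deleted but usage_on() for host A's provider still reports the instance's resources, which is the leak described in this bug.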

Maybe this doesn't matter while host A is down, since the scheduler
can't schedule to it anyway. But if host A comes back up, its resource
provider will still have allocations tied to deleted instances, so that
capacity will look consumed.
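One way to spot the leftover allocations once host A is back is to compare what placement holds against host A's resource provider with the instances that still exist. The sketch below is hypothetical (find_orphaned_consumers is not a nova function) and assumes the response body shape of placement's GET /resource_providers/{uuid}/allocations, roughly {"allocations": {consumer_uuid: {"resources": {...}}}, ...}. It treats every consumer as an instance, so it would misreport consumers that are not instances.

```python
def find_orphaned_consumers(provider_allocations, live_instance_uuids):
    """Hypothetical helper: consumers holding allocations on a provider
    with no live (non-deleted) instance behind them.

    provider_allocations is assumed to be the parsed JSON body of
    GET /resource_providers/{uuid}/allocations.
    live_instance_uuids is a set of UUIDs for non-deleted instances.
    """
    consumers = provider_allocations.get("allocations", {})
    # Sorted so the result is deterministic.
    return sorted(c for c in consumers if c not in live_instance_uuids)
```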

On init_host in the compute service we call _complete_partial_deletion,
but that only handles instances that have a vm_state of 'deleted' yet are
not actually deleted in the database. I don't think that covers this
case, because the local delete code in the compute API calls
instance.destroy(), which deletes the instance from the database (it
sets instances.deleted to a non-zero value, so the row is "soft"
deleted).
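The mismatch can be illustrated with a filter over hypothetical rows. This is a simplified model, not the real _complete_partial_deletion or its database query: InstanceRow and partially_deleted are illustrative names. The point is that a locally deleted instance already has a non-zero deleted column, so even if its vm_state is 'deleted' it falls outside the "vm_state is 'deleted' but the row is not" condition.

```python
from dataclasses import dataclass


@dataclass
class InstanceRow:
    # Hypothetical, simplified instances table row.
    uuid: str
    vm_state: str
    deleted: int  # 0 = live row; non-zero = soft-deleted row


def partially_deleted(rows):
    """Hypothetical model of the instances _complete_partial_deletion
    targets on init_host: vm_state says 'deleted', row is not deleted."""
    return [r.uuid for r in rows if r.vm_state == "deleted" and r.deleted == 0]
```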

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1679750

Title:
  Allocations are not cleaned up in placement for instance 'local
  delete' case

Status in OpenStack Compute (nova):
  New

