yahoo-eng-team team mailing list archive

[Bug 1852610] Re: [SRU] API allows source compute service/node deletion while instances are pending a resize confirm/revert

 

** Summary changed:

- API allows source compute service/node deletion while instances are pending a resize confirm/revert
+ [SRU] API allows source compute service/node deletion while instances are pending a resize confirm/revert

** Also affects: nova (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: nova (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: nova (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: nova (Ubuntu Bionic)
     Assignee: (unassigned) => Brett Milford (brettmilford)

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/queens
       Status: New => In Progress

** Changed in: cloud-archive/queens
     Assignee: (unassigned) => Brett Milford (brettmilford)

** Description changed:

- This is split off from bug 1829479 which is about deleting a compute
- service which had servers evacuated from it which will orphan resource
- providers in placement.
+ [Impact]
+ 
+  * API will allow deleting a source compute service which has migration-
+ based allocations for the source node resource provider and pending
+ instance resizes involving the source node.
+ 
+  * Backporting the fix will improve application resilience in this case.
+ 
+ [Test Case]
+ 
+  1. create a server on host1
+  2. resize or cold migrate it to a dest host2
+  3. delete the compute service for host1
+ 
+  At this point the resource provider for host1 is orphaned.
+ 
+  4. try to confirm/revert the resize of the server which will fail
+ because the compute node for host1 is gone and this results in the
+ server going to ERROR status
+ 
+ [Where problems could occur]
+ 
+  * This change introduces an exception condition in the API and prevents
+ the erroneous deletion of compute services which would result in
+ orphaned state.
+ 
+  * As such we should expect to see altered behavior from the API as
+ detailed in api-ref/source/os-services.inc
+ 
+  * If problems were to occur they would manifest in behavior that is
+ different from both the original behavior of the API and the new
+ behavior.
+ 
+ --- Original Description ---
+ This is split off from bug 1829479 which is about deleting a compute service which had servers evacuated from it which will orphan resource providers in placement.
  
  A similar scenario is true where the API will allow deleting a source
  compute service which has migration-based allocations for the source
  node resource provider and pending instance resizes involving the source
  node. A simple scenario is:
  
  1. create a server on host1
  2. resize or cold migrate it to a dest host2
  3. delete the compute service for host1
  
  At this point the resource provider for host1 is orphaned.
  
  4. try to confirm/revert the resize of the server which will fail
  because the compute node for host1 is gone and this results in the
  server going to ERROR status
  
  Based on the discussion in this mailing list thread:
  
  http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010843.html
  
  We should probably have the DELETE /os-services/{service_id} API block
  trying to delete a service that has pending migrations.

** Patch added: "lp1852610-bionic.debdiff"
   https://bugs.launchpad.net/cloud-archive/+bug/1852610/+attachment/5599045/+files/lp1852610-bionic.debdiff

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1852610

Title:
  [SRU] API allows source compute service/node deletion while instances
  are pending a resize confirm/revert

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive queens series:
  In Progress
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Compute (nova) train series:
  Fix Committed
Status in nova package in Ubuntu:
  New
Status in nova source package in Bionic:
  In Progress

Bug description:
  [Impact]

   * API will allow deleting a source compute service which has
  migration-based allocations for the source node resource provider and
  pending instance resizes involving the source node.

   * Backporting the fix will improve application resilience in this
  case.

  [Test Case]

   1. create a server on host1
   2. resize or cold migrate it to a dest host2
   3. delete the compute service for host1

   At this point the resource provider for host1 is orphaned.

   4. try to confirm/revert the resize of the server which will fail
  because the compute node for host1 is gone and this results in the
  server going to ERROR status

  [Where problems could occur]

   * This change introduces an exception condition in the API and
  prevents the erroneous deletion of compute services which would result
  in orphaned state.

   * As such we should expect to see altered behavior from the API as
  detailed in api-ref/source/os-services.inc

   * If problems were to occur they would manifest in behavior that is
  different from both the original behavior of the API and the new
  behavior.

  --- Original Description ---
  This is split off from bug 1829479 which is about deleting a compute service which had servers evacuated from it which will orphan resource providers in placement.

  A similar scenario is true where the API will allow deleting a source
  compute service which has migration-based allocations for the source
  node resource provider and pending instance resizes involving the
  source node. A simple scenario is:

  1. create a server on host1
  2. resize or cold migrate it to a dest host2
  3. delete the compute service for host1

  At this point the resource provider for host1 is orphaned.

  4. try to confirm/revert the resize of the server which will fail
  because the compute node for host1 is gone and this results in the
  server going to ERROR status

  Based on the discussion in this mailing list thread:

  http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010843.html

  We should probably have the DELETE /os-services/{service_id} API block
  trying to delete a service that has pending migrations.
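
The proposed behavior can be illustrated with a minimal, self-contained sketch. It is a toy model, not nova's code: the names Migration, ServiceDeleteConflict, TERMINAL_STATUSES, pending_migrations and delete_compute_service are hypothetical, and the status strings ("finished" meaning resized and awaiting confirm/revert, "confirmed", "reverted", "error") are assumptions made for the example. The idea is only that a delete request is refused while any non-terminal migration still involves the host.

```python
from dataclasses import dataclass


class ServiceDeleteConflict(Exception):
    """Hypothetical error: the service still has pending migrations.

    An API layer could map this to a conflict-style error response.
    """


@dataclass
class Migration:
    instance: str
    source_host: str
    dest_host: str
    status: str  # assumed values, e.g. "finished" = awaiting confirm/revert


# Assumption: once a migration reaches one of these statuses it no longer
# depends on the source host.
TERMINAL_STATUSES = {"confirmed", "reverted", "error"}


def pending_migrations(migrations, host):
    """Return migrations that involve `host` and are not yet terminal."""
    return [
        m for m in migrations
        if host in (m.source_host, m.dest_host)
        and m.status not in TERMINAL_STATUSES
    ]


def delete_compute_service(services, migrations, host):
    """Remove `host` from `services` unless pending migrations involve it."""
    blocking = pending_migrations(migrations, host)
    if blocking:
        names = ", ".join(m.instance for m in blocking)
        raise ServiceDeleteConflict(
            f"cannot delete compute service on {host}: "
            f"pending migrations for {names}"
        )
    services.remove(host)


# Steps 1-3 of the scenario: server1 was resized from host1 to host2 and
# the resize has not been confirmed or reverted yet.
services = {"host1", "host2"}
migrations = [Migration("server1", "host1", "host2", "finished")]

try:
    delete_compute_service(services, migrations, "host1")
except ServiceDeleteConflict as exc:
    print(f"delete blocked: {exc}")

# After the resize is confirmed, the delete is allowed.
migrations[0].status = "confirmed"
delete_compute_service(services, migrations, "host1")
```

With a guard of this kind, step 3 of the scenario is rejected up front, so the host1 resource provider is not orphaned and the later confirm/revert in step 4 still has a source compute node to work with.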

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1852610/+subscriptions


