yahoo-eng-team team mailing list archive

[Bug 1299139] [NEW] Instances stuck in deleting task_state never cleaned up

 

Public bug reported:

Bug 1248563 "Instance deletion is prevented when another component locks
up" was partially fixed by https://review.openstack.org/#/c/55444/, but
that change introduces another problem: subsequent delete requests are
ignored.

When doing Tempest 3rd party CI runs we see instances fail to build
(could be a scheduling/resource problem, timeout, whatever) and then get
stuck in deleting task_state and are never cleaned up.
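
To illustrate the failure mode described above, here is a minimal toy
model. It is not nova code: the Instance class, request_delete() and
stuck_in_deleting() are hypothetical names made up for this sketch, and
the guard is only my assumption of how "subsequent delete requests are
ignored" plays out once task_state is already set to deleting.

```python
from dataclasses import dataclass
from typing import List, Optional

DELETING = "deleting"


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record, not nova's model.
    uuid: str
    vm_state: str = "building"
    task_state: Optional[str] = None


def request_delete(instance: Instance, delivered: bool) -> bool:
    """Return True only if the delete actually ran to completion.

    `delivered` simulates whether the request reached a working
    compute host (assumption for this sketch).
    """
    if instance.task_state == DELETING:
        # A delete is already marked as in progress, so this
        # request is dropped without being retried.
        return False
    instance.task_state = DELETING
    if delivered:
        instance.vm_state = "deleted"
        instance.task_state = None
        return True
    # The request was lost; task_state stays 'deleting'.
    return False


def stuck_in_deleting(instances: List[Instance]) -> List[str]:
    """UUIDs of instances still marked deleting but not deleted."""
    return [
        i.uuid
        for i in instances
        if i.task_state == DELETING and i.vm_state != "deleted"
    ]
```

In this model, once the first delete is lost, every later
request_delete() call returns False and the instance stays in the
stuck_in_deleting() list, which matches what we see after the Tempest
runs.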

The patch even says:

"Dealing with delete requests that never got executed is not in scope of
this change and will be submitted separately."

That's the bug reported here.  For example, this is several hours after
our Tempest run finished:

http://paste.openstack.org/show/74584/

There is also some history here. After patch 55444 merged, we had this
revert of a revert, https://review.openstack.org/#/c/70187/, which was
itself reverted later because it was causing race failures in the
Hyper-V CI:

https://review.openstack.org/#/c/71363/

So there is a lot of half-baked code here, and I haven't been able to get
a response from Stan on bug 1248563. Basically it boils down to this: the
original change 55444 depended on some later changes working, and those
were ultimately reverted because of race conditions causing failures in
the gate.

I would propose that at least for icehouse-rc1 we get the original patch
reverted since it's not a complete solution and introduces another bug.

** Affects: nova
     Importance: High
         Status: New


** Tags: api icehouse-rc-potential

** Changed in: nova
   Importance: Undecided => High

** Tags added: icehouse-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1299139

Title:
  Instances stuck in deleting task_state never cleaned up

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1299139/+subscriptions

