yahoo-eng-team team mailing list archive
Message #12381
[Bug 1299139] [NEW] Instances stuck in deleting task_state never cleaned up
Public bug reported:
Bug 1248563 "Instance deletion is prevented when another component locks
up" provided a partial fix, https://review.openstack.org/#/c/55444/, which
introduces another problem: subsequent delete requests are ignored.
When doing Tempest 3rd party CI runs we see instances fail to build
(could be a scheduling/resource problem, timeout, whatever) and then get
stuck in deleting task_state and are never cleaned up.
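To make the failure mode concrete, here is a toy model of the behaviour described above. It is not Nova code: the Instance class, the api_delete function and the compute_alive flag are all hypothetical names made up for illustration, and the "ignore a delete while task_state is deleting" check is only my reading of the symptom, not a copy of the logic in change 55444.

```python
from dataclasses import dataclass
from typing import Optional

DELETING = "deleting"


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record; not the Nova object.
    uuid: str
    vm_state: str = "building"
    task_state: Optional[str] = None
    deleted: bool = False


def api_delete(instance: Instance, compute_alive: bool) -> bool:
    """Toy API-layer delete. Returns True if the request was acted on."""
    if instance.task_state == DELETING:
        # Assumed behaviour from this report: once a delete is marked as
        # in progress, later delete requests are silently ignored.
        return False
    instance.task_state = DELETING
    if compute_alive:
        # The compute side actually processes the delete.
        instance.deleted = True
        instance.task_state = None
    # If the compute side never picks the request up, task_state stays
    # "deleting" and nothing resets it.
    return True


if __name__ == "__main__":
    inst = Instance(uuid="hypothetical-uuid")
    api_delete(inst, compute_alive=False)  # first delete is never executed
    api_delete(inst, compute_alive=True)   # retry is ignored
    print(inst.task_state, inst.deleted)
```

In this sketch the first delete request is lost, so the retry is dropped even though the compute side is healthy again, and the instance stays in deleting task_state with nothing left to clean it up.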
The patch even says:
"Dealing with delete requests that never got executed is not in scope of
this change and will be submitted separately."
That's the bug reported here. For example, this is several hours after
our Tempest run finished:
http://paste.openstack.org/show/74584/
There is also some history after patch 55444 merged: we had this revert
of a revert, https://review.openstack.org/#/c/70187/, which was itself
reverted later because it was causing race failures in Hyper-V CI:
https://review.openstack.org/#/c/71363/
So there is a lot of half-baked code here, and I haven't been able to get
a response from Stan on bug 1248563. Basically it boils down to this: the
original change 55444 depended on some later changes working, and those
were ultimately reverted because they caused race failures in the gate.
I would propose that at least for icehouse-rc1 we get the original patch
reverted since it's not a complete solution and introduces another bug.
** Affects: nova
Importance: High
Status: New
** Tags: api icehouse-rc-potential
** Changed in: nova
Importance: Undecided => High
** Tags added: icehouse-rc-potential
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1299139
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1299139/+subscriptions