yahoo-eng-team team mailing list archive
Message #93751
[Bug 1999674] Re: nova compute service does not reset instance with task_state in rebooting_hard
Reviewed: https://review.opendev.org/c/openstack/nova/+/867832
Committed: https://opendev.org/openstack/nova/commit/aa3e8fef7b949ec3ddb3c4eaa348eb004593d29e
Submitter: "Zuul (22348)"
Branch: master
commit aa3e8fef7b949ec3ddb3c4eaa348eb004593d29e
Author: Pierre-Samuel Le Stang <pierre-samuel.le-stang@xxxxxxxxxxxx>
Date: Thu Dec 15 18:30:15 2022 +0100
Correctly reset instance task state in rebooting hard
When a user requests a hard reboot of a running instance while nova-compute is
unavailable (service stopped or host down), the instance can, under certain
conditions, stay in the rebooting_hard task_state after nova-compute starts
again. This patch aims to fix that.
Closes-Bug: #1999674
Change-Id: I170e390fe4e467898a8dc7df6a446f62941d49ff
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1999674
Title:
nova compute service does not reset instance with task_state in
rebooting_hard
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
When a user requests a hard reboot of a running instance while nova-compute is unavailable (service stopped or host down), the instance can, under certain conditions, stay in the rebooting_hard task_state after nova-compute starts again.
The issue occurs when the RabbitMQ message-ttl for queued messages is
lower than the time needed to bring nova-compute up again: the reboot
request expires before nova-compute can consume it.
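The condition above can be illustrated with a minimal Python sketch. It is a toy model, not RabbitMQ or nova code: the class `ToyTtlQueue` and the function `reboot_request_survives` are hypothetical names introduced only for this illustration, and time is passed in explicitly instead of being read from a clock.

```python
from typing import List, Optional, Tuple


class ToyTtlQueue:
    """Toy stand-in for a queue with a per-queue message TTL (hypothetical)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._messages: List[Tuple[float, str]] = []

    def publish(self, body: str, now: float) -> None:
        # Remember when the message entered the queue.
        self._messages.append((now, body))

    def consume(self, now: float) -> Optional[str]:
        # Discard messages that stayed in the queue for the TTL or longer.
        self._messages = [
            (published_at, body)
            for (published_at, body) in self._messages
            if now - published_at < self.ttl_seconds
        ]
        if not self._messages:
            return None
        return self._messages.pop(0)[1]


def reboot_request_survives(ttl_seconds: float, downtime_seconds: float) -> bool:
    """Return True if a request published at t=0 is still there after the downtime."""
    queue = ToyTtlQueue(ttl_seconds)
    queue.publish("reboot_hard", now=0.0)
    return queue.consume(now=downtime_seconds) == "reboot_hard"
```

In this model, a 60 second TTL combined with a downtime longer than 60 seconds means the consumer finds an empty queue when it comes back, which matches the situation described in this report.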
Steps to reproduce
==================
Prerequisites:
* Set a low message-ttl (for example, 60 seconds) in RabbitMQ
* Have a running instance on a host
First case: a failure of the nova-compute service
1/ stop the nova-compute service on the host
2/ request a hard reboot: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ start the nova-compute service
5/ check the instance task_state and status
Second case: a failure of the host
1/ hard shutdown the host (for example, a power supply issue)
2/ request a hard reboot: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ restart the host
5/ check the instance task_state and status
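For step 5, the check can be scripted. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes that `openstack server show <instance_id> -f json` exposes the task state under the key `OS-EXT-STS:task_state` (falling back to `task_state` in case the client names the column differently).

```python
import json
from typing import List, Optional, Tuple


def build_show_command(instance_id: str) -> List[str]:
    """Build the CLI call that prints the server details as JSON."""
    return ["openstack", "server", "show", instance_id, "-f", "json"]


def extract_state(show_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (status, task_state) from the JSON output of the command above."""
    data = json.loads(show_output)
    # Assumption: the key name depends on the client version, so try both.
    task_state = data.get("OS-EXT-STS:task_state", data.get("task_state"))
    return data.get("status"), task_state


def is_stuck_in_hard_reboot(show_output: str) -> bool:
    """True if the instance still reports the rebooting_hard task_state."""
    _, task_state = extract_state(show_output)
    return task_state == "rebooting_hard"
```

The command list from `build_show_command` can be passed to `subprocess.run(..., capture_output=True, text=True)`, and the captured standard output handed to `is_stuck_in_hard_reboot`.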
Expected result
===============
Because the reboot message was lost, we expect nova-compute to reset the instance state to active once it is running again, so that the user can take other actions on the instance.
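The expected behaviour can be sketched as a small Python model. This is not nova's code and not the committed fix: `ToyInstance` and `reset_stale_hard_reboot` are hypothetical names, and the sketch assumes that any instance still in rebooting_hard when the compute service starts has lost its reboot request.

```python
from dataclasses import dataclass
from typing import Optional

REBOOTING_HARD = "rebooting_hard"
ACTIVE = "active"


@dataclass
class ToyInstance:
    """Minimal stand-in for an instance record (hypothetical)."""

    uuid: str
    vm_state: str
    task_state: Optional[str]


def reset_stale_hard_reboot(instance: ToyInstance) -> bool:
    """Clear a leftover rebooting_hard task_state at service startup.

    Returns True if the instance was reset, False if nothing had to be done.
    """
    if instance.task_state != REBOOTING_HARD:
        return False
    # Assumption: the request is gone, so unblock the instance for the user.
    instance.task_state = None
    instance.vm_state = ACTIVE
    return True
```

Running such a check for every instance on the host during service startup would leave the instance without a pending task, so the user could issue a new action on it.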
Actual result
=============
The instance is stuck in the rebooting_hard task_state and the user is blocked.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999674/+subscriptions