yahoo-eng-team team mailing list archive
Message #90751
[Bug 1999674] [NEW] nova compute service does not reset instance with task_state in rebooting_hard
Public bug reported:
Description
===========
When a user asks for a hard reboot of a running instance while nova-compute is unavailable (service stopped or host down), the instance may, under certain conditions, stay in the rebooting_hard task_state after nova-compute starts again.
The issue occurs when the rabbitmq message-ttl for queued messages is
lower than the time needed to bring nova-compute back up: the reboot
request expires in the queue before nova-compute can consume it.
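As a rough illustration of that condition, the following sketch models a queue whose messages are dropped once their age reaches the TTL. It is not RabbitMQ or nova code; the `QueuedMessage` class, the `deliverable` function and the message body are hypothetical names chosen for this example, and the 60 second TTL is the value used in the steps below.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedMessage:
    body: str
    published_at: float  # seconds since an arbitrary start time


def deliverable(messages: List[QueuedMessage], ttl_seconds: float,
                now: float) -> List[QueuedMessage]:
    """Return the messages whose age is still below the TTL at time `now`."""
    return [m for m in messages if now - m.published_at < ttl_seconds]


# Hypothetical reboot request published while nova-compute is down.
reboot_request = QueuedMessage("hard reboot request", published_at=0.0)

# nova-compute comes back after 30 s: the request is still in the queue.
survives_short_outage = deliverable([reboot_request], ttl_seconds=60, now=30.0)

# nova-compute comes back after 90 s: the request has expired and is lost.
lost_after_long_outage = deliverable([reboot_request], ttl_seconds=60, now=90.0)
```

In the second situation nothing is left for nova-compute to consume, while the instance record still carries the rebooting_hard task_state set when the reboot was requested.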
Steps to reproduce
==================
Prerequisites:
* Set a low message-ttl (let's say 60 seconds) in your rabbitmq
* Have a running instance on a host
First case: a failure of the nova-compute service
1/ stop nova compute service on host
2/ ask for a reboot hard: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ start nova compute service
5/ check instance task_state and status
Second case: a failure of the host
1/ hard shutdown the host (let's say a power supply issue)
2/ ask for a reboot hard: openstack server reboot --hard <instance_id>
3/ wait 60 seconds
4/ restart the host
5/ check instance task_state and status
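The first case above could be scripted roughly as follows. This is only a sketch under several assumptions that are not in the report: the service is managed by systemd under the unit name `nova-compute` (the name differs between distributions), the `systemctl` and `openstack` commands can both be run from the same shell, and the task state column is called `OS-EXT-STS:task_state` (the column name depends on the client version). The helpers `reproduction_steps` and `execute` are hypothetical names. By default the steps are only printed, not executed.

```python
import shlex
import subprocess
import time


def reproduction_steps(instance_id, service="nova-compute", wait_seconds=60):
    """Return the first reproduction case as a list of (kind, value) steps."""
    return [
        # 1/ stop nova compute service on host (unit name is an assumption)
        ("run", ["systemctl", "stop", service]),
        # 2/ ask for a hard reboot
        ("run", ["openstack", "server", "reboot", "--hard", instance_id]),
        # 3/ wait longer than the message-ttl
        ("sleep", wait_seconds),
        # 4/ start nova compute service
        ("run", ["systemctl", "start", service]),
        # 5/ check instance task_state and status (column name is an assumption)
        ("run", ["openstack", "server", "show", instance_id,
                 "-c", "status", "-c", "OS-EXT-STS:task_state"]),
    ]


def execute(steps, dry_run=True):
    """Print each step; run it for real only when dry_run is False."""
    for kind, value in steps:
        if kind == "sleep":
            print(f"sleep {value}")
            if not dry_run:
                time.sleep(value)
        else:
            print(shlex.join(value))
            if not dry_run:
                subprocess.run(value, check=True)


steps = reproduction_steps("<instance_id>")
execute(steps)  # dry run: prints the commands without running them
```

Calling `execute(steps, dry_run=False)` with a real instance ID would actually stop and start the service, so it should only be tried on a test deployment.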
Expected result
===============
We expect nova-compute to be able to reset the instance state to active, since the reboot message was lost, so that the user can take other actions on the instance.
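The expected behaviour can be pictured with the minimal sketch below. It is not nova code and does not claim to describe how nova handles instances at service start; the `Instance` class and the `reset_stale_hard_reboot` function are hypothetical names, and the sketch deliberately leaves open how the service would know that the reboot message has really expired rather than being still queued.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    uuid: str
    vm_state: str
    task_state: Optional[str]


def reset_stale_hard_reboot(instance: Instance) -> bool:
    """Clear a leftover rebooting_hard task_state; return True if changed."""
    if instance.task_state != "rebooting_hard":
        return False
    # The reboot request is assumed to be lost, so nothing will complete it.
    instance.task_state = None
    instance.vm_state = "active"
    return True


# Hypothetical instance left over from a reboot request that expired.
stuck = Instance(uuid="hypothetical-uuid", vm_state="active",
                 task_state="rebooting_hard")
changed = reset_stale_hard_reboot(stuck)
```

After such a reset the instance no longer has a pending task, so the user could issue a new reboot or another action.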
Actual result
=============
The instance is stuck in the rebooting_hard task_state and the user is blocked.
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1999674