yahoo-eng-team team mailing list archive

[Bug 1999674] Re: nova compute service does not reset instance with task_state in rebooting_hard

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1999674@xxxxxxxxxxxxxxxxxx>
Date: Wed, 20 Mar 2024 13:34:29 -0000
Reply-to: Bug 1999674 <1999674@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx

Reviewed:  https://review.opendev.org/c/openstack/nova/+/867832
Committed: https://opendev.org/openstack/nova/commit/aa3e8fef7b949ec3ddb3c4eaa348eb004593d29e
Submitter: "Zuul (22348)"
Branch:    master

commit aa3e8fef7b949ec3ddb3c4eaa348eb004593d29e
Author: Pierre-Samuel Le Stang <pierre-samuel.le-stang@xxxxxxxxxxxx>
Date:   Thu Dec 15 18:30:15 2022 +0100

    Correctly reset instance task state in rebooting hard
    
    When a user asks for a hard reboot of a running instance while nova-compute is
    unavailable (service stopped or host down), the instance may, under certain
    conditions, stay in the rebooting_hard task_state after nova-compute starts
    again. This patch aims to fix that.
    
    Closes-Bug: #1999674
    Change-Id: I170e390fe4e467898a8dc7df6a446f62941d49ff


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1999674

Title:
  nova compute service does not reset instance with task_state in
  rebooting_hard

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  When a user asks for a hard reboot of a running instance while nova-compute is unavailable (service stopped or host down), the instance may, under certain conditions, stay in the rebooting_hard task_state after nova-compute starts again.

  The issue occurs when the RabbitMQ message-ttl for queued messages is
  lower than the time needed to bring nova-compute up again, so the
  reboot request expires before nova-compute can consume it.
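
  The triggering condition can be sketched as a simple timing check. This is only an illustration of the reasoning in this report, not nova or RabbitMQ code: the Outage class and the reboot_request_lost function are hypothetical names, and the model assumes a message is dropped once it has waited in the queue longer than the message-ttl.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Timeline of one reproduction run, in seconds from an arbitrary origin."""
    request_sent_at: float   # when "openstack server reboot --hard" was issued
    compute_back_at: float   # when nova-compute consumes from its queue again


def reboot_request_lost(outage: Outage, message_ttl_s: float) -> bool:
    """Return True if the queued reboot request expires before it is consumed.

    Assumption: the message is discarded once its time in the queue
    exceeds the configured message-ttl.
    """
    time_in_queue = outage.compute_back_at - outage.request_sent_at
    return time_in_queue > message_ttl_s


if __name__ == "__main__":
    ttl = 60.0  # the "low message-ttl" used in the reproduction steps
    # nova-compute comes back 90 s after the request: the message is gone.
    print(reboot_request_lost(Outage(0.0, 90.0), ttl))
    # nova-compute comes back 30 s after the request: the message survives.
    print(reboot_request_lost(Outage(0.0, 30.0), ttl))
```

  With a 60 second TTL, an outage that lasts 90 seconds after the request loses the message, while one that lasts 30 seconds does not.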

  
  Steps to reproduce
  ==================

  Prerequisites:
  * Set a low message-ttl (say, 60 seconds) in your RabbitMQ
  * Have a running instance on a host

  First case: failure of the nova-compute service
  1/ stop the nova-compute service on the host
  2/ ask for a hard reboot: openstack server reboot --hard <instance_id>
  3/ wait at least 60 seconds (longer than the message-ttl)
  4/ start the nova-compute service
  5/ check the instance task_state and status

  Second case: failure of the host
  1/ hard shut down the host (for example, a power supply issue)
  2/ ask for a hard reboot: openstack server reboot --hard <instance_id>
  3/ wait at least 60 seconds (longer than the message-ttl)
  4/ restart the host
  5/ check the instance task_state and status

  
  Expected result
  ===============
  We expect nova-compute to reset the instance state to active, since the reboot message was lost, so that the user can take other actions on the instance.
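
  A minimal sketch of the expected behavior, under stated assumptions. This is not the code from the patch and does not use nova's real classes: Instance, reset_stale_hard_reboots and pending_request_uuids are hypothetical names, and how a real service would learn which requests are still queued is deliberately left out. It only illustrates clearing the task_state at startup for instances whose hard-reboot request no longer exists.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

REBOOTING_HARD = "rebooting_hard"


@dataclass
class Instance:
    """Simplified stand-in for an instance record (hypothetical)."""
    uuid: str
    vm_state: str
    task_state: Optional[str]


def reset_stale_hard_reboots(
    instances: Iterable[Instance],
    pending_request_uuids: Set[str],
) -> List[str]:
    """Clear task_state on instances whose hard-reboot request was lost.

    pending_request_uuids is a hypothetical input: the uuids for which a
    reboot request is still waiting to be processed.
    Returns the uuids that were reset, in input order.
    """
    reset = []
    for inst in instances:
        stuck = inst.task_state == REBOOTING_HARD
        if stuck and inst.uuid not in pending_request_uuids:
            # No request will ever arrive, so unblock the instance.
            inst.task_state = None
            reset.append(inst.uuid)
    return reset


if __name__ == "__main__":
    hosted = [
        Instance("a", "active", REBOOTING_HARD),  # request expired
        Instance("b", "active", None),            # nothing in flight
        Instance("c", "active", REBOOTING_HARD),  # request still queued
    ]
    print(reset_stale_hard_reboots(hosted, {"c"}))
```

  In this sketch only instance "a" is reset: "b" has no task in flight and the request for "c" is still queued, so it is left alone.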

  Actual result
  =============
  The instance is stuck in the rebooting_hard task_state and the user is blocked.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999674/+subscriptions

References

[Bug 1999674] [NEW] nova compute service does not reset instance with task_state in rebooting_hard
From: Pierre-Samuel LE STANG, 2022-12-14