yahoo-eng-team team mailing list archive


[Bug 1593186] [NEW] Nova instance stuck in powering-off when rebooting all nodes in cluster

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Eyal Posener <eyal@xxxxxxxxxxxxxxx>
Date: Thu, 16 Jun 2016 10:59:11 -0000
Reply-to: Bug 1593186 <1593186@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

After rebooting all nodes in the cluster, all the instances that were running on the cluster are stuck in Status ACTIVE, Task state: powering-off, Power state: Crashed.
From the log, it looks like the messages sent from the init_host method during nova-compute service start vanished, because the RPC server is started only afterwards.

The manager.init_host method sees an instance with vm_state == vm_states.ACTIVE and vm_power_state in (power_state.SHUTDOWN, power_state.CRASHED). I get the log message "Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 6".
It then calls the api.stop method, which invokes the api.force_stop method, and I see the following log message "Going to try to stop instance force_stop". This method invokes a stop_instance method through RPC. But the RPC message never reaches the RPC server, which is started only after init_host is called in the service.start method.
Since I am using rabbitmq, the message queues are not yet set up after the cluster nodes reboot, so the call never gets to its destination.
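
The ordering described above can be modeled with a small toy sketch. It is not Nova code: ToyBroker, ToyComputeService, boot and the rpc_first flag are hypothetical names made up for illustration. It assumes, as the report describes, that a message sent before the destination queue exists is simply lost. The rpc_first flag is only there to contrast the two orderings and is not a proposed patch.

```python
# Toy model of the startup ordering; none of these names are Nova's real API.

class ToyBroker:
    """Drops any message published to a queue that has not been declared yet."""

    def __init__(self):
        self.queues = {}

    def declare_queue(self, name):
        self.queues.setdefault(name, [])

    def publish(self, name, message):
        if name not in self.queues:
            return False  # no queue yet: the message is silently lost
        self.queues[name].append(message)
        return True


class ToyComputeService:
    def __init__(self, broker, host, instance):
        self.broker = broker
        self.queue = "compute." + host
        self.instance = instance

    def init_host(self):
        # The DB says the instance is active, the hypervisor says it crashed.
        inst = self.instance
        if inst["vm_state"] == "active" and inst["power_state"] == "crashed":
            inst["task_state"] = "powering-off"  # set before the cast is sent
            return self.broker.publish(self.queue, "stop_instance")
        return None

    def start_rpc_server(self):
        # Starting the RPC server is what creates the queue in this model.
        self.broker.declare_queue(self.queue)

    def start(self, rpc_first=False):
        if rpc_first:
            self.start_rpc_server()
            return self.init_host()
        delivered = self.init_host()  # ordering described in the report
        self.start_rpc_server()
        return delivered


def boot(rpc_first):
    instance = {"vm_state": "active", "power_state": "crashed", "task_state": None}
    broker = ToyBroker()
    service = ToyComputeService(broker, "node1", instance)
    delivered = service.start(rpc_first=rpc_first)
    return delivered, instance["task_state"], broker.queues["compute.node1"]
```

With boot(False) the stop_instance message is dropped while task_state is already set to powering-off, which matches the stuck state in this report. With boot(True) the same message lands in the queue.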

Afterwards, _sync_instance_power_state sees the powering-off task
state and never cleans up the instance state. I get the log message:
"During sync_power_state the instance has a pending task (powering-off).
Skip."
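
Why the periodic sync never repairs the instance can be shown with a similar toy sketch. The function below is a hypothetical simplification, not the real _sync_instance_power_state. It only assumes what the log message states: an instance with a pending task is skipped.

```python
# Hypothetical simplification of a periodic power-state sync; not Nova's code.

def sync_instance_power_state(instance, hypervisor_power_state):
    """Return what the sync did: 'skip', 'updated' or 'unchanged'."""
    if instance["task_state"] is not None:
        # A pending task is assumed to be handled by whoever started it,
        # so the sync leaves the instance alone.
        return "skip"
    if instance["power_state"] != hypervisor_power_state:
        instance["power_state"] = hypervisor_power_state
        return "updated"
    return "unchanged"


# State left behind once the stop_instance message has been lost.
stuck = {"vm_state": "active", "power_state": "crashed", "task_state": "powering-off"}

# Every periodic run takes the same early exit, so nothing ever changes.
results = [sync_instance_power_state(stuck, "crashed") for _ in range(3)]
```

Since nothing else clears task_state in this model, every run returns "skip" and the instance stays in powering-off indefinitely.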

Nova version is 12.0.0.

** Affects: nova
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1593186

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1593186/+subscriptions

Follow ups

[Bug 1593186] Re: Nova instance stuck in powering-off when rebooting all nodes in cluster
From: Launchpad Bug Tracker, 2017-10-22