yahoo-eng-team team mailing list archive

[Bug 1593186] [NEW] Nova instance stuck in powering-off when rebooting all nodes in cluster

 

Public bug reported:

After rebooting all nodes in the cluster, all the instances that were running on the cluster are stuck in Status ACTIVE, Task state: powering-off, Power state: Crashed.
From the log, it looks like messages sent from the init_host method during nova-compute service start vanish, because the RPC server is started only afterwards.

The manager.init_host method sees an instance with vm_state == vm_states.ACTIVE and vm_power_state in (power_state.SHUTDOWN, power_state.CRASHED). I get the log message "Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 6".
It then calls the api.stop method, which invokes the api.force_stop method, and I see the log message "Going to try to stop instance force_stop". This method invokes stop_instance through RPC. But the RPC message never reaches the RPC server, which is started only after init_host has been called in the service.start method.
Since I am using RabbitMQ, the message queues have not been initialized yet after the cluster reboot, so the call never reaches its destination.
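
The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not Nova or oslo.messaging code: ToyBroker, ToyComputeService and the topic name "compute.node1" are hypothetical, and the broker is assumed to drop any message published to a queue that has not been declared yet, which is how the report describes the behaviour after the reboot.

```python
class ToyBroker:
    """Hypothetical stand-in for a message broker.

    Assumption: a message published to a queue that has not been
    declared yet is silently dropped.
    """

    def __init__(self):
        self.queues = {}

    def declare_queue(self, name):
        self.queues.setdefault(name, [])

    def publish(self, name, message):
        if name not in self.queues:
            return False  # nobody is listening yet, the message is lost
        self.queues[name].append(message)
        return True


class ToyComputeService:
    """Hypothetical model of the start-up order described in the report."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        # State of one instance as found after the reboot.
        self.instance = {
            "vm_state": "active",
            "task_state": None,
            "power_state": "crashed",
        }

    def init_host(self):
        inst = self.instance
        if inst["vm_state"] == "active" and inst["power_state"] in (
            "shutdown",
            "crashed",
        ):
            # The stop call marks the instance as powering-off first ...
            inst["task_state"] = "powering-off"
            # ... and then casts stop_instance over RPC.
            return self.broker.publish(self.topic, "stop_instance")
        return None

    def start_rpc_server(self):
        # The queue only comes into existence here.
        self.broker.declare_queue(self.topic)

    def start(self):
        delivered = self.init_host()  # runs first
        self.start_rpc_server()       # runs too late for the cast above
        return delivered


broker = ToyBroker()
service = ToyComputeService(broker, "compute.node1")
delivered = service.start()
print(delivered, service.instance["task_state"], broker.queues["compute.node1"])
```

In this model the cast is lost, the queue is empty once the server is up, and the instance is left with task_state set to powering-off and nothing to clear it.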

Afterwards, _sync_instance_power_state sees the powering-off task
state and never cleans up the instance state. I get the log message:
"During sync_power_state the instance has a pending task (powering-off).
Skip."
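
Why the periodic sync never repairs the state can be sketched in the same toy style. Again this is not the Nova implementation: the function name and the dictionary layout below are hypothetical, and the only behaviour taken from the report is that an instance with a pending task is skipped.

```python
def toy_sync_instance_power_state(instance, hypervisor_power_state):
    """Hypothetical model of the skip behaviour seen in the log."""
    if instance["task_state"] is not None:
        # Mirrors the log line: a pending task means the instance is skipped.
        return "skipped"
    if instance["power_state"] != hypervisor_power_state:
        instance["power_state"] = hypervisor_power_state
        return "updated"
    return "unchanged"


# State left behind after the lost stop_instance cast.
stuck = {"vm_state": "active", "task_state": "powering-off", "power_state": "running"}

# Every periodic run takes the same early exit, so nothing ever changes.
results = [toy_sync_instance_power_state(stuck, "crashed") for _ in range(3)]
print(results, stuck)
```

Because nothing else resets task_state in this model, each run skips the instance and it stays in powering-off indefinitely.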

Nova version is 12.0.0.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1593186

Title:
  Nova instance stuck in powering-off when rebooting all nodes in
  cluster

Status in OpenStack Compute (nova):
  New
