yahoo-eng-team team mailing list archive

[Bug 1571175] [NEW] RPC failure makes server stuck in a task

 

Public bug reported:

When RPC between services fails (because of a bug in the RPC service,
network faults, etc.), the server gets stuck in a task state and never
leaves it.

For example, when powering up a server, nova-api sets the task_state of
the server to 'powering-on' and then sends an RPC message to
nova-compute. nova-compute is responsible for setting the task_state
back to NULL after the server has powered up, so if the RPC message
never reaches nova-compute, the server stays in 'powering-on' forever.
This can block further API operations on the VM and leave an otherwise
healthy VM in a non-operable state. The user can usually work around
this manually by running 'reset-state'.
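
The failure mode described above can be illustrated with a small,
self-contained simulation. This is only a sketch and not nova code: the
Server and LossyBus classes and the api_start, compute_drain and
reset_state functions are hypothetical names invented for this example,
and the bus is assumed to be fire-and-forget, so the API side never
learns that a message was lost.

```python
import queue


class Server:
    """Hypothetical stand-in for a server record (not a nova object)."""

    def __init__(self, name):
        self.name = name
        self.vm_state = "stopped"
        self.task_state = None  # None plays the role of NULL


class LossyBus:
    """Hypothetical fire-and-forget bus; drop=True loses every message."""

    def __init__(self, drop=False):
        self.drop = drop
        self.messages = queue.Queue()

    def cast(self, method, server):
        # The sender gets no reply and no error, even if the message is lost.
        if not self.drop:
            self.messages.put((method, server))


def api_start(server, bus):
    # API side: refuse if a task is in progress, otherwise mark the task
    # and hand the work to the compute side without waiting for a result.
    if server.task_state is not None:
        raise RuntimeError(
            f"{server.name} is busy: task_state={server.task_state}"
        )
    server.task_state = "powering-on"
    bus.cast("start_instance", server)


def compute_drain(bus):
    # Compute side: the only place where task_state is cleared.
    while not bus.messages.empty():
        method, server = bus.messages.get()
        if method == "start_instance":
            server.vm_state = "active"
            server.task_state = None


def reset_state(server, vm_state):
    # Simplified stand-in for the manual workaround: force a state and
    # clear the pending task.
    server.vm_state = vm_state
    server.task_state = None
```

With LossyBus(drop=False), calling api_start and then compute_drain
leaves the server active with task_state cleared. With
LossyBus(drop=True), task_state stays 'powering-on' no matter how often
compute_drain runs, and a second api_start is rejected until
reset_state is called.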

To reproduce:
1. Make RPC messages hang.
2. Issue a 'server start' request.
3. After the request returns, run 'server show'. The server is stuck in 'powering-on' forever.

This issue was previously reported to the mailing list. Please see this:
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092239.html
and this
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092240.html

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1571175

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1571175/+subscriptions