yahoo-eng-team team mailing list archive
  
    Message #20710
  
 [Bug 1367186] [NEW] Instances stuck with task_state of unshelving after RPC call timeout.
  
Public bug reported:
Instances get stuck with a task_state of unshelving when the RPC call between
nova-conductor and nova-scheduler fails (for example, because of a
timeout) during an unshelve operation.
The environment:
Ubuntu 14.04 LTS (64-bit)
stable/icehouse (2014.1.2)
(I could also reproduce it with master (commit: a1fa42f2ad11258f8b9482353e078adcf73ee9c2).)
How to reproduce:
1. Create a VM instance.
2. Shelve the VM instance.
3. Stop the nova-scheduler process.
4. Unshelve the VM instance.
(nova-conductor calls nova-scheduler, but the RPC call times out.)
The VM instance then gets stuck with a task_state of unshelving (see the nova list output below).
The VM instance remains stuck even after the nova-scheduler process is started again.
stack@devstack-icehouse:/opt/devstack$ nova list
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| ID                                   | Name    | Status            | Task State | Power State | Networks          |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| 12e488e8-1df1-479d-866e-51c3117e384b | server1 | SHELVED_OFFLOADED | unshelving | Shutdown    | public=10.0.2.194 |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
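As a rough aid for spotting affected instances, the table above can be parsed and filtered on the Task State column. This is only a sketch: the helper names parse_nova_list and find_stuck_unshelving are hypothetical, the parsing assumes the column layout shown above, and an instance is legitimately in unshelving for a short time during a normal unshelve, so a match is only a candidate until it has persisted.

```python
def parse_nova_list(table_text):
    """Parse the ASCII table printed by `nova list` into a list of dicts.

    Hypothetical helper: assumes the first '|' row is the header row.
    """
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        # Skip border lines (+-----+) and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def find_stuck_unshelving(table_text):
    """Return the IDs of instances whose Task State reads 'unshelving'."""
    return [row["ID"] for row in parse_nova_list(table_text)
            if row.get("Task State") == "unshelving"]


# Sample input copied from the output in this report.
SAMPLE = """
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| ID                                   | Name    | Status            | Task State | Power State | Networks          |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
| 12e488e8-1df1-479d-866e-51c3117e384b | server1 | SHELVED_OFFLOADED | unshelving | Shutdown    | public=10.0.2.194 |
+--------------------------------------+---------+-------------------+------------+-------------+-------------------+
"""

if __name__ == "__main__":
    for instance_id in find_stuck_unshelving(SAMPLE):
        print(instance_id)
```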
nova-conductor.log:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2014-09-09 18:18:13.263 13087 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/conductor/manager.py", line 849, in unshelve_instance
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     instance)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/conductor/manager.py", line 816, in _schedule_instances
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     request_spec, filter_properties)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/nova/nova/scheduler/rpcapi.py", line 103, in select_destinations
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     request_spec=request_spec, filter_properties=filter_properties)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     retry=self.retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     timeout=timeout, retry=retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     retry=retry)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     result = self._waiter.wait(msg_id, timeout)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     reply, ending = self._poll_connection(msg_id, timeout)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher     % msg_id)
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher MessagingTimeout: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56
2014-09-09 18:18:13.263 13087 TRACE oslo.messaging.rpc.dispatcher 
2014-09-09 18:18:13.274 13087 ERROR oslo.messaging._drivers.common [-] Returning exception Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56 to caller
2014-09-09 18:18:13.275 13087 ERROR oslo.messaging._drivers.common [-] ['Traceback (most recent call last):\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n    incoming.message))\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n    return self._do_dispatch(endpoint, method, ctxt, args)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n    result = getattr(endpoint, method)(ctxt, **new_args)\n', '  File "/opt/stack/nova/nova/conductor/manager.py", line 849, in unshelve_instance\n    instance)\n', '  File "/opt/stack/nova/nova/conductor/manager.py", line 816, in _schedule_instances\n    request_spec, filter_properties)\n', '  File "/opt/stack/nova/nova/scheduler/rpcapi.py", line 103, in select_destinations\n    request_spec=request_spec, filter_properties=filter_properties)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call\n    retry=self.retry)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send\n    timeout=timeout, retry=retry)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send\n    retry=retry)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send\n    result = self._waiter.wait(msg_id, timeout)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait\n    reply, ending = self._poll_connection(msg_id, timeout)\n', '  File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection\n    % msg_id)\n', 'MessagingTimeout: Timed out waiting for a reply to message ID 934be80a9798443597f355d60fa08e56\n']
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
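The traceback shows the MessagingTimeout propagating from select_destinations up through _schedule_instances and unshelve_instance without being handled on the way. The following is a minimal, self-contained model of that pattern, not nova's actual code: FakeInstance, select_destinations_down, and both unshelve functions are hypothetical stand-ins, and the model assumes task_state is already unshelving when the conductor is invoked, as the nova list output suggests. The second function sketches one possible handling (clear task_state, then re-raise); it is an assumption for illustration, not a statement of how the bug was or should be fixed.

```python
class MessagingTimeout(Exception):
    """Stand-in for the oslo.messaging exception seen in the log."""


class FakeInstance(object):
    """Hypothetical, minimal model of an instance record."""

    def __init__(self):
        self.vm_state = "shelved_offloaded"
        # Assumption: already set before the conductor handles the request.
        self.task_state = "unshelving"


def select_destinations_down():
    """Models the scheduler RPC call while nova-scheduler is stopped."""
    raise MessagingTimeout("Timed out waiting for a reply")


def unshelve_without_cleanup(instance, select_destinations):
    # If the call below raises, nothing after it runs and
    # task_state is left at 'unshelving'.
    hosts = select_destinations()
    instance.task_state = None
    instance.vm_state = "active"
    return hosts


def unshelve_with_cleanup(instance, select_destinations):
    try:
        hosts = select_destinations()
    except Exception:
        # Illustrative handling only: clear task_state so the
        # operation can be retried, then re-raise the original error.
        instance.task_state = None
        raise
    instance.task_state = None
    instance.vm_state = "active"
    return hosts


if __name__ == "__main__":
    for unshelve in (unshelve_without_cleanup, unshelve_with_cleanup):
        inst = FakeInstance()
        try:
            unshelve(inst, select_destinations_down)
        except MessagingTimeout:
            pass
        print(unshelve.__name__, inst.vm_state, inst.task_state)
```

In this model the first variant leaves the instance with task_state unshelving after the timeout, which matches the observed behavior, while the second leaves it shelved_offloaded with no task_state so that a later unshelve could be attempted.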
** Affects: nova
     Importance: Undecided
     Assignee: Takashi NATSUME (natsume-takashi)
         Status: New
** Changed in: nova
     Assignee: (unassigned) => Takashi NATSUME (natsume-takashi)
-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1367186
Title:
  Instances stuck with task_state of unshelving after RPC call timeout.
Status in OpenStack Compute (Nova):
  New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1367186/+subscriptions