← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 2044494] [NEW] Evacuation process message queue timeout exception

 

Public bug reported:

Description
===========
During the process of evacuating virtual machine instances, 
my message queue experienced a brief service interruption.

As a result, nova-conductor did not receive a response as 
expected within the specified timeout period during
the rpc call to nova-scheduler.

However, nova-conductor failed to catch and handle the exception.
The exception was only logged but did not update the state of
the virtual machine instance to error.

Users were unable to perceive such exceptions and continued to
believe that the evacuation action was in progress.

Steps to reproduce
==================
The reasons for the interruption and timeout of the message queue service
have not yet been traced. 

However, when the message queue reaches the scheduling during
the evacuation operation, this bug is certain to be reproduced.

Expected result
===============
During the evacuation process, catch MessagingTimeout and
update the status of both the virtual machine instance and
the migration record to error.

Actual result
=============
Exceptions are only output in the logs.
Users unaware of the actual issues.

Environment
===========
1. Exact version of OpenStack you are running. See the following
   Openstack Nova Wallaby.

2. Which hypervisor did you use?
   Libvirt + KVM

2. Which storage type did you use?
   Ceph

3. Which networking type did you use?
   Neutron

Logs & Configs
==============


```
2023-10-18T02:46:27.701367401+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [req-c1780783-e8f6-455b-aacc-0933ea92a90a d21791d7ad564f1bb51a96b388e560c4 e0d2cb309d60449a857aafed38ac8816 - default default] Exception during message handling: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701618707+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701628365+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2023-10-18T02:46:27.701634988+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self._queues[msg_id].get(block=True, timeout=timeout)
2023-10-18T02:46:27.701639927+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2023-10-18T02:46:27.701671277+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return waiter.wait()
2023-10-18T02:46:27.701676617+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2023-10-18T02:46:27.701681586+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return get_hub().switch()
2023-10-18T02:46:27.701686496+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
2023-10-18T02:46:27.701691515+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self.greenlet.switch()
2023-10-18T02:46:27.701696805+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server queue.Empty
2023-10-18T02:46:27.701701223+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
2023-10-18T02:46:27.701705071+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2023-10-18T02:46:27.701709088+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
2023-10-18T02:46:27.701712896+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701727924+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-10-18T02:46:27.701732533+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2023-10-18T02:46:27.701735989+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-10-18T02:46:27.701739746+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2023-10-18T02:46:27.701743243+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-10-18T02:46:27.701757300+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2023-10-18T02:46:27.701761067+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 99, in wrapper
2023-10-18T02:46:27.701764604+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return fn(self, context, *args, **kwargs)
2023-10-18T02:46:27.701768060+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 1196, in rebuild_instance
2023-10-18T02:46:27.701771597+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return_alternates=False)
2023-10-18T02:46:27.701775154+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 884, in _schedule_instances
2023-10-18T02:46:27.701778881+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return_alternates=return_alternates)
2023-10-18T02:46:27.701782427+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2023-10-18T02:46:27.701786164+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     instance_uuids, return_objects, return_alternates)
2023-10-18T02:46:27.701789621+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2023-10-18T02:46:27.701793248+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return cctxt.call(ctxt, 'select_destinations', **msg_args)
2023-10-18T02:46:27.701814618+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2023-10-18T02:46:27.701819337+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=self.transport_options)
2023-10-18T02:46:27.701823685+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
2023-10-18T02:46:27.701828495+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
2023-10-18T02:46:27.701831831+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
2023-10-18T02:46:27.701835628+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
2023-10-18T02:46:27.701838984+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2023-10-18T02:46:27.701842631+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     call_monitor_timeout)
2023-10-18T02:46:27.701845958+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2023-10-18T02:46:27.701849374+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     message = self.waiters.get(msg_id, timeout=timeout)
2023-10-18T02:46:27.701852851+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
2023-10-18T02:46:27.701856217+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     'to message ID %s' % msg_id)
2023-10-18T02:46:27.701865885+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701869562+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
```

** Affects: nova
     Importance: Undecided
     Assignee: Haidong Pang (haidong-pang)
         Status: In Progress

** Changed in: nova
     Assignee: (unassigned) => Haidong Pang (haidong-pang)

** Changed in: nova
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2044494

Title:
  Evacuation process message queue timeout exception

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Description
  ===========
  During the process of evacuating virtual machine instances, 
  my message queue experienced a brief service interruption.

  As a result, nova-conductor did not receive a response as 
  expected within the specified timeout period during
  the rpc call to nova-scheduler.

  However, nova-conductor failed to catch and handle the exception.
  The exception was only logged but did not update the state of
  the virtual machine instance to error.

  Users were unable to perceive such exceptions and continued to
  believe that the evacuation action was in progress.

  Steps to reproduce
  ==================
  The reasons for the interruption and timeout of the message queue service
  have not yet been traced. 

  However, when the message queue reaches the scheduling during
  the evacuation operation, this bug is certain to be reproduced.

  Expected result
  ===============
  During the evacuation process, catch MessagingTimeout and
  update the status of both the virtual machine instance and
  the migration record to error.

  Actual result
  =============
  Exceptions are only output in the logs.
  Users unaware of the actual issues.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
     Openstack Nova Wallaby.

  2. Which hypervisor did you use?
     Libvirt + KVM

  2. Which storage type did you use?
     Ceph

  3. Which networking type did you use?
     Neutron

  Logs & Configs
  ==============

  
  ```
  2023-10-18T02:46:27.701367401+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [req-c1780783-e8f6-455b-aacc-0933ea92a90a d21791d7ad564f1bb51a96b388e560c4 e0d2cb309d60449a857aafed38ac8816 - default default] Exception during message handling: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
  2023-10-18T02:46:27.701618707+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2023-10-18T02:46:27.701628365+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
  2023-10-18T02:46:27.701634988+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self._queues[msg_id].get(block=True, timeout=timeout)
  2023-10-18T02:46:27.701639927+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
  2023-10-18T02:46:27.701671277+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return waiter.wait()
  2023-10-18T02:46:27.701676617+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
  2023-10-18T02:46:27.701681586+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return get_hub().switch()
  2023-10-18T02:46:27.701686496+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
  2023-10-18T02:46:27.701691515+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self.greenlet.switch()
  2023-10-18T02:46:27.701696805+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server queue.Empty
  2023-10-18T02:46:27.701701223+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
  2023-10-18T02:46:27.701705071+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
  2023-10-18T02:46:27.701709088+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
  2023-10-18T02:46:27.701712896+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2023-10-18T02:46:27.701727924+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
  2023-10-18T02:46:27.701732533+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  2023-10-18T02:46:27.701735989+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
  2023-10-18T02:46:27.701739746+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  2023-10-18T02:46:27.701743243+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
  2023-10-18T02:46:27.701757300+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  2023-10-18T02:46:27.701761067+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 99, in wrapper
  2023-10-18T02:46:27.701764604+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return fn(self, context, *args, **kwargs)
  2023-10-18T02:46:27.701768060+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 1196, in rebuild_instance
  2023-10-18T02:46:27.701771597+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return_alternates=False)
  2023-10-18T02:46:27.701775154+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 884, in _schedule_instances
  2023-10-18T02:46:27.701778881+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return_alternates=return_alternates)
  2023-10-18T02:46:27.701782427+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
  2023-10-18T02:46:27.701786164+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     instance_uuids, return_objects, return_alternates)
  2023-10-18T02:46:27.701789621+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
  2023-10-18T02:46:27.701793248+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     return cctxt.call(ctxt, 'select_destinations', **msg_args)
  2023-10-18T02:46:27.701814618+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
  2023-10-18T02:46:27.701819337+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=self.transport_options)
  2023-10-18T02:46:27.701823685+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
  2023-10-18T02:46:27.701828495+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
  2023-10-18T02:46:27.701831831+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
  2023-10-18T02:46:27.701835628+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     transport_options=transport_options)
  2023-10-18T02:46:27.701838984+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
  2023-10-18T02:46:27.701842631+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     call_monitor_timeout)
  2023-10-18T02:46:27.701845958+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
  2023-10-18T02:46:27.701849374+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     message = self.waiters.get(msg_id, timeout=timeout)
  2023-10-18T02:46:27.701852851+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
  2023-10-18T02:46:27.701856217+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server     'to message ID %s' % msg_id)
  2023-10-18T02:46:27.701865885+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
  2023-10-18T02:46:27.701869562+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 
  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2044494/+subscriptions