yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #93130
[Bug 2044494] [NEW] Evacuation process message queue timeout exception
Public bug reported:
Description
===========
During the process of evacuating virtual machine instances,
my message queue experienced a brief service interruption.
As a result, nova-conductor did not receive a response as
expected within the specified timeout period during
the rpc call to nova-scheduler.
However, nova-conductor failed to catch and handle the exception.
The exception was only logged but did not update the state of
the virtual machine instance to error.
Users were unable to perceive such exceptions and continued to
believe that the evacuation action was in progress.
Steps to reproduce
==================
The reasons for the interruption and timeout of the message queue service
have not yet been traced.
However, when the message queue reaches the scheduling during
the evacuation operation, this bug is certain to be reproduced.
Expected result
===============
During the evacuation process, catch MessagingTimeout and
update the status of both the virtual machine instance and
the migration record to error.
Actual result
=============
Exceptions are only output in the logs.
Users unaware of the actual issues.
Environment
===========
1. Exact version of OpenStack you are running. See the following
Openstack Nova Wallaby.
2. Which hypervisor did you use?
Libvirt + KVM
2. Which storage type did you use?
Ceph
3. Which networking type did you use?
Neutron
Logs & Configs
==============
```
2023-10-18T02:46:27.701367401+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [req-c1780783-e8f6-455b-aacc-0933ea92a90a d21791d7ad564f1bb51a96b388e560c4 e0d2cb309d60449a857aafed38ac8816 - default default] Exception during message handling: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701618707+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701628365+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2023-10-18T02:46:27.701634988+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self._queues[msg_id].get(block=True, timeout=timeout)
2023-10-18T02:46:27.701639927+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2023-10-18T02:46:27.701671277+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return waiter.wait()
2023-10-18T02:46:27.701676617+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2023-10-18T02:46:27.701681586+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return get_hub().switch()
2023-10-18T02:46:27.701686496+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
2023-10-18T02:46:27.701691515+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self.greenlet.switch()
2023-10-18T02:46:27.701696805+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server queue.Empty
2023-10-18T02:46:27.701701223+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server
2023-10-18T02:46:27.701705071+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2023-10-18T02:46:27.701709088+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server
2023-10-18T02:46:27.701712896+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701727924+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-10-18T02:46:27.701732533+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-10-18T02:46:27.701735989+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-10-18T02:46:27.701739746+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-10-18T02:46:27.701743243+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-10-18T02:46:27.701757300+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-10-18T02:46:27.701761067+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 99, in wrapper
2023-10-18T02:46:27.701764604+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return fn(self, context, *args, **kwargs)
2023-10-18T02:46:27.701768060+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 1196, in rebuild_instance
2023-10-18T02:46:27.701771597+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return_alternates=False)
2023-10-18T02:46:27.701775154+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 884, in _schedule_instances
2023-10-18T02:46:27.701778881+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return_alternates=return_alternates)
2023-10-18T02:46:27.701782427+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2023-10-18T02:46:27.701786164+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server instance_uuids, return_objects, return_alternates)
2023-10-18T02:46:27.701789621+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2023-10-18T02:46:27.701793248+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return cctxt.call(ctxt, 'select_destinations', **msg_args)
2023-10-18T02:46:27.701814618+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2023-10-18T02:46:27.701819337+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=self.transport_options)
2023-10-18T02:46:27.701823685+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
2023-10-18T02:46:27.701828495+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2023-10-18T02:46:27.701831831+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
2023-10-18T02:46:27.701835628+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2023-10-18T02:46:27.701838984+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2023-10-18T02:46:27.701842631+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server call_monitor_timeout)
2023-10-18T02:46:27.701845958+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2023-10-18T02:46:27.701849374+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server message = self.waiters.get(msg_id, timeout=timeout)
2023-10-18T02:46:27.701852851+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
2023-10-18T02:46:27.701856217+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 'to message ID %s' % msg_id)
2023-10-18T02:46:27.701865885+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701869562+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [00m
```
** Affects: nova
Importance: Undecided
Assignee: Haidong Pang (haidong-pang)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Haidong Pang (haidong-pang)
** Changed in: nova
Status: New => In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2044494
Title:
Evacuation process message queue timeout exception
Status in OpenStack Compute (nova):
In Progress
Bug description:
Description
===========
During the process of evacuating virtual machine instances,
my message queue experienced a brief service interruption.
As a result, nova-conductor did not receive a response as
expected within the specified timeout period during
the rpc call to nova-scheduler.
However, nova-conductor failed to catch and handle the exception.
The exception was only logged but did not update the state of
the virtual machine instance to error.
Users were unable to perceive such exceptions and continued to
believe that the evacuation action was in progress.
Steps to reproduce
==================
The reasons for the interruption and timeout of the message queue service
have not yet been traced.
However, when the message queue reaches the scheduling during
the evacuation operation, this bug is certain to be reproduced.
Expected result
===============
During the evacuation process, catch MessagingTimeout and
update the status of both the virtual machine instance and
the migration record to error.
Actual result
=============
Exceptions are only output in the logs.
Users unaware of the actual issues.
Environment
===========
1. Exact version of OpenStack you are running. See the following
Openstack Nova Wallaby.
2. Which hypervisor did you use?
Libvirt + KVM
2. Which storage type did you use?
Ceph
3. Which networking type did you use?
Neutron
Logs & Configs
==============
```
2023-10-18T02:46:27.701367401+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [req-c1780783-e8f6-455b-aacc-0933ea92a90a d21791d7ad564f1bb51a96b388e560c4 e0d2cb309d60449a857aafed38ac8816 - default default] Exception during message handling: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701618707+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701628365+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2023-10-18T02:46:27.701634988+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self._queues[msg_id].get(block=True, timeout=timeout)
2023-10-18T02:46:27.701639927+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2023-10-18T02:46:27.701671277+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return waiter.wait()
2023-10-18T02:46:27.701676617+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2023-10-18T02:46:27.701681586+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return get_hub().switch()
2023-10-18T02:46:27.701686496+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 313, in switch
2023-10-18T02:46:27.701691515+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self.greenlet.switch()
2023-10-18T02:46:27.701696805+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server queue.Empty
2023-10-18T02:46:27.701701223+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server
2023-10-18T02:46:27.701705071+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2023-10-18T02:46:27.701709088+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server
2023-10-18T02:46:27.701712896+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-18T02:46:27.701727924+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-10-18T02:46:27.701732533+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-10-18T02:46:27.701735989+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-10-18T02:46:27.701739746+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-10-18T02:46:27.701743243+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-10-18T02:46:27.701757300+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-10-18T02:46:27.701761067+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 99, in wrapper
2023-10-18T02:46:27.701764604+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return fn(self, context, *args, **kwargs)
2023-10-18T02:46:27.701768060+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 1196, in rebuild_instance
2023-10-18T02:46:27.701771597+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return_alternates=False)
2023-10-18T02:46:27.701775154+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/conductor/manager.py", line 884, in _schedule_instances
2023-10-18T02:46:27.701778881+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return_alternates=return_alternates)
2023-10-18T02:46:27.701782427+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations
2023-10-18T02:46:27.701786164+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server instance_uuids, return_objects, return_alternates)
2023-10-18T02:46:27.701789621+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/nova/scheduler/rpcapi.py", line 160, in select_destinations
2023-10-18T02:46:27.701793248+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server return cctxt.call(ctxt, 'select_destinations', **msg_args)
2023-10-18T02:46:27.701814618+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2023-10-18T02:46:27.701819337+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=self.transport_options)
2023-10-18T02:46:27.701823685+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
2023-10-18T02:46:27.701828495+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2023-10-18T02:46:27.701831831+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
2023-10-18T02:46:27.701835628+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server transport_options=transport_options)
2023-10-18T02:46:27.701838984+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2023-10-18T02:46:27.701842631+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server call_monitor_timeout)
2023-10-18T02:46:27.701845958+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2023-10-18T02:46:27.701849374+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server message = self.waiters.get(msg_id, timeout=timeout)
2023-10-18T02:46:27.701852851+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
2023-10-18T02:46:27.701856217+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server 'to message ID %s' % msg_id)
2023-10-18T02:46:27.701865885+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 00d2886fee7c40078806d881e3e410a4
2023-10-18T02:46:27.701869562+08:00 2023-10-18 02:46:27.698 18 ERROR oslo_messaging.rpc.server [00m
```
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2044494/+subscriptions