yahoo-eng-team team mailing list archive
Message #46737
[Bug 1547544] Re: heat: MessagingTimeout: Timed out waiting for a reply to message ID
From looking at the dstat output, the node in question stays above a
load average of 11 for nearly 2 hours, and your error happens about an
hour into that period.
Realistically, that's just too much work being asked of the node. We
have found in the gate that once the load average stays above 10,
things start to break down. There is no bug fix for this; it's just a
fallout of our architecture.
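The "sustained load average over 10" rule of thumb can be checked against monitoring data. Below is a minimal sketch, assuming the load samples have already been parsed out of the dstat output into `(timestamp, load)` pairs sorted by time. The function name `sustained_overload` and the 30-minute minimum duration are hypothetical choices for illustration; only the threshold of 10 comes from the text above.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# (timestamp, 1-minute load average), assumed parsed from dstat output
Sample = Tuple[datetime, float]


def sustained_overload(samples: Iterable[Sample],
                       threshold: float = 10.0,
                       min_duration: timedelta = timedelta(minutes=30)
                       ) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) windows where every consecutive sample is above
    `threshold` and the window lasts at least `min_duration`.

    Hypothetical helper; samples must be sorted by timestamp.
    """
    windows: List[Tuple[datetime, datetime]] = []
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    for ts, load in samples:
        if load > threshold:
            if start is None:
                start = ts  # a new overloaded run begins
            end = ts
        else:
            # Run ended; keep it only if it lasted long enough.
            if start is not None and end - start >= min_duration:
                windows.append((start, end))
            start = end = None
    # Close a run that is still open at the end of the data.
    if start is not None and end - start >= min_duration:
        windows.append((start, end))
    return windows
```

A short spike above 10 is ignored, while a run like the one described (load above 11 for about two hours) is reported as a single window.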
Marking as Won't Fix, as I don't think there is anything actionable
here. If you have performance improvements in your environment that make
this better, that's great. However, there are limits beyond which the
nova-compute worker simply falls over, and there is not much to be done
about it.
** Changed in: nova
Status: New => Won't Fix
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1547544
Title:
heat: MessagingTimeout: Timed out waiting for a reply to message ID
Status in OpenStack Compute (nova):
Won't Fix
Status in oslo.messaging:
New
Bug description:
Setup:
Single controller [48 GB RAM, 16 vCPU, 120 GB disk]
3 Network Nodes
100 ESX hypervisors distributed across 10 nova-compute nodes
Test:
1. Create /16 network
2. Heat template which will launch 100 instances on the network created in step 1
3. Create 10 stacks back to back, reaching 1000 instances without waiting for the previous stack to complete
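Step 3 can be scripted. The sketch below assumes the python-openstackclient Heat plugin is installed and credentials are already in the environment; the template filename `stack_100_servers.yaml` and the `scale-stack-N` names are hypothetical. The key point is that `--wait` is omitted, so, as we understand the client's behaviour, each `openstack stack create` returns once Heat accepts the request and the next stack starts immediately (10 stacks x 100 servers = 1000 instances).

```python
import subprocess
from typing import List


def build_stack_commands(template: str, count: int = 10,
                         prefix: str = "scale-stack") -> List[List[str]]:
    """Build one `openstack stack create` command per stack.

    --wait is deliberately omitted so the commands run back to back
    instead of blocking until each stack reaches CREATE_COMPLETE.
    """
    return [
        ["openstack", "stack", "create", "-t", template, f"{prefix}-{i}"]
        for i in range(1, count + 1)
    ]


def launch(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical template: one stack = 100 servers on the /16 network.
    launch(build_stack_commands("stack_100_servers.yaml"))
```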
Observation:
Stack creations fail while nova runs its periodic tasks (run_periodic_tasks),
at different places such as _heal_instance_info_cache,
_sync_scheduler_instance_info, _update_available_resource, etc.
I have attached a sample Heat template, the Heat logs, and the nova-compute
log from one of the hosts.
Logs:
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-02-19 04:21:54.691 TRACE nova.compute.manager return f(*args, **kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 553, in _update_available_resource
2016-02-19 04:21:54.691 TRACE nova.compute.manager context, self.host, self.nodename)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 174, in wrapper
2016-02-19 04:21:54.691 TRACE nova.compute.manager args, kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/opt/stack/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
2016-02-19 04:21:54.691 TRACE nova.compute.manager args=args, kwargs=kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-02-19 04:21:54.691 TRACE nova.compute.manager retry=self.retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-02-19 04:21:54.691 TRACE nova.compute.manager timeout=timeout, retry=retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 465, in send
2016-02-19 04:21:54.691 TRACE nova.compute.manager retry=retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 454, in _send
2016-02-19 04:21:54.691 TRACE nova.compute.manager result = self._waiter.wait(msg_id, timeout)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 337, in wait
2016-02-19 04:21:54.691 TRACE nova.compute.manager message = self.waiters.get(msg_id, timeout=timeout)
2016-02-19 04:21:54.691 TRACE nova.compute.manager File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 239, in get
2016-02-19 04:21:54.691 TRACE nova.compute.manager 'to message ID %s' % msg_id)
2016-02-19 04:21:54.691 TRACE nova.compute.manager MessagingTimeout: Timed out waiting for a reply to message ID a87a7f358a0948efa3ab5beb0c8f45e7
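The bottom frames of the trace show where the error originates: the periodic task makes a blocking RPC call to the conductor, and the caller waits in `self.waiters.get(msg_id, timeout=timeout)` on a per-message reply queue; if no reply arrives in time, MessagingTimeout is raised. The sketch below is a simplified, stdlib-only model of that pattern, not oslo.messaging's actual code: the class names mirror the trace, but the implementation and the `call_with_slow_server` helper are our own hypothetical illustration. In oslo.messaging the default call timeout comes from the `rpc_response_timeout` option (60 seconds by default).

```python
import queue
import threading


class MessagingTimeout(Exception):
    """Local stand-in for oslo_messaging.MessagingTimeout (hypothetical)."""


class ReplyWaiters:
    """Toy model of per-message reply queues keyed by message ID."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def add(self, msg_id):
        with self._lock:
            self._queues[msg_id] = queue.Queue()

    def put(self, msg_id, reply):
        # A reply for a caller that already gave up is silently dropped.
        with self._lock:
            q = self._queues.get(msg_id)
        if q is not None:
            q.put(reply)

    def get(self, msg_id, timeout):
        with self._lock:
            q = self._queues[msg_id]
        try:
            return q.get(timeout=timeout)
        except queue.Empty:
            raise MessagingTimeout(
                'Timed out waiting for a reply to message ID %s' % msg_id)
        finally:
            with self._lock:
                self._queues.pop(msg_id, None)


def call_with_slow_server(waiters, msg_id, server_delay, timeout):
    """Simulate a server (e.g. an overloaded conductor) replying late."""
    waiters.add(msg_id)
    threading.Timer(server_delay, waiters.put, args=(msg_id, "reply")).start()
    return waiters.get(msg_id, timeout)
```

When the server's delay exceeds the caller's timeout, the call fails even though the work eventually completes, which is consistent with the load-related explanation in the reply above.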
--
stack@esx-compute-9:/opt/stack/nova$ git log -1
commit d51c5670d8d26e989d92eb29658eed8113034c0f
Merge: 4fade90 30d5d80
Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
Date: Thu Feb 18 17:56:32 2016 +0000
Merge "reset task_state after select_destinations failed."
stack@esx-compute-9:/opt/stack/nova$
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1547544/+subscriptions