
yahoo-eng-team team mailing list archive

[Bug 1547544] [NEW] heat: MessagingTimeout: Timed out waiting for a reply to message ID

 

Public bug reported:

Setup:

Single controller [48 GB RAM, 16 vCPU, 120 GB disk]
3 network nodes
100 ESX hypervisors distributed across 10 nova-compute nodes

Test:

1. Create a /16 network.
2. Use a Heat template that launches 100 instances on the network created in step 1.
3. Create 10 stacks back to back, without waiting for the previous stack to complete, so that the total reaches 1000 instances.
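The scale of the test follows from the numbers above. Below is a small Python sketch of that arithmetic. It assumes a hypothetical 10.0.0.0/16 range, because the report does not give the actual CIDR. For the per-node figure it also assumes an even spread of instances, which the scheduler does not guarantee.

```python
import ipaddress

# Hypothetical CIDR: the report only says "a /16 network".
network = ipaddress.ip_network("10.0.0.0/16")

stacks = 10                # stacks created back to back (step 3)
instances_per_stack = 100  # instances launched by each Heat stack (step 2)
compute_nodes = 10         # nova-compute nodes in the setup
esx_hypervisors = 100      # ESX hypervisors behind those nodes

total_instances = stacks * instances_per_stack
hypervisors_per_compute_node = esx_hypervisors // compute_nodes
# Assumes an even spread; the scheduler does not guarantee one.
instances_per_compute_node = total_instances // compute_nodes
# Exclude the network and broadcast addresses of the /16.
usable_addresses = network.num_addresses - 2

print(total_instances, hypervisors_per_compute_node, instances_per_compute_node)  # 1000 10 100
print(usable_addresses, usable_addresses >= total_instances)  # 65534 True
```

Address space is therefore not a constraint in this test: 1000 instances use about 1.5% of the 65,534 host addresses in a /16.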

Observation:

Stack creations fail while nova is running its periodic tasks
(run_periodic_tasks), and the failures show up at different places,
such as _heal_instance_info_cache, _sync_scheduler_instance_info and
_update_available_resource.

I have attached a sample Heat template, the Heat logs, and the
nova-compute log from one of the hosts.
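A simplified model shows why a burst of concurrent requests can surface as timeouts rather than as slow successes. Suppose a single worker, a hypothetical stand-in for a busy RPC server, handles requests one at a time, and each caller waits at most a fixed time for its reply. Then every request queued beyond what the worker can finish within that window times out. The sketch below encodes that model. The service time and burst sizes are assumptions, not measurements from this report. Real deployments also run several workers in parallel.

```python
def count_timeouts(burst_size: int, service_time_ms: int, timeout_ms: int) -> int:
    """Return how many of `burst_size` simultaneous requests miss their timeout.

    Simplified, hypothetical model: a single worker serves requests in FIFO
    order and spends `service_time_ms` on each, so the k-th request (1-based)
    is answered k * service_time_ms after the burst arrives. Callers give up
    after `timeout_ms`. Integer milliseconds avoid float rounding surprises.
    """
    if service_time_ms <= 0:
        raise ValueError("service_time_ms must be positive")
    # Requests with k * service_time_ms <= timeout_ms are answered in time.
    answered_in_time = min(burst_size, timeout_ms // service_time_ms)
    return burst_size - answered_in_time


# Illustrative numbers only; none of these were measured in this report.
TIMEOUT_MS = 60_000     # 60 s, oslo.messaging's default rpc_response_timeout
SERVICE_TIME_MS = 100   # assumed per-request handling time

for burst in (100, 500, 1000):
    print(burst, count_timeouts(burst, SERVICE_TIME_MS, TIMEOUT_MS))
# 100 0
# 500 0
# 1000 400
```

In this model, the number of requests answered in time is simply `timeout_ms // service_time_ms`, so it moves with either value. With the assumed numbers, the first 600 requests of a burst get a reply and the rest time out.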


Logs:

2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-02-19 04:21:54.691 TRACE nova.compute.manager     return f(*args, **kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 553, in _update_available_resource
2016-02-19 04:21:54.691 TRACE nova.compute.manager     context, self.host, self.nodename)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 174, in wrapper
2016-02-19 04:21:54.691 TRACE nova.compute.manager     args, kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/opt/stack/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
2016-02-19 04:21:54.691 TRACE nova.compute.manager     args=args, kwargs=kwargs)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-02-19 04:21:54.691 TRACE nova.compute.manager     retry=self.retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-02-19 04:21:54.691 TRACE nova.compute.manager     timeout=timeout, retry=retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 465, in send
2016-02-19 04:21:54.691 TRACE nova.compute.manager     retry=retry)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 454, in _send
2016-02-19 04:21:54.691 TRACE nova.compute.manager     result = self._waiter.wait(msg_id, timeout)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 337, in wait
2016-02-19 04:21:54.691 TRACE nova.compute.manager     message = self.waiters.get(msg_id, timeout=timeout)
2016-02-19 04:21:54.691 TRACE nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 239, in get
2016-02-19 04:21:54.691 TRACE nova.compute.manager     'to message ID %s' % msg_id)
2016-02-19 04:21:54.691 TRACE nova.compute.manager MessagingTimeout: Timed out waiting for a reply to message ID a87a7f358a0948efa3ab5beb0c8f45e7
--
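As the traceback shows, the periodic task's call to conductor (`object_class_action_versions`) goes through oslo.messaging's AMQP driver. The driver blocks in `self.waiters.get(msg_id, timeout=timeout)` until a reply carrying the same message ID arrives. If none arrives in time, it raises `MessagingTimeout`. Below is a minimal Python 3 standard-library sketch of that wait-for-reply pattern. `ReplyWaiter` and the local `MessagingTimeout` class are hypothetical stand-ins, not oslo.messaging's actual implementation.

```python
import queue
import threading


class MessagingTimeout(Exception):
    """Local stand-in for oslo_messaging.MessagingTimeout (hypothetical)."""


class ReplyWaiter:
    """Hypothetical sketch: one reply queue per outstanding message ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queues = {}

    def register(self, msg_id):
        # Called before the request is sent, so an early reply is not lost.
        with self._lock:
            self._queues[msg_id] = queue.Queue()

    def deliver(self, msg_id, reply):
        # Called by the reply consumer; replies for IDs nobody is waiting
        # on any more (for example, already timed out) are dropped.
        with self._lock:
            q = self._queues.get(msg_id)
        if q is not None:
            q.put(reply)

    def get(self, msg_id, timeout):
        # Block until the matching reply arrives or `timeout` seconds pass.
        with self._lock:
            q = self._queues[msg_id]
        try:
            return q.get(timeout=timeout)
        except queue.Empty:
            raise MessagingTimeout(
                "Timed out waiting for a reply to message ID %s" % msg_id
            ) from None
        finally:
            with self._lock:
                self._queues.pop(msg_id, None)


waiter = ReplyWaiter()

waiter.register("fast")
waiter.deliver("fast", {"result": 42})
print(waiter.get("fast", timeout=1.0))  # {'result': 42}

waiter.register("slow")
try:
    waiter.get("slow", timeout=0.1)  # nobody replies
except MessagingTimeout as exc:
    print(exc)  # Timed out waiting for a reply to message ID slow
```

A reply that arrives after its caller has given up is discarded, because the caller removes its queue on the way out. From the caller's side, a slow reply and a lost reply look the same.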


stack@esx-compute-9:/opt/stack/nova$ git log -1
commit d51c5670d8d26e989d92eb29658eed8113034c0f
Merge: 4fade90 30d5d80
Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
Date:   Thu Feb 18 17:56:32 2016 +0000

    Merge "reset task_state after select_destinations failed."
stack@esx-compute-9:/opt/stack/nova$

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: vmware

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1547544

Title:
  heat: MessagingTimeout: Timed out waiting for a reply to message ID

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1547544/+subscriptions

