yahoo-eng-team team mailing list archive

[Bug 1382305] Re: conductor and compute fail to work

 

This sounds like a support request for a misconfiguration. Please use
https://ask.openstack.org/en/questions/ instead.

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1382305

Title:
  conductor and compute fail to work

Status in OpenStack Compute (Nova):
  Invalid

Bug description:
  OpenStack works normally at first, but after a while something goes
  wrong with qpid:

  ***********the compute log:
  2014-10-17 17:57:46.262 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 110.00 sec
  2014-10-17 17:58:46.263 9250 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._heal_instance_info_cache: Timed out waiting for a reply to message ID c48bd88c14fd4201bc39b0efdaaa43cc
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     task(self, context)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5348, in _heal_instance_info_cache
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     context, self.host, expected_attrs=[], use_slave=True)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 153, in wrapper
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     args, kwargs)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 341, in object_class_action
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     objver=objver, args=args, kwargs=kwargs)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 152, in call
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     retry=self.retry)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     timeout=timeout, retry=retry)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     retry=retry)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     reply, ending = self._poll_connection(msg_id, timeout)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task     % msg_id)
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID c48bd88c14fd4201bc39b0efdaaa43cc
  2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task 
  2014-10-17 17:59:46.281 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 110.02 sec
  2014-10-17 18:00:46.325 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 50.04 sec
  2014-10-17 18:01:46.326 9250 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._instance_usage_audit: Timed out waiting for a reply to message ID c5869d814e724f9086e2ede76b5e5356
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     task(self, context)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5555, in _instance_usage_audit
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     self.host):
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/compute/utils.py", line 414, in has_audit_been_run
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     begin, end, host)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 184, in task_log_get
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     host, state)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 287, in task_log_get
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     host=host, state=state)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 152, in call
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     retry=self.retry)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     timeout=timeout, retry=retry)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     retry=retry)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     result = self._waiter.wait(msg_id, timeout)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     reply, ending = self._poll_connection(msg_id, timeout)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task     % msg_id)
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID c5869d814e724f9086e2ede76b5e5356
  2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task 
  2014-10-17 18:01:46.329 9250 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
  2014-10-17 18:01:46.353 9250 WARNING nova.virt.libvirt.driver [-] Periodic task is updating the host stat, it is trying to get disk instance-00000001, but disk file was removed by concurrent operations such as resize.

  
  **********the conductor log:
  2014-10-16 22:52:58.022 3568 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on 192.168.32.13:5672
  2014-10-16 23:19:38.841 3568 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'reply_e1264fdf8ffc48e78683a76bd744167a': heartbeat timeout
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     return method()
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     publisher = cls(self.conf, self.session, topic=topic, **kwargs)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 392, in __init__
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     node_opts)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in __init__
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     self.reconnect(session)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     self.sender = session.sender(self.address)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "<string>", line 6, in sender
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     sender._ewait(lambda: sender.linked)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     result = self.session._ewait(lambda: self.error or predicate(), timeout)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     self.check_error()
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid     raise e
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
  2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid 
  2014-10-16 23:19:38.906 3568 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on 192.168.32.13:5672
  2014-10-16 23:42:42.487 3568 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'reply_ca6684a039ce40f58cf80e2ab1302243': heartbeat timeout
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     return method()
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     publisher = cls(self.conf, self.session, topic=topic, **kwargs)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 392, in __init__
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     node_opts)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in __init__
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     self.reconnect(session)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     self.sender = session.sender(self.address)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "<string>", line 6, in sender
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     sender._ewait(lambda: sender.linked)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     result = self.session._ewait(lambda: self.error or predicate(), timeout)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     self.check_error()
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid     raise e
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
  2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid 

  
  *************

  [root@lhg-master-xrswnvoef7mu ~]# ps -ef|grep nova-con
  nova      3568     1 41 Oct16 ?        08:33:12 /usr/bin/python /usr/bin/nova-conductor --config-file /etc/nova/nova.conf --logfile /var/log/nova/conductor.log
  root      8268  6729  0 18:04 pts/0    00:00:00 grep nova-con
  [root@lhg-master-xrswnvoef7mu ~]# nova service-list
  +----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary                             | Host                    | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
  | 1  | nova-cert                          | lhg-master-xrswnvoef7mu | internal | enabled | up    | 2014-10-17T10:04:20.000000 | -               |
  | 2  | nova-conductor                     | lhg-master-xrswnvoef7mu | internal | enabled | down  | 2014-10-16T15:42:33.000000 | -               |
  | 3  | nova-network                       | lhg-master-xrswnvoef7mu | internal | enabled | down  | 2014-10-16T15:42:36.000000 | -               |
  | 4  | nova-ibm-notification              | lhg-master-xrswnvoef7mu | internal | enabled | up    | 2014-10-17T10:04:24.000000 | -               |
  | 5  | nova-scheduler                     | lhg-master-xrswnvoef7mu | internal | enabled | up    | 2014-10-17T10:04:20.000000 | -               |
  | 6  | nova-ibm-ego-ha-service            | lhg-master-xrswnvoef7mu | internal | enabled | up    | 2014-10-17T10:04:20.000000 | -               |
  | 7  | nova-ibm-ego-resource-optimization | lhg-master-xrswnvoef7mu | internal | enabled | down  | 2014-10-16T14:10:19.000000 | -               |
  | 8  | nova-network                       | lhg-node1-tedtmwaqrfgk  | internal | enabled | down  | 2014-10-16T15:42:39.000000 | -               |
  | 9  | nova-compute                       | lhg-node1-tedtmwaqrfgk  | nova     | enabled | down  | 2014-10-16T15:42:40.000000 | -               |
  | 10 | nova-network                       | lhg-node2-dvflznteywku  | internal | enabled | down  | 2014-10-16T15:42:38.000000 | -               |
  | 11 | nova-compute                       | lhg-node2-dvflznteywku  | nova     | enabled | down  | 2014-10-16T15:42:37.000000 | -               |
  | 12 | nova-network                       | lhg-node3-osmyfhgfshd4  | internal | enabled | down  | 2014-10-16T15:42:39.000000 | -               |
  | 13 | nova-compute                       | lhg-node3-osmyfhgfshd4  | nova     | enabled | down  | 2014-10-16T15:42:42.000000 | -               |
  +----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
  [root@lhg-master-xrswnvoef7mu ~]# service qpidd status
  qpidd (pid 27146) is running...
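
For triage, the table above can be reduced to just the services reported as down, together with the last time each one checked in. The following is a minimal sketch, assuming the `nova service-list` output has been pasted into a string; the helper names `parse_service_table` and `down_services` are hypothetical and not part of nova, and `SAMPLE` is a shortened excerpt of the table in this report.

```python
# Sketch only: works on pasted text and does not call the nova CLI or API.
# parse_service_table and down_services are hypothetical helper names.

def parse_service_table(text):
    """Turn the ASCII table printed by `nova service-list` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def down_services(rows):
    """Return (binary, host, last update) for every row whose State is 'down'."""
    return [
        (row["Binary"], row["Host"], row["Updated_at"])
        for row in rows
        if row["State"] == "down"
    ]


# Shortened excerpt of the table from this bug report.
SAMPLE = """
+----+----------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary         | Host                    | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+----------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
| 1  | nova-cert      | lhg-master-xrswnvoef7mu | internal | enabled | up    | 2014-10-17T10:04:20.000000 | -               |
| 2  | nova-conductor | lhg-master-xrswnvoef7mu | internal | enabled | down  | 2014-10-16T15:42:33.000000 | -               |
| 9  | nova-compute   | lhg-node1-tedtmwaqrfgk  | nova     | enabled | down  | 2014-10-16T15:42:40.000000 | -               |
+----+----------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
"""

if __name__ == "__main__":
    for binary, host, updated_at in down_services(parse_service_table(SAMPLE)):
        print(f"{binary} on {host}: last update {updated_at}")
```

Run against the full table, this would show that every down row except nova-ibm-ego-resource-optimization last updated within a few seconds of 2014-10-16T15:42, while nova-conductor (pid 3568) is still running according to `ps`.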

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1382305/+subscriptions

