yahoo-eng-team team mailing list archive
Message #23887
[Bug 1382305] Re: conductor and compute fail to work
This sounds like a support request for a misconfiguration. Please use
https://ask.openstack.org/en/questions/ instead.
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1382305
Title:
conductor and compute fail to work
Status in OpenStack Compute (Nova):
Invalid
Bug description:
OpenStack works normally at first, but after a while something goes
wrong with qpid:
***********the compute log:
2014-10-17 17:57:46.262 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 110.00 sec
2014-10-17 17:58:46.263 9250 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._heal_instance_info_cache: Timed out waiting for a reply to message ID c48bd88c14fd4201bc39b0efdaaa43cc
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task task(self, context)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5348, in _heal_instance_info_cache
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task context, self.host, expected_attrs=[], use_slave=True)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 153, in wrapper
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task args, kwargs)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 341, in object_class_action
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task objver=objver, args=args, kwargs=kwargs)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 152, in call
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task retry=self.retry)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task timeout=timeout, retry=retry)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task retry=retry)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task reply, ending = self._poll_connection(msg_id, timeout)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task % msg_id)
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID c48bd88c14fd4201bc39b0efdaaa43cc
2014-10-17 17:58:46.263 9250 TRACE nova.openstack.common.periodic_task
2014-10-17 17:59:46.281 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 110.02 sec
2014-10-17 18:00:46.325 9250 WARNING nova.openstack.common.loopingcall [-] task <bound method DbDriver._report_state of <nova.servicegroup.drivers.db.DbDriver object at 0x2d4e2d0>> run outlasted interval by 50.04 sec
2014-10-17 18:01:46.326 9250 ERROR nova.openstack.common.periodic_task [-] Error during ComputeManager._instance_usage_audit: Timed out waiting for a reply to message ID c5869d814e724f9086e2ede76b5e5356
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task task(self, context)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 5555, in _instance_usage_audit
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task self.host):
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/compute/utils.py", line 414, in has_audit_been_run
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task begin, end, host)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/conductor/api.py", line 184, in task_log_get
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task host, state)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/nova/conductor/rpcapi.py", line 287, in task_log_get
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task host=host, state=state)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/client.py", line 152, in call
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task retry=self.retry)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/transport.py", line 90, in _send
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task timeout=timeout, retry=retry)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task retry=retry)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task result = self._waiter.wait(msg_id, timeout)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task reply, ending = self._poll_connection(msg_id, timeout)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task % msg_id)
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task MessagingTimeout: Timed out waiting for a reply to message ID c5869d814e724f9086e2ede76b5e5356
2014-10-17 18:01:46.326 9250 TRACE nova.openstack.common.periodic_task
2014-10-17 18:01:46.329 9250 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
2014-10-17 18:01:46.353 9250 WARNING nova.virt.libvirt.driver [-] Periodic task is updating the host stat, it is trying to get disk instance-00000001, but disk file was removed by concurrent operations such as resize.
**********the conductor log:
2014-10-16 22:52:58.022 3568 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on 192.168.32.13:5672
2014-10-16 23:19:38.841 3568 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'reply_e1264fdf8ffc48e78683a76bd744167a': heartbeat timeout
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid return method()
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid publisher = cls(self.conf, self.session, topic=topic, **kwargs)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 392, in __init__
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid node_opts)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in __init__
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid self.reconnect(session)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid self.sender = session.sender(self.address)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "<string>", line 6, in sender
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid sender._ewait(lambda: sender.linked)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid result = self.session._ewait(lambda: self.error or predicate(), timeout)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid self.check_error()
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid raise e
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
2014-10-16 23:19:38.841 3568 TRACE oslo.messaging._drivers.impl_qpid
2014-10-16 23:19:38.906 3568 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on 192.168.32.13:5672
2014-10-16 23:42:42.487 3568 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'reply_ca6684a039ce40f58cf80e2ab1302243': heartbeat timeout
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 582, in ensure
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid return method()
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 654, in _publisher_send
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid publisher = cls(self.conf, self.session, topic=topic, **kwargs)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 392, in __init__
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid node_opts)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 339, in __init__
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid self.reconnect(session)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 343, in reconnect
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid self.sender = session.sender(self.address)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "<string>", line 6, in sender
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 622, in sender
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid sender._ewait(lambda: sender.linked)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 833, in _ewait
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid result = self.session._ewait(lambda: self.error or predicate(), timeout)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 596, in _ewait
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 235, in _ewait
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid self.check_error()
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 228, in check_error
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid raise e
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
2014-10-16 23:42:42.487 3568 TRACE oslo.messaging._drivers.impl_qpid
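For anyone triaging logs like the two above, a minimal sketch that tallies the exception summary lines (for example MessagingTimeout and HeartbeatTimeout) per logger may help. It assumes the "date time pid LEVEL logger message" layout seen in these excerpts; the function name count_exceptions and both regular expressions are illustrative and not part of nova or oslo.messaging.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts above:
# "<date> <time> <pid> <LEVEL> <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)$"
)
# Heuristic: the last line of a traceback looks like "SomeTimeout: text".
EXC_LINE = re.compile(r"^(?P<exc>[A-Za-z_]\w*(?:Timeout|Error|Exception)): ")


def count_exceptions(lines):
    """Count exception summary lines found in TRACE records, per logger.

    Hypothetical helper: returns a Counter keyed by (logger, exception name).
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("level") != "TRACE":
            continue  # only TRACE records carry the traceback text
        exc = EXC_LINE.match(m.group("msg").strip())
        if exc:
            counts[(m.group("logger"), exc.group("exc"))] += 1
    return counts
```

Passing the lines of a log file to count_exceptions gives a quick view of which component raised which timeout and how often; it does not explain why the heartbeat was lost.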
[root@lhg-master-xrswnvoef7mu ~]#
*************
[root@lhg-master-xrswnvoef7mu ~]# ps -ef|grep nova-con
nova 3568 1 41 Oct16 ? 08:33:12 /usr/bin/python /usr/bin/nova-conductor --config-file /etc/nova/nova.conf --logfile /var/log/nova/conductor.log
root 8268 6729 0 18:04 pts/0 00:00:00 grep nova-con
[root@lhg-master-xrswnvoef7mu ~]# nova service-list
+----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
| 1 | nova-cert | lhg-master-xrswnvoef7mu | internal | enabled | up | 2014-10-17T10:04:20.000000 | - |
| 2 | nova-conductor | lhg-master-xrswnvoef7mu | internal | enabled | down | 2014-10-16T15:42:33.000000 | - |
| 3 | nova-network | lhg-master-xrswnvoef7mu | internal | enabled | down | 2014-10-16T15:42:36.000000 | - |
| 4 | nova-ibm-notification | lhg-master-xrswnvoef7mu | internal | enabled | up | 2014-10-17T10:04:24.000000 | - |
| 5 | nova-scheduler | lhg-master-xrswnvoef7mu | internal | enabled | up | 2014-10-17T10:04:20.000000 | - |
| 6 | nova-ibm-ego-ha-service | lhg-master-xrswnvoef7mu | internal | enabled | up | 2014-10-17T10:04:20.000000 | - |
| 7 | nova-ibm-ego-resource-optimization | lhg-master-xrswnvoef7mu | internal | enabled | down | 2014-10-16T14:10:19.000000 | - |
| 8 | nova-network | lhg-node1-tedtmwaqrfgk | internal | enabled | down | 2014-10-16T15:42:39.000000 | - |
| 9 | nova-compute | lhg-node1-tedtmwaqrfgk | nova | enabled | down | 2014-10-16T15:42:40.000000 | - |
| 10 | nova-network | lhg-node2-dvflznteywku | internal | enabled | down | 2014-10-16T15:42:38.000000 | - |
| 11 | nova-compute | lhg-node2-dvflznteywku | nova | enabled | down | 2014-10-16T15:42:37.000000 | - |
| 12 | nova-network | lhg-node3-osmyfhgfshd4 | internal | enabled | down | 2014-10-16T15:42:39.000000 | - |
| 13 | nova-compute | lhg-node3-osmyfhgfshd4 | nova | enabled | down | 2014-10-16T15:42:42.000000 | - |
+----+------------------------------------+-------------------------+----------+---------+-------+----------------------------+-----------------+
[root@lhg-master-xrswnvoef7mu ~]# service qpidd status
qpidd (pid 27146) is running...
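The service table above can also be checked programmatically. The following is only a sketch that parses the text printed by "nova service-list" and lists the services whose State is "down"; the helper names parse_service_list and down_services are hypothetical, and the column names are assumed to match the header row shown above.

```python
def parse_service_list(text):
    """Parse the ASCII table printed by `nova service-list` into dicts.

    Hypothetical helper; relies on the first "|" row being the header.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders and shell prompts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def down_services(rows):
    """Return (binary, host) pairs for every row whose State is 'down'."""
    return [(r["Binary"], r["Host"]) for r in rows if r["State"] == "down"]
```

Run against the output in this report, this kind of filter would pick out nova-conductor, nova-network and nova-compute on each host, all last updated around 2014-10-16T15:42, even though the nova-conductor process and qpidd are still running.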
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1382305/+subscriptions