yahoo-eng-team team mailing list archive
Message #05967
[Bug 1247603] Re: nova-conductor process can't create consumer connection to qpid after HeartbeatTimeout in heavy workload
In fact, the consumer connection can be created. The thread that creates it is interrupted by a periodic task, and it can then resume and connect to qpid.
** No longer affects: oslo
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1247603
Title:
nova-conductor process can't create consumer connection to qpid after
HeartbeatTimeout in heavy workload
Status in OpenStack Compute (Nova):
Invalid
Bug description:
nova-conductor loses the queue and is no longer able to get requests
after running a workload for some time. This also occurred in the
nova-compute process. Both share the same impl_qpid.py.
When nova-conductor is under heavy workload, a HeartbeatTimeout
exception is raised. The exception is caught and the code tries to
reconnect to the qpid server. The logs show that we can't reconnect to
qpid in the iterconsume method, but can reconnect to the qpid server in
the publisher_send method. That means we can only send messages to the
qpid queue, but can't receive messages from it.
impl_qpid.py
    def ensure(self, error_callback, method, *args, **kwargs):
        while True:
            try:
                return method(*args, **kwargs)  # <-- raises HeartbeatTimeout
            except (qpid_exceptions.Empty,
                    qpid_exceptions.ConnectionError), e:
                if error_callback:
                    error_callback(e)
                self.reconnect()  # <-- retry
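
The retry loop above can be sketched as a small self-contained Python 3
program. This is only an illustration under stated assumptions, not the
nova or qpid implementation: FakeConnection, its receive method, and the
local HeartbeatTimeout class are hypothetical stand-ins, and Python's
built-in ConnectionError is used in place of
qpid_exceptions.ConnectionError.

```python
class HeartbeatTimeout(ConnectionError):
    """Hypothetical stand-in for the qpid heartbeat exception."""


class FakeConnection:
    """Hypothetical connection whose receive() fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.reconnects = 0

    def receive(self):
        # Simulate a heartbeat timeout until the failure budget is used up.
        if self.failures > 0:
            self.failures -= 1
            raise HeartbeatTimeout("heartbeat timed out")
        return "message"

    def reconnect(self):
        # A real client would re-open the connection and session here.
        self.reconnects += 1

    def ensure(self, error_callback, method, *args, **kwargs):
        # Keep calling method until it succeeds, reconnecting after
        # every connection-level failure.
        while True:
            try:
                return method(*args, **kwargs)
            except ConnectionError as exc:
                if error_callback:
                    error_callback(exc)
                self.reconnect()


errors = []
conn = FakeConnection(failures=2)
result = conn.ensure(errors.append, conn.receive)
print(result, conn.reconnects, len(errors))
```

In this sketch the call only returns once method succeeds, so a caller
that sees no result is still inside the loop rather than having given
up.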
The ensure method is used in
    def iterconsume(self, limit=None, timeout=None):
        """Return an iterator that will consume from all queues/consumers"""
        def _error_callback(exc):
            if isinstance(exc, qpid_exceptions.Empty):
                LOG.debug(_('Timed out waiting for RPC response: %s') %
                          str(exc))
                raise rpc_common.Timeout()
            else:
                LOG.exception(_('Failed to consume message from queue: %s') %
                              str(exc))

        def _consume():
            nxt_receiver = self.session.next_receiver(timeout=timeout)
            try:
                self._lookup_consumer(nxt_receiver).consume()
            except Exception:
                LOG.exception(_("Error processing message. Skipping it."))

        for iteration in itertools.count(0):
            if limit and iteration >= limit:
                raise StopIteration
            yield self.ensure(_error_callback, _consume)  # <-- can't reconnect here
and
    def publisher_send(self, cls, topic, msg):
        """Send to a publisher based on the publisher class"""
        def _connect_error(exc):
            log_info = {'topic': topic, 'err_str': str(exc)}
            LOG.exception(_("Failed to publish message to topic "
                            "'%(topic)s': %(err_str)s") % log_info)

        def _publisher_send():
            publisher = cls(self.conf, self.session, topic)
            publisher.send(msg)

        return self.ensure(_connect_error, _publisher_send)  # <-- can reconnect here
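
The iteration loop in iterconsume can be sketched on its own as follows.
This is a minimal Python 3 illustration, not the nova code: consume_once
is a hypothetical stand-in for self.ensure(_error_callback, _consume),
and unlike the original _consume it returns the message so the effect is
visible. The quoted code is Python 2 and ends the generator with
"raise StopIteration"; under PEP 479 (Python 3.7 and later) that becomes
a RuntimeError, so the sketch uses return instead.

```python
import itertools


def iterconsume(consume_once, limit=None):
    """Yield the result of consume_once() for at most limit iterations."""
    for iteration in itertools.count(0):
        if limit and iteration >= limit:
            # End the generator cleanly; raising StopIteration here would
            # turn into RuntimeError on Python 3.7+ (PEP 479).
            return
        yield consume_once()


# Hypothetical message source standing in for the qpid receivers.
messages = iter(["a", "b", "c"])


def consume_once():
    return next(messages)


consumed = list(iterconsume(consume_once, limit=2))
print(consumed)
```

Each step of the generator makes exactly one consume call, so if that
call blocks while reconnecting, the consumer simply does not advance
until it returns.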
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1247603/+subscriptions