yahoo-eng-team team mailing list archive
Message #76597
[Bug 1809823] Re: Neutron_api (unhealthy) after few days
UPDATE:
To reproduce the bug, log into the neutron_api container and list its processes:
docker exec -it --user root neutron_api bash
ps fax| grep neutron_api
()[root@undercloud /]# ps fax
PID TTY STAT TIME COMMAND
115 ? Ss 0:00 bash
140 ? S+ 0:00 \_ top
44 ? Ss 0:00 bash
241 ? R+ 0:00 \_ ps fax
1 ? Ss 0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_sta
7 ? Ss 0:05 /usr/bin/python2 /usr/bin/neutron-server --config-file /usr
27 ? S 0:08 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
28 ? S 0:00 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
29 ? S 0:03 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
30 ? S 0:03 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
31 ? S 0:03 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
32 ? R 5:27 \_ /usr/bin/python2 /usr/bin/neutron-server --config-file
Send SIGHUP to the last child PID (32):
kill -1 32
Check server.log after a few seconds:
2018-12-26 00:00:36.077 40997 ERROR oslo_service.service [-] Error starting thread.: RuntimeError: A fixed interval looping call can only run one function at a time
In our environment this happens without anyone issuing kill -1: roughly 4 days after deployment a SIGHUP arrives and the container becomes unhealthy.
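For anyone unsure what "kill -1" delivers: signal number 1 is SIGHUP on Linux. A minimal sketch (the handler name and the short polling loop are only for illustration, not taken from neutron) in which a process sends SIGHUP to itself and records what arrived:

```python
import os
import signal
import time

received = []  # signal numbers seen by the handler


def on_sighup(signum, frame):
    # Illustrative handler: just record the signal number.
    received.append(signum)


# Install the handler, then send SIGHUP to this very process.
# `kill -1 <pid>` does the same thing to another process.
signal.signal(signal.SIGHUP, on_sighup)
os.kill(os.getpid(), signal.SIGHUP)

# Python runs handlers in the main thread shortly after delivery;
# poll briefly so the sketch does not depend on exact timing.
for _ in range(100):
    if received:
        break
    time.sleep(0.01)

print(received == [signal.SIGHUP], int(signal.SIGHUP))
```

In the real service the parent neutron-server process logs "Caught SIGHUP, stopping children" at this point; the sketch only shows the signal delivery itself.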
** Project changed: tripleo => neutron
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1809823
Title:
Neutron_api (unhealthy) after few days
Status in neutron:
New
Status in oslo.service:
Confirmed
Bug description:
Description
===========
On the undercloud (we are fairly sure we have also seen it on the overcloud; I'll update once confirmed):
Without any action on our part, the neutron_api service goes into the "unhealthy" state and stops functioning.
The log shows:
2018-12-26 00:00:35.774 7 INFO oslo_service.service [-] Caught SIGHUP, stopping children
2018-12-26 00:00:36.077 40997 ERROR oslo_service.service [-] Error starting thread.: RuntimeError: A fixed interval looping call can only run one function at a time
openstack commands that need neutron fail (e.g. openstack server list).
Restarting the neutron_api container resolves the problem.
Steps to reproduce
==================
Deploy.
Wait 4 days.
Expected result
===============
The service should remain healthy.
Actual result
=============
The service becomes unhealthy.
Environment
===========
Rocky, container-based.
Logs & Configs
==============
Logs : http://paste.openstack.org/show/738658/
More info:
==========
A search turned up this bug:
https://bugs.launchpad.net/oslo.service/+bug/1547029
followed by:
http://paste.openstack.org/show/487420/
It seems that adding "eventlet.sleep(0)" at the point marked HERE below
might resolve the issue:
def run_service(service, done):
    """Service start wrapper.

    :param service: service to run
    :param done: event to wait on until a shutdown is triggered
    :returns: None
    """
    try:
        <<<<< HERE >>>>>>>>
        service.start()
    except Exception:
        LOG.exception('Error starting thread.')
        raise SystemExit(1)
    else:
        done.wait()
The problem is that I haven't found an easy way to reproduce the issue in order to confirm this.
Any suggestions?
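To make the idea concrete, here is a toy model of why a cooperative yield before start() could matter. Everything in it is an assumption: ToyLoopingCall is a hypothetical stand-in, not oslo.service code, and asyncio.sleep(0) is used as an analogy for eventlet.sleep(0) so the sketch runs with the standard library only. It assumes the old loop needs one more scheduler turn after stop() before it clears its "running" state, which this report has not confirmed.

```python
import asyncio


class ToyLoopingCall:
    """Hypothetical stand-in for a fixed-interval looping call."""

    def __init__(self):
        self._task = None
        self._stop = False

    def start(self):
        # Guard modelled on the error in the log: refuse a second
        # start while the previous loop has not fully finished.
        if self._task is not None:
            raise RuntimeError("A fixed interval looping call can only "
                               "run one function at a time")
        self._stop = False
        self._task = asyncio.ensure_future(self._run())

    def stop(self):
        # Only requests a stop; the loop notices on its next turn.
        self._stop = True

    async def _run(self):
        try:
            while not self._stop:
                await asyncio.sleep(0)
        finally:
            self._task = None  # cleared only once the loop really exits


async def restart(yield_first):
    call = ToyLoopingCall()
    call.start()
    await asyncio.sleep(0)      # let the loop begin running
    call.stop()                 # e.g. in reaction to SIGHUP
    if yield_first:
        await asyncio.sleep(0)  # analogous to eventlet.sleep(0)
    try:
        call.start()
        outcome = "restarted"
    except RuntimeError:
        outcome = "error"
    # Clean up whatever loop is still alive before returning.
    call.stop()
    while call._task is not None:
        await asyncio.sleep(0)
    return outcome


outcomes = (asyncio.run(restart(False)), asyncio.run(restart(True)))
print(outcomes)
```

In this model, restarting immediately after stop() hits the guard, while yielding once first lets the old loop finish and the restart succeed. Whether the real failure follows the same pattern still has to be confirmed against oslo.service itself.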
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1809823/+subscriptions