
yahoo-eng-team team mailing list archive

[Bug 1809823] Re: Neutron_api (unhealthy) after few days

 

UPDATE:
To reproduce the bug, log into the neutron_api container:

docker exec -it --user root neutron_api bash
ps fax| grep neutron_api
()[root@undercloud /]# ps fax
  PID TTY      STAT   TIME COMMAND
  115 ?        Ss     0:00 bash
  140 ?        S+     0:00  \_ top
   44 ?        Ss     0:00 bash
  241 ?        R+     0:00  \_ ps fax
    1 ?        Ss     0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_sta
    7 ?        Ss     0:05 /usr/bin/python2 /usr/bin/neutron-server --config-file /usr
   27 ?        S      0:08  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 
   28 ?        S      0:00  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 
   29 ?        S      0:03  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 
   30 ?        S      0:03  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 
   31 ?        S      0:03  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 
   32 ?        R      5:27  \_ /usr/bin/python2 /usr/bin/neutron-server --config-file 

Send SIGHUP to the last child process (PID 32):
kill -1 32 
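For reference, `kill -1` sends SIGHUP (signal 1 on Linux). The Python equivalent, assuming you already have the worker PID from the `ps fax` listing, is `os.kill(pid, signal.SIGHUP)`. This is a minimal sketch. The helpers `send_sighup()` and `demo_self_sighup()` are hypothetical names. The demo targets the script's own process with a temporary handler, so you can watch the signal arrive without touching neutron-server.

```python
import os
import signal


def send_sighup(pid):
    """Python equivalent of `kill -1 <pid>` (SIGHUP is signal 1 on Linux)."""
    os.kill(pid, signal.SIGHUP)


def demo_self_sighup():
    """Send SIGHUP to this process and return the names of signals received."""
    received = []

    def on_sighup(signum, frame):
        # Only record the signal; a real service would stop/restart work here.
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGHUP, on_sighup)
    try:
        # Harmless target: our own PID. Against the container you would pass
        # the worker PID from `ps fax` instead.
        send_sighup(os.getpid())
    finally:
        signal.signal(signal.SIGHUP, previous)  # restore the old handler
    return received


if __name__ == "__main__":
    print(demo_self_sighup())
```

Running the demo prints `['SIGHUP']`.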

After a few seconds, check server.log:
2018-12-26 00:00:36.077 40997 ERROR oslo_service.service [-] Error starting thread.: RuntimeError: A fixed interval looping call can only run one function at a time
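You can scan server.log for this failure instead of watching it by hand. This is a minimal sketch that assumes the line layout seen in the log excerpts in this report (timestamp, PID, level, logger). `find_restart_failures()` and the marker constants are hypothetical names. The marker strings are copied from the log lines quoted in this report.

```python
import re

# Markers copied from the server.log lines quoted in this report.
SIGHUP_MARKER = "Caught SIGHUP, stopping children"
FAILURE_MARKER = ("Error starting thread.: RuntimeError: A fixed interval "
                  "looping call can only run one function at a time")

# Assumed layout: "YYYY-MM-DD HH:MM:SS.mmm PID LEVEL logger ..."
LINE_RE = re.compile(r"^(\S+ \S+) (\d+) ")


def find_restart_failures(lines):
    """Return (timestamp, pid) for each start failure logged after a SIGHUP."""
    seen_sighup = False
    failures = []
    for line in lines:
        if SIGHUP_MARKER in line:
            seen_sighup = True
        elif seen_sighup and FAILURE_MARKER in line:
            match = LINE_RE.match(line)
            if match:
                failures.append((match.group(1), int(match.group(2))))
    return failures
```

Feed it the file's lines, e.g. `find_restart_failures(open(path))`. A non-empty result means a worker failed to restart after a SIGHUP.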


In our environment this happens without anyone issuing kill -1: after roughly four days a SIGHUP arrives on its own and the neutron_api container becomes unhealthy.


** Project changed: tripleo => neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1809823

Title:
  Neutron_api (unhealthy) after few days

Status in neutron:
  New
Status in oslo.service:
  Confirmed

Bug description:
  Description
  ===========
  Seen on the undercloud (we are fairly sure we have also seen it on the overcloud; I'll update when confirmed).
  Without any action on our part, the neutron_api service goes into the "unhealthy" state and stops functioning.
  The log shows:
  2018-12-26 00:00:35.774 7 INFO oslo_service.service [-] Caught SIGHUP, stopping children
  2018-12-26 00:00:36.077 40997 ERROR oslo_service.service [-] Error starting thread.: RuntimeError: A fixed interval looping call can only run one function at a time

  OpenStack commands that depend on Neutron fail (e.g. openstack server
  list).

  Restarting the neutron_api container resolves the problem.

  
  Steps to reproduce
  ==================
  Deploy. 
  Wait 4 days. 

  Expected result
  ===============
  The service should remain healthy.

  Actual result
  =============
  The service becomes unhealthy.

  Environment
  ===========
  Rocky, container-based.

  
  Logs & Configs
  ==============

  Logs : http://paste.openstack.org/show/738658/

  
  More info: 
  ==========
  Searching turned up this report:
  https://bugs.launchpad.net/oslo.service/+bug/1547029
  followed by:
  http://paste.openstack.org/show/487420/

  It seems that adding "eventlet.sleep(0)" at the HERE marker below
  might resolve the issue:

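      import logging

      import eventlet

      LOG = logging.getLogger(__name__)


      def run_service(service, done):
          """Service start wrapper.

          :param service: service to run
          :param done: event to wait on until a shutdown is triggered
          :returns: None

          """
          try:
              # <<<<< HERE >>>>>>>> proposed (unconfirmed) addition: yield to
              # the eventlet hub once before starting the service.
              eventlet.sleep(0)
              service.start()
          except Exception:
              LOG.exception('Error starting thread.')
              raise SystemExit(1)
          else:
              # Block until a shutdown is triggered.
              done.wait()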
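  Here is our reading of why a single yield could help; treat it as an assumption, not confirmed against the oslo.service source. The looping call refuses to start while its previous loop is still registered. That registration is cleared only when the old loop actually exits. On SIGHUP the worker stops and then starts its service again in the same process. If the old loop has not yet been scheduled to notice the stop, the restart hits the "only one function at a time" guard. This sketch models that race with asyncio, using `asyncio.sleep(0)` as the cooperative analogue of `eventlet.sleep(0)`. `ToyLoopingCall`, `restart()` and `main()` are hypothetical stand-ins, not the oslo.service API.

```python
import asyncio

MESSAGE = "A fixed interval looping call can only run one function at a time"


class ToyLoopingCall:
    """Hypothetical, simplified stand-in for an oslo.service looping call.

    Only the "one function at a time" guard is modelled: the handle to the
    running task is cleared by the task itself once its loop exits.
    """

    def __init__(self, func, interval):
        self._func = func
        self._interval = interval
        self._task = None
        self._stopping = False
        self._wakeup = None

    def start(self):
        if self._task is not None:
            raise RuntimeError(MESSAGE)
        self._stopping = False
        self._task = asyncio.get_running_loop().create_task(self._run())

    def stop(self):
        # Only *requests* a stop; the loop task notices it the next time it runs.
        self._stopping = True
        self._wake()

    def _wake(self):
        if self._wakeup is not None and not self._wakeup.done():
            self._wakeup.set_result(None)

    async def _run(self):
        loop = asyncio.get_running_loop()
        try:
            while not self._stopping:
                self._func()
                # Sleep until the next tick, or until stop() wakes us early.
                self._wakeup = loop.create_future()
                timer = loop.call_later(self._interval, self._wake)
                try:
                    await self._wakeup
                finally:
                    timer.cancel()
        finally:
            self._task = None  # cleared only once the loop has really exited


async def restart(lc, yield_first):
    """Mimic a SIGHUP-style restart: stop the looping call, then start it again."""
    lc.stop()
    if yield_first:
        await asyncio.sleep(0)  # asyncio analogue of the proposed eventlet.sleep(0)
    try:
        lc.start()
        return "restarted"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


async def main():
    results = []
    for yield_first in (False, True):
        lc = ToyLoopingCall(lambda: None, interval=60)
        lc.start()
        await asyncio.sleep(0)  # let the loop task run its first iteration
        results.append(await restart(lc, yield_first))
        lc.stop()               # clean up whichever task is still alive
        await asyncio.sleep(0)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

  Without the yield, the restart raises the same RuntimeError seen in server.log. With one yield, the old loop exits first and the restart succeeds. This only shows that the idea is plausible; it does not confirm the fix for eventlet.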

  
  The problem is that I haven't found an easy way to reproduce the issue
  in order to confirm this.

  Any suggestions?
