
yahoo-eng-team team mailing list archive

[Bug 1815871] Re: neutron-server api don't shutdown gracefully

 

Bug closed due to lack of activity, please feel free to reopen if
needed.

** Changed in: neutron
       Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815871

Title:
  neutron-server api don't shutdown gracefully

Status in neutron:
  Won't Fix

Bug description:
  When neutron-server is stopped, the API workers shut down immediately, no matter whether there are ongoing requests.
  Those ongoing requests are aborted immediately.

  After testing, going through the code, and comparing it with the nova and cinder code,
  I found the reason: the stop() and wait() methods of WorkerService in neutron/wsgi.py are broken.

      def wait(self):
          if isinstance(self._server, eventlet.greenthread.GreenThread):
              self._server.wait()

      def stop(self):
          if isinstance(self._server, eventlet.greenthread.GreenThread):
              self._server.kill()
              self._server = None

  Check the neutron code above.
  After the kill() in stop(), self._server is forcibly set to None, which leaves wait() with nothing to do. As a result, the API worker shuts down immediately without waiting.

  Nova has the correct logic, see: https://github.com/openstack/nova/blob/master/nova/wsgi.py#L197
  Cinder uses oslo_service.wsgi, which has the same code as nova.
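
  For illustration, a minimal sketch of the kind of stop/wait logic I mean, modeled on the nova code linked above (not an actual patch; it assumes greenlet is imported at module level, as nova does):

      def wait(self):
          if isinstance(self._server, eventlet.greenthread.GreenThread):
              try:
                  # Blocks until the wsgi greenthread exits, i.e. until
                  # pool.waitall() inside eventlet.wsgi has drained the
                  # in-flight requests.
                  self._server.wait()
              except greenlet.GreenletExit:
                  pass

      def stop(self):
          if isinstance(self._server, eventlet.greenthread.GreenThread):
              # Ask the server greenthread to exit, but keep the reference
              # so that wait() still has something to wait on.
              self._server.kill()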

  
  My debugging was as follows:
  I added a log statement at line 978:

  https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L979

  serv.log.info('({0}) wsgi exiting, {1}'.format(serv.pid, pool.__dict__))

  I modified a neutron API to sleep for 10s, then curled this API and, at
  the same time, killed neutron-server.
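
  For illustration, roughly the same behaviour should be reproducible outside neutron with a small standalone eventlet script along these lines (a sketch only; the address, port and timings are arbitrary):

      import time

      import eventlet
      import eventlet.wsgi
      import greenlet


      def slow_app(environ, start_response):
          # Stands in for the neutron API that was modified to sleep 10s.
          eventlet.sleep(10)
          start_response('200 OK', [('Content-Type', 'text/plain')])
          return [b'done\n']


      def client():
          # Minimal HTTP client on a green socket, like the curl above.
          conn = eventlet.connect(('127.0.0.1', 8080))
          conn.sendall(b'GET / HTTP/1.0\r\nHost: localhost\r\n\r\n')
          data = b''
          while True:
              chunk = conn.recv(4096)
              if not chunk:
                  break
              data += chunk
          return data


      sock = eventlet.listen(('127.0.0.1', 8080))
      server_gt = eventlet.spawn(eventlet.wsgi.server, sock, slow_app)
      client_gt = eventlet.spawn(client)
      eventlet.sleep(1)        # let the request reach slow_app

      start = time.time()
      server_gt.kill()         # what WorkerService.stop() does
      try:
          # What wait() should block on: eventlet.wsgi runs pool.waitall()
          # in its finally block and drains the in-flight request first.
          server_gt.wait()
      except greenlet.GreenletExit:
          pass
      print('server drained after %.1fs' % (time.time() - start))
      print(client_gt.wait().splitlines()[0])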

  Below is the neutron-server log; I have 4 API workers. You can see that process 329 has an entry in coroutines_running, but it never logs 'wsgi exited', because it is blocked in pool.waitall() at https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L979.
  The other 3 processes have no coroutines_running, so they do log 'wsgi exited'.
  In the end, all 4 child processes exited with status 0.

  
  2019-02-13 17:37:31.193 319 INFO oslo_service.service [-] Caught SIGTERM, stopping children
  2019-02-13 17:37:31.194 319 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2019-02-13 17:37:31.194 319 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
  2019-02-13 17:37:31.195 319 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:611
  2019-02-13 17:37:31.195 319 DEBUG oslo_service.service [-] Killing children. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:616
  2019-02-13 17:37:31.195 319 INFO oslo_service.service [-] Waiting on 4 children to exit
  2019-02-13 17:37:31.196 328 INFO neutron.wsgi [-] (328) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97a10>, 'size': 100}
  2019-02-13 17:37:31.196 329 INFO neutron.wsgi [-] (329) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=99 _w[0]>, 'coroutines_running': set([<greenlet.greenlet object at 0x5477cd0>]), 'no_coros_running': <eventlet.event.Event object at 0x5a97b10>, 'size': 100}
  2019-02-13 17:37:31.196 331 INFO neutron.wsgi [-] (331) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97a10>, 'size': 100}
  2019-02-13 17:37:31.197 330 INFO neutron.wsgi [-] (330) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97b10>, 'size': 100}
  2019-02-13 17:37:31.210 329 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2019-02-13 17:37:31.211 329 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
  2019-02-13 17:37:31.212 328 INFO neutron.wsgi [-] (328) wsgi exited, is_accepting=True
  2019-02-13 17:37:31.216 328 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2019-02-13 17:37:31.217 331 INFO neutron.wsgi [-] (331) wsgi exited, is_accepting=True
  2019-02-13 17:37:31.218 328 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
  2019-02-13 17:37:31.218 330 INFO neutron.wsgi [-] (330) wsgi exited, is_accepting=True
  2019-02-13 17:37:31.219 331 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2019-02-13 17:37:31.220 331 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
  2019-02-13 17:37:31.220 330 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2019-02-13 17:37:31.221 330 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
  2019-02-13 17:37:31.215 319 INFO oslo_service.service [-] Child 329 exited with status 0
  2019-02-13 17:37:31.224 319 INFO oslo_service.service [-] Child 328 exited with status 0
  2019-02-13 17:37:31.224 319 INFO oslo_service.service [-] Child 331 exited with status 0
  2019-02-13 17:37:31.226 319 INFO oslo_service.service [-] Child 330 exited with status 0

  With my patch, neutron-server waits for these coroutines_running to
  finish (the parent neutron-server will still exit after 60s, because
  graceful_shutdown_timeout defaults to 60s in oslo.service), and only
  then stops the child process.
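
  For reference, graceful_shutdown_timeout is a standard oslo.service option and can be tuned in neutron.conf; a hypothetical example (60 is already the default):

      [DEFAULT]
      # Maximum time, in seconds, oslo.service waits for a graceful
      # shutdown before exiting anyway. Zero means wait endlessly.
      graceful_shutdown_timeout = 60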

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1815871/+subscriptions


