[Bug 1815871] Re: neutron-server API doesn't shut down gracefully
Bug closed due to lack of activity; please feel free to reopen if needed.
** Changed in: neutron
Status: New => Won't Fix
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815871
Title:
neutron-server API doesn't shut down gracefully
Status in neutron:
Won't Fix
Bug description:
When neutron-server is stopped, the API workers shut down immediately regardless of whether there are ongoing requests, and those ongoing requests are aborted.
After testing, reading through the code, and comparing it with the nova and cinder code, the cause is that the stop() and wait() functions of WorkerService in neutron/wsgi.py are flawed.
def wait(self):
    if isinstance(self._server, eventlet.greenthread.GreenThread):
        self._server.wait()

def stop(self):
    if isinstance(self._server, eventlet.greenthread.GreenThread):
        self._server.kill()
        self._server = None
In the neutron code above, stop() kills the server greenthread and then immediately sets self._server to None, so the isinstance check in wait() fails and wait() does nothing. As a result, the API worker shuts down immediately without waiting for in-flight requests.
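This is easy to confirm in a Python shell: once stop() has cleared the reference, the isinstance check in wait() can never be true again.

    >>> import eventlet
    >>> isinstance(None, eventlet.greenthread.GreenThread)
    False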
Nova has the correct logic; see: https://github.com/openstack/nova/blob/master/nova/wsgi.py#L197
Cinder uses oslo_service.wsgi, which has the same logic as nova.
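A fix along the same lines would keep the greenthread reference until wait() has returned. A minimal sketch, modeled on nova's stop()/wait() and assuming self._server is the greenthread running eventlet.wsgi.server as in the code above (this is not the exact patch proposed here):

    import greenlet

    def stop(self):
        if isinstance(self._server, eventlet.greenthread.GreenThread):
            # Kill only the accept loop; keep the reference so wait()
            # can still block until in-flight requests have finished.
            self._server.kill()

    def wait(self):
        try:
            if isinstance(self._server, eventlet.greenthread.GreenThread):
                # Blocks until eventlet's wsgi server has finished
                # pool.waitall() for the outstanding request coroutines.
                self._server.wait()
        except greenlet.GreenletExit:
            pass  # the greenthread was killed in stop(); shutdown is done
        self._server = None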
My debugging was as follows.
I added a log statement at line 979 of eventlet/wsgi.py
(https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L979):
serv.log.info('({0}) wsgi exiting, {1}'.format(serv.pid,
                                               pool.__dict__))
I modified a neutron API to sleep for 10 s, then curled that API and, at the same time, killed neutron-server.
Below is the neutron-server log; I have 4 API workers. You can see that process 329 has an entry in coroutines_running, but it never logs 'wsgi exited' because pool.waitall() at https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L979 is still blocking on the in-flight request. The other 3 processes have no running coroutines, so they log 'wsgi exited'.
In the end, all 4 child processes exited with status 0, even though process 329 still had a request in flight.
2019-02-13 17:37:31.193 319 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2019-02-13 17:37:31.194 319 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2019-02-13 17:37:31.194 319 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2019-02-13 17:37:31.195 319 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:611
2019-02-13 17:37:31.195 319 DEBUG oslo_service.service [-] Killing children. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:616
2019-02-13 17:37:31.195 319 INFO oslo_service.service [-] Waiting on 4 children to exit
2019-02-13 17:37:31.196 328 INFO neutron.wsgi [-] (328) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97a10>, 'size': 100}
2019-02-13 17:37:31.196 329 INFO neutron.wsgi [-] (329) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=99 _w[0]>, 'coroutines_running': set([<greenlet.greenlet object at 0x5477cd0>]), 'no_coros_running': <eventlet.event.Event object at 0x5a97b10>, 'size': 100}
2019-02-13 17:37:31.196 331 INFO neutron.wsgi [-] (331) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97a10>, 'size': 100}
2019-02-13 17:37:31.197 330 INFO neutron.wsgi [-] (330) wsgi exiting, {'sem': <Semaphore at 0x5a97810 c=100 _w[0]>, 'coroutines_running': set([]), 'no_coros_running': <eventlet.event.Event object at 0x5a97b10>, 'size': 100}
2019-02-13 17:37:31.210 329 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2019-02-13 17:37:31.211 329 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2019-02-13 17:37:31.212 328 INFO neutron.wsgi [-] (328) wsgi exited, is_accepting=True
2019-02-13 17:37:31.216 328 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2019-02-13 17:37:31.217 331 INFO neutron.wsgi [-] (331) wsgi exited, is_accepting=True
2019-02-13 17:37:31.218 328 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2019-02-13 17:37:31.218 330 INFO neutron.wsgi [-] (330) wsgi exited, is_accepting=True
2019-02-13 17:37:31.219 331 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2019-02-13 17:37:31.220 331 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2019-02-13 17:37:31.220 330 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2019-02-13 17:37:31.221 330 DEBUG oslo_concurrency.lockutils [req-d813d601-8563-4d0f-8b16-1418f81ddcc1 - - - - -] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:225
2019-02-13 17:37:31.215 319 INFO oslo_service.service [-] Child 329 exited with status 0
2019-02-13 17:37:31.224 319 INFO oslo_service.service [-] Child 328 exited with status 0
2019-02-13 17:37:31.224 319 INFO oslo_service.service [-] Child 331 exited with status 0
2019-02-13 17:37:31.226 319 INFO oslo_service.service [-] Child 330 exited with status 0
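The same behaviour can be reproduced outside neutron with a few lines of eventlet. The following is a minimal sketch that mimics eventlet's wsgi accept loop and the broken stop()/wait() sequence; the names and timings are illustrative only:

    import eventlet

    pool = eventlet.GreenPool()

    def request():
        eventlet.sleep(3)               # stands in for the 10 s API call
        print('request finished')

    def serve():
        # Mimics eventlet.wsgi.server: an accept loop that runs
        # pool.waitall() for in-flight requests when it is torn down.
        try:
            while True:
                eventlet.sleep(1)
        finally:
            print('wsgi exiting')
            pool.waitall()              # blocks on the running request
            print('wsgi exited')

    pool.spawn(request)                 # one in-flight request
    server = eventlet.spawn(serve)
    eventlet.sleep(0)                   # let serve() start

    # The broken sequence from neutron's WorkerService:
    server.kill()                       # 'wsgi exiting' is printed here
    server = None
    if isinstance(server, eventlet.greenthread.GreenThread):
        server.wait()                   # no-op: server is already None
    print('worker returned without waiting')

The script prints 'wsgi exiting' and then exits immediately; 'wsgi exited' and 'request finished' never appear, exactly like process 329 above.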
With my patch, neutron-server waits for these running coroutines to finish and only then stops the child process (the parent neutron-server will still exit after 60 s, because graceful_shutdown_timeout defaults to 60 s in oslo.service).
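That 60 s cap behaves roughly like the sketch below. This is illustrative only, not the actual oslo.service code; graceful_wait and its arguments are made up for the example:

    import eventlet

    def graceful_wait(server_thread, graceful_shutdown_timeout=60):
        # Wait for in-flight requests to drain, but give up after the
        # configured graceful_shutdown_timeout so a stuck request
        # cannot block shutdown forever (sketch only).
        try:
            with eventlet.Timeout(graceful_shutdown_timeout):
                server_thread.wait()
        except eventlet.Timeout:
            server_thread.kill()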
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1815871/+subscriptions