yahoo-eng-team team mailing list archive
Message #26696
[Bug 1408612] [NEW] HTTP Keep-alive connections prevent keystone from terminating
Public bug reported:
Seen on RDO Juno, running on CentOS 7.
Steps to reproduce:
- Set admin_workers=1 and public_workers=1 in /etc/keystone/keystone.conf
- Start the keystone service: `systemctl start openstack-keystone`
- Start a 'persistent' TCP connection to keystone: `telnet localhost 5000 &`
- Stop the service: `systemctl stop openstack-keystone`
The final systemctl invocation hangs because the process fails to
terminate. Eventually systemd times out and forcibly kills the process.
Output of `systemctl status openstack-keystone`:
Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service stopping timed out. Killing.
Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service: main process exited, code=killed, status=9/KILL
Jan 08 05:07:38 mgoddard systemd[1]: Stopped OpenStack Identity Service.
Jan 08 05:07:38 mgoddard systemd[1]: Unit openstack-keystone.service entered failed state.
The use of telnet here is just to demonstrate the problem. The same
effect can be seen when OpenStack services maintain persistent
connections to keystone.
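For anyone reproducing this without telnet, the same idle client can be sketched in Python. This is only an illustration: the function name `hold_idle_connection` is hypothetical (it is not part of keystone or eventlet), and the host and port defaults simply mirror the telnet command above.

```python
import socket


def hold_idle_connection(host="localhost", port=5000, connect_timeout=5.0):
    """Connect and return the socket without sending anything.

    Like `telnet localhost 5000`, this leaves the server-side handler
    waiting for a request that never arrives.
    """
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    sock.settimeout(None)  # no I/O timeout once connected; stay open indefinitely
    return sock
```

Keep the returned socket referenced (for example `conn = hold_idle_connection()` in an interactive session), stop the service from another terminal, and call `conn.close()` to release the connection.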
With multiple worker processes, the issue is not observed under
systemd. This is speculation, but it appears that once systemd is able
to kill the parent process, it also kills the child process holding the
persistent connection.
When this issue was first observed, multiple workers were used and
systemd was not in use. Rather, we used init scripts in /etc/init.d/. In
this case the result was worse, as the `service openstack-keystone stop`
command would exit successfully, but fail to terminate any child
processes with persistent HTTP connections open. Subsequent attempts to
start the keystone service would fail due to the lingering stale
process.
While investigating the issue we performed some root cause analysis, which is presented below.
- When a keystone process receives SIGTERM, it ends up waiting for all greenthreads in the greenpool to finish at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267.
- Persistent connections, when between HTTP requests, end up waiting at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267 for the next request. The greenthread will not terminate until the connection is closed.
The process will therefore not terminate until all connections have
closed. It seems sensible to me to finish servicing individual requests
for a graceful shutdown, but there needs to be a mechanism to close
persistent connections between requests.
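The blocking behaviour described above can be illustrated with a small standard-library sketch. It is not eventlet code: a thread stands in for a greenthread, `Thread.join` with a timeout stands in for waiting on the greenpool, and the names `handle_keepalive` and `shutdown_blocks` are hypothetical.

```python
import socket
import threading


def handle_keepalive(conn, handled):
    """Serve requests on one connection until the peer closes it."""
    with conn:
        while True:
            data = conn.recv(65536)   # between requests, the handler blocks here
            if not data:
                return                # only reached once the client disconnects
            handled.append(data)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")


def shutdown_blocks(wait=0.5):
    """Return True if waiting for the handler times out while the client idles."""
    client, conn = socket.socketpair()
    client.settimeout(2.0)
    handled = []
    worker = threading.Thread(
        target=handle_keepalive, args=(conn, handled), daemon=True
    )
    worker.start()
    client.sendall(b"GET / HTTP/1.1\r\n\r\n")
    client.recv(1024)                 # request served; connection is now idle
    worker.join(timeout=wait)         # stand-in for waiting on the pool
    still_running = worker.is_alive()
    client.close()                    # closing the client finally frees the handler
    worker.join(timeout=2.0)
    return still_running and not worker.is_alive()
```

In this sketch the handler outlives the wait for as long as the client keeps the connection open, and exits only after the client closes it, which matches the hang seen with `systemctl stop`.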
This issue could (should?) be solved in eventlet.wsgi by a mechanism to
trigger disconnection of persistent connections between requests when
the server is stopped.
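One possible shape for such a mechanism is sketched below, using only the standard library. This is an assumption about how it could work, not eventlet's implementation or API: the class `IdleAwareServer` and the function `demo` are hypothetical, threads stand in for greenthreads, and the sketch relies on `shutdown()` waking a blocked `recv()`, which holds on Linux.

```python
import socket
import threading


class IdleAwareServer:
    """Toy connection handler (not eventlet) that tracks idle keep-alive sockets."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = set()       # connections currently waiting for a request
        self._stopping = False

    def serve_connection(self, conn):
        """Serve requests on conn until the peer closes or stop() is called."""
        with conn:
            while True:
                with self._lock:
                    if self._stopping:
                        return   # do not wait for another request
                    self._idle.add(conn)
                data = conn.recv(65536)   # blocks between requests
                with self._lock:
                    self._idle.discard(conn)
                if not data:
                    return       # peer closed, or stop() shut the socket down
                # A request is in flight here; stop() leaves it alone if it
                # was already out of the idle set when stop() took its snapshot.
                conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")

    def stop(self):
        """Disconnect idle connections so their handlers can exit."""
        with self._lock:
            self._stopping = True
            idle = list(self._idle)
        for conn in idle:
            try:
                # Shutting down the socket makes the blocked recv() return b"".
                # Known gap: a request arriving at this instant may be cut off.
                conn.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass             # handler already closed the socket


def demo():
    """Return True if the handler exits after stop() while the client stays open."""
    server = IdleAwareServer()
    client, conn = socket.socketpair()
    client.settimeout(2.0)
    worker = threading.Thread(
        target=server.serve_connection, args=(conn,), daemon=True
    )
    worker.start()
    client.sendall(b"GET / HTTP/1.1\r\n\r\n")
    client.recv(1024)            # response received; connection is now idle
    server.stop()
    worker.join(timeout=2.0)
    finished = not worker.is_alive()
    client.close()
    return finished
```

The design choice here is to record which connections are between requests and disconnect only those on stop, so requests already being serviced can finish. A real fix would also need to handle the race noted in the comment.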
** Affects: keystone
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1408612
Title:
HTTP Keep-alive connections prevent keystone from terminating
Status in OpenStack Identity (Keystone):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1408612/+subscriptions