yahoo-eng-team team mailing list archive

[Bug 1408612] [NEW] HTTP Keep-alive connections prevent keystone from terminating

 

Public bug reported:

Seen on RDO Juno, running on CentOS 7.

Steps to reproduce:

- Set admin_workers=1 and public_workers=1 in /etc/keystone/keystone.conf
- Start the keystone service: `systemctl start openstack-keystone`
- Start a 'persistent' TCP connection to keystone: `telnet localhost 5000 &`
- Stop the service: `systemctl stop openstack-keystone`

The final systemctl invocation will hang, as the process fails to
terminate. Eventually systemd times out and forcefully kills the process.

Output of `systemctl status openstack-keystone`:

Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service stopping timed out. Killing.
Jan 08 05:07:38 mgoddard systemd[1]: openstack-keystone.service: main process exited, code=killed, status=9/KILL
Jan 08 05:07:38 mgoddard systemd[1]: Stopped OpenStack Identity Service.
Jan 08 05:07:38 mgoddard systemd[1]: Unit openstack-keystone.service entered failed state.

The use of telnet here is just to demonstrate the problem. The same
effect can be seen when OpenStack services maintain persistent
connections to keystone.
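
For anyone who prefers a scripted reproduction over telnet, the idle connection can be approximated with a few lines of Python. This is only a sketch: the helper name `open_idle_connection` is made up for illustration, and the default host and port simply mirror the `telnet localhost 5000` step above.

```python
import socket


def open_idle_connection(host="localhost", port=5000, timeout=5.0):
    """Open a TCP connection and send nothing, much like `telnet host port`.

    The caller keeps the returned socket open for as long as the
    connection should stay idle, and closes it to release the server.
    """
    return socket.create_connection((host, port), timeout=timeout)
```

Calling `conn = open_idle_connection()` before `systemctl stop openstack-keystone`, and `conn.close()` afterwards, should have the same effect as starting and then quitting the telnet session.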

With multiple worker processes under systemd, the issue is not
observed. This is speculation, but it appears that because systemd is
able to kill the parent process, it also kills the child process holding
the persistent connection.

When this issue was first observed, multiple workers were used and
systemd was not in use. Rather, we used init scripts in /etc/init.d/. In
this case the result was worse, as the `service openstack-keystone stop`
command would exit successfully, but fail to terminate any child
processes with persistent HTTP connections open. Subsequent attempts to
start the keystone service would fail due to the lingering stale
process.


While investigating the issue we performed some root cause analysis, presented below.

- When a keystone process receives SIGTERM, it ends up waiting for all greenthreads in the greenpool to finish at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267.
- Persistent connections, when between HTTP requests, end up waiting at https://github.com/eventlet/eventlet/blob/8d2474197de4827a7bca9c33e71a82573b6fc721/eventlet/wsgi.py#L267 for the next request. The greenthread will not terminate until the connection is closed.

The process will therefore not terminate until all connections have
closed. It seems sensible to me to finish servicing individual requests
for a graceful shutdown, but there needs to be a mechanism to close
persistent connections between requests.
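
The behaviour described above can be illustrated without eventlet. The following is a minimal sketch using only the standard library, with an OS thread standing in for a greenthread and `Thread.join` standing in for waiting on the greenpool; the names `handle_connection` and `shutdown_completes` are hypothetical and are not taken from eventlet or keystone.

```python
import socket
import threading


def handle_connection(conn):
    """Serve a keep-alive connection until the peer closes it."""
    with conn:
        while True:
            data = conn.recv(4096)  # between requests the worker parks here
            if not data:            # empty read: the peer closed the connection
                return
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")


def shutdown_completes(client_closes, wait=0.5):
    """Return True if the worker exits within `wait` seconds of a stop."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    worker = threading.Thread(target=handle_connection, args=(conn,), daemon=True)
    worker.start()

    listener.close()    # stop accepting new connections, as a graceful stop would
    if client_closes:
        client.close()  # only the client can release the idle worker
    worker.join(wait)   # stand-in for waiting on all workers in the pool
    finished = not worker.is_alive()
    client.close()
    return finished
```

In this sketch `shutdown_completes(True)` returns promptly, whereas `shutdown_completes(False)` waits out the timeout with the worker still blocked in `recv()`, which corresponds to the hang seen when stopping the service.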

This issue could (should?) be solved in eventlet.wsgi by a mechanism to
trigger disconnection of persistent connections between requests when
the server is stopped.
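
One possible shape for such a mechanism is sketched below. It is not eventlet's API and not a proposed patch: `IdleTracker`, `wait_for_request` and `stop` are hypothetical names, and OS threads are assumed instead of greenthreads. The idea is to remember which connections are parked between requests and to shut those down when the server stops, so that their workers wake up and exit.

```python
import socket
import threading


class IdleTracker:
    """Tracks connections that are idle between requests (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = set()
        self._stopping = False

    def wait_for_request(self, conn):
        """Block for the next request; return b"" on close or server stop."""
        with self._lock:
            if self._stopping:
                return b""
            self._idle.add(conn)
        try:
            return conn.recv(4096)
        except OSError:
            return b""
        finally:
            with self._lock:
                self._idle.discard(conn)

    def stop(self):
        """Disconnect every connection currently waiting for a request."""
        with self._lock:
            self._stopping = True
            idle = list(self._idle)
        for conn in idle:
            try:
                conn.shutdown(socket.SHUT_RDWR)  # wakes the blocked recv()
            except OSError:
                pass  # the connection went away on its own


def handle_connection(conn, tracker):
    """Serve requests until the peer disconnects or the tracker is stopped."""
    with conn:
        while True:
            data = tracker.wait_for_request(conn)
            if not data:
                return
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
```

A request that is already being serviced is not in the idle set, so it is left to finish. The sketch does leave a small window between `recv()` returning and the connection being removed from the idle set, which a real implementation would need to close.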

** Affects: keystone
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1408612

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1408612/+subscriptions

