yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #77808
[Bug 1816727] Re: nova-novncproxy does not handle TCP RST cleanly when using SSL
** Also affects: nova/rocky
Importance: Undecided
Status: New
** Also affects: nova/stein
Importance: Undecided
Status: New
** Changed in: nova/stein
Importance: Undecided => Medium
** Changed in: nova/stein
Status: New => In Progress
** Changed in: nova/stein
Assignee: (unassigned) => Colleen Murphy (krinkle)
** Changed in: nova/rocky
Importance: Undecided => Medium
** Changed in: nova/rocky
Status: New => In Progress
** Changed in: nova/rocky
Assignee: (unassigned) => Colleen Murphy (krinkle)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816727
Title:
nova-novncproxy does not handle TCP RST cleanly when using SSL
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) rocky series:
In Progress
Status in OpenStack Compute (nova) stein series:
In Progress
Status in OpenStack Security Advisory:
Won't Fix
Bug description:
Description
===========
We have nova-novncproxy configured to use SSL:
```
[DEFAULT]
ssl_only=true
cert = /etc/nova/ssl/certs/signing_cert.pem
key = /etc/nova/ssl/private/signing_key.pem
...
[vnc]
enabled = True
server_listen = "0.0.0.0"
server_proxyclient_address = 192.168.237.81
novncproxy_host = 192.168.237.81
novncproxy_port = 5554
novncproxy_base_url = https://<public hostname>:6080/vnc_auto.html
xvpvncproxy_host = 192.168.237.81
```
We also have haproxy acting as a load balancer, but not terminating
SSL. We have an haproxy health check configured like this for nova-
novncproxy:
```
listen nova-novncproxy
# irrelevant config...
server <internal hostname> 192.168.237.84:5554 check check-ssl verify none inter 2000 rise 5 fall 2
```
where 192.168.237.81 is a virtual IP address and 192.168.237.84 is the
node's individual IP address.
With that health check enabled, we found the nova-novncproxy process
CPU spiking and eventually causing the node to hang. With debug
logging enabled, we noticed this in the nova-novncproxy logs:
2019-02-19 15:02:44.148 2880 INFO nova.console.websocketproxy [-] WebSocket server settings:
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Listen on 192.168.237.81:5554
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Flash security policy server
2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-] - Web server (no directory listings). Web root: /usr/share/novnc
2019-02-19 15:02:44.150 2880 INFO nova.console.websocketproxy [-] - SSL/TLS support
2019-02-19 15:02:44.151 2880 INFO nova.console.websocketproxy [-] - proxying from 192.168.237.81:5554 to None:None
2019-02-19 15:02:45.015 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
2019-02-19 15:02:45.184 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.377 2889 DEBUG nova.context [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Found 2 cells: 00000000-0000-0000-0000-000000000000(cell0),9f9825dd-868f-41cc-9c8e-e544f1528d6a(cell1) load_cells /usr/lib/python2.7/site-packages/nova/context.py:479
2019-02-19 15:02:45.380 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
es/oslo_concurrency/lockutils.py:273
2019-02-19 15:02:45.382 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.002s inner /usr/lib/python2.7/site-packages
/oslo_concurrency/lockutils.py:285
2019-02-19 15:02:45.393 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
es/oslo_concurrency/lockutils.py:273
2019-02-19 15:02:45.395 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.003s inner /usr/lib/python2.7/site-packages
/oslo_concurrency/lockutils.py:285
2019-02-19 15:02:45.437 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.443 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
2019-02-19 15:02:45.451 2889 INFO nova.compute.rpcapi [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Automatically selected compute RPC version 5.0 from minimum service version 35
2019-02-19 15:02:45.452 2889 INFO nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] handler exception: [Errno 104] Connection reset by peer
2019-02-19 15:02:45.452 2889 DEBUG nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] exception vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy Traceback (most recent call last):
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 928, in top_new_client
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy client = self.do_handshake(startsock, address)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 858, in do_handshake
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.RequestHandlerClass(retsock, address, self)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/nova/console/websocketproxy.py", line 311, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy websockify.ProxyRequestHandler.__init__(self, *args, **kwargs)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 113, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy SimpleHTTPRequestHandler.__init__(self, req, addr, server)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/SocketServer.py", line 652, in __init__
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.handle()
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 579, in handle
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy SimpleHTTPRequestHandler.handle(self)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.handle_one_request()
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/BaseHTTPServer.py", line 310, in handle_one_request
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy self.raw_requestline = self.rfile.readline(65537)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/socket.py", line 480, in readline
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy data = self._sock.recv(self._rbufsize)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 190, in recv
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy return self._base_recv(buflen, flags, into=False)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 217, in _base_recv
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy read = self.read(nbytes)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 135, in read
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy super(GreenSSLSocket, self).read, *args, **kwargs)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 109, in _call_trampolining
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy return func(*a, **kw)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy File "/usr/lib64/python2.7/ssl.py", line 673, in read
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy v = self._sslobj.read(len)
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy error: [Errno 104] Connection reset by peer
2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy
2019-02-19 15:02:47.037 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
(paste: http://paste.openstack.org/show/745451/)
This sequence starting with the "new handler Process" repeats
continuously. It seems that the haproxy health checks initiate an SSL
connection but then immediately send a TCP RST:
http://git.haproxy.org/?p=haproxy.git;a=commit;h=fd29cc537b8511db6e256529ded625c8e7f856d0
For most services this does not seem to be an issue, but for nova-
novncproxy it repeatedly initializes NovaProxyRequestHandler which
creates a full nova.compute.rpcapi.ComputeAPI instance which very
quickly starts to consume significant CPU and overtake the host.
Note that we tried upgrading to HEAD of websockify and eventlet which
did not improve the issue.
Our workaround was to turn off check-ssl in haproxy and use a basic
tcp check, but we're concerned that nova-novncproxy remains vulnerable
to a DOS attack given how easy it is for haproxy to overload the
service. For that reason I'm opening this initially as a security
bug, though you could perhaps argue that it's no secret that making
un-ratelimited requests at any service will cause high load.
Steps to reproduce
==================
1. Configure nova-novncproxy to use SSL by setting the cert= and key= parameters in [DEFAULT] and turn on debug logging.
2. You can simulate the haproxy SSL health check with this python script:
import socket, ssl, struct, time
host = '192.168.237.81'
port = 5554
while True:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ssl_sock = ssl.wrap_socket(sock)
ssl_sock.connect((host, port))
ssl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
sock.close()
time.sleep(2)
Expected result
===============
nova-novncproxy should gracefully handle the RST and not start
overutilizing CPU. It should probably hold off on initializing
database connections and such until a meaningful request other than an
SSL HELLO is received.
Actual result
=============
The nova-novncproxy process quickly jumps to the top of the CPU%
metrics of process analyzers like top and htop and if left unattended
on a server with few cores will cause the server's overall performance
to be degraded.
Environment
===========
We found this on the latest of the stable/rocky branch on SLES:
# cat /etc/os-release
NAME="SLES"
VERSION="12-SP4"
VERSION_ID="12.4"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"
# uname -a
Linux d52-54-77-77-01-01 4.12.14-95.6-default #1 SMP Thu Jan 17 06:04:39 UTC 2019 (6af4ef8) x86_64 x86_64 x86_64 GNU/Linux
# zypper info openstack-nova
Information for package openstack-nova:
---------------------------------------
Repository : Cloud
Name : openstack-nova
Version : 18.1.1~dev47-749.1
Arch : noarch
Vendor : obs://build.suse.de/Devel:Cloud
Support Level : Level 3
Installed Size : 444.7 KiB
Installed : Yes
Status : up-to-date
Source package : openstack-nova-18.1.1~dev47-749.1.src
Summary : OpenStack Compute (Nova)
# zypper info haproxy
Information for package haproxy:
--------------------------------
Repository : Cloud
Name : haproxy
Version : 1.6.11-10.2
Arch : x86_64
Vendor : SUSE LLC <https://www.suse.com/>
Support Level : Level 3
Installed Size : 3.1 MiB
Installed : Yes
Status : up-to-date
Source package : haproxy-1.6.11-10.2.src
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816727/+subscriptions