← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1816727] Re: nova-novncproxy does not handle TCP RST cleanly when using SSL 

 

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** No longer affects: nova/queens

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816727

Title:
  nova-novncproxy does not handle TCP RST cleanly when using SSL

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  Fix Committed
Status in OpenStack Compute (nova) stein series:
  Fix Committed
Status in OpenStack Security Advisory:
  Won't Fix

Bug description:
  Description
  ===========

  We have nova-novncproxy configured to use SSL:

  ```
  [DEFAULT]
  ssl_only=true
  cert = /etc/nova/ssl/certs/signing_cert.pem
  key = /etc/nova/ssl/private/signing_key.pem
  ...
  [vnc]
  enabled = True
  server_listen = "0.0.0.0"
  server_proxyclient_address = 192.168.237.81
  novncproxy_host = 192.168.237.81
  novncproxy_port = 5554
  novncproxy_base_url = https://<public hostname>:6080/vnc_auto.html
  xvpvncproxy_host = 192.168.237.81
  ```

  We also have haproxy acting as a load balancer, but not terminating
  SSL. We have an haproxy health check configured like this for nova-
  novncproxy:

  ```
  listen nova-novncproxy
      # irrelevant config...
      server <internal hostname> 192.168.237.84:5554 check check-ssl verify none inter 2000 rise 5 fall 2
  ```

  where 192.168.237.81 is a virtual IP address and 192.168.237.84 is the
  node's individual IP address.

  With that health check enabled, we found the nova-novncproxy process
  CPU spiking and eventually causing the node to hang. With debug
  logging enabled, we noticed this in the nova-novncproxy logs:

  2019-02-19 15:02:44.148 2880 INFO nova.console.websocketproxy [-] WebSocket server settings:
  2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-]   - Listen on 192.168.237.81:5554
  2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-]   - Flash security policy server
  2019-02-19 15:02:44.149 2880 INFO nova.console.websocketproxy [-]   - Web server (no directory listings). Web root: /usr/share/novnc
  2019-02-19 15:02:44.150 2880 INFO nova.console.websocketproxy [-]   - SSL/TLS support
  2019-02-19 15:02:44.151 2880 INFO nova.console.websocketproxy [-]   - proxying from 192.168.237.81:5554 to None:None
  2019-02-19 15:02:45.015 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
  2019-02-19 15:02:45.184 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
  TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
  2019-02-19 15:02:45.377 2889 DEBUG nova.context [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Found 2 cells: 00000000-0000-0000-0000-000000000000(cell0),9f9825dd-868f-41cc-9c8e-e544f1528d6a(cell1) load_cells /usr/lib/python2.7/site-packages/nova/context.py:479
  2019-02-19 15:02:45.380 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
  es/oslo_concurrency/lockutils.py:273
  2019-02-19 15:02:45.382 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "00000000-0000-0000-0000-000000000000" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.002s inner /usr/lib/python2.7/site-packages
  /oslo_concurrency/lockutils.py:285
  2019-02-19 15:02:45.393 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" acquired by "nova.context.get_or_set_cached_cell_and_set_connections" :: waited 0.000s inner /usr/lib/python2.7/site-packag
  es/oslo_concurrency/lockutils.py:273
  2019-02-19 15:02:45.395 2889 DEBUG oslo_concurrency.lockutils [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Lock "9f9825dd-868f-41cc-9c8e-e544f1528d6a" released by "nova.context.get_or_set_cached_cell_and_set_connections" :: held 0.003s inner /usr/lib/python2.7/site-packages
  /oslo_concurrency/lockutils.py:285
  2019-02-19 15:02:45.437 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
  TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
  2019-02-19 15:02:45.443 2889 DEBUG oslo_db.sqlalchemy.engines [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTI
  TUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:308
  2019-02-19 15:02:45.451 2889 INFO nova.compute.rpcapi [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] Automatically selected compute RPC version 5.0 from minimum service version 35
  2019-02-19 15:02:45.452 2889 INFO nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] handler exception: [Errno 104] Connection reset by peer
  2019-02-19 15:02:45.452 2889 DEBUG nova.console.websocketproxy [req-8552f48d-8c1c-4330-aaec-64d544c6cc1e - - - - -] exception vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy Traceback (most recent call last):
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 928, in top_new_client
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     client = self.do_handshake(startsock, address)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 858, in do_handshake
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     self.RequestHandlerClass(retsock, address, self)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/nova/console/websocketproxy.py", line 311, in __init__
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     websockify.ProxyRequestHandler.__init__(self, *args, **kwargs)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 113, in __init__
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     SimpleHTTPRequestHandler.__init__(self, req, addr, server)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib64/python2.7/SocketServer.py", line 652, in __init__
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     self.handle()
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/websockify/websocket.py", line 579, in handle
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     SimpleHTTPRequestHandler.handle(self)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     self.handle_one_request()
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib64/python2.7/BaseHTTPServer.py", line 310, in handle_one_request
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     self.raw_requestline = self.rfile.readline(65537)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib64/python2.7/socket.py", line 480, in readline
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     data = self._sock.recv(self._rbufsize)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 190, in recv
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     return self._base_recv(buflen, flags, into=False)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 217, in _base_recv
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     read = self.read(nbytes)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 135, in read
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     super(GreenSSLSocket, self).read, *args, **kwargs)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/eventlet/green/ssl.py", line 109, in _call_trampolining
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     return func(*a, **kw)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy   File "/usr/lib64/python2.7/ssl.py", line 673, in read
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy     v = self._sslobj.read(len)
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy error: [Errno 104] Connection reset by peer
  2019-02-19 15:02:45.452 2889 ERROR nova.console.websocketproxy
  2019-02-19 15:02:47.037 2880 DEBUG nova.console.websocketproxy [-] 192.168.237.85: new handler Process vmsg /usr/lib/python2.7/site-packages/websockify/websocket.py:873

  (paste: http://paste.openstack.org/show/745451/)

  This sequence starting with the "new handler Process" repeats
  continuously. It seems that the haproxy health checks initiate an SSL
  connection but then immediately send a TCP RST:

  http://git.haproxy.org/?p=haproxy.git;a=commit;h=fd29cc537b8511db6e256529ded625c8e7f856d0

  For most services this does not seem to be an issue, but for nova-
  novncproxy it repeatedly initializes NovaProxyRequestHandler which
  creates a full nova.compute.rpcapi.ComputeAPI instance which very
  quickly starts to consume significant CPU and overtake the host.

  Note that we tried upgrading to HEAD of websockify and eventlet which
  did not improve the issue.

  Our workaround was to turn off check-ssl in haproxy and use a basic
  tcp check, but we're concerned that nova-novncproxy remains vulnerable
  to a DOS attack given how easy it is for haproxy to overload the
  service.  For that reason I'm opening this initially as a security
  bug, though you could perhaps argue that it's no secret that making
  un-ratelimited requests at any service will cause high load.

  Steps to reproduce
  ==================

  1. Configure nova-novncproxy to use SSL by setting the cert= and key= parameters in [DEFAULT] and turn on debug logging.
  2. You can simulate the haproxy SSL health check with this python script:

      import socket, ssl, struct, time
      host = '192.168.237.81'
      port = 5554
      while True:
          sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          ssl_sock = ssl.wrap_socket(sock)
          ssl_sock.connect((host, port))
          ssl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))
          sock.close()
          time.sleep(2)

  Expected result
  ===============

  nova-novncproxy should gracefully handle the RST and not start
  overutilizing CPU. It should probably hold off on initializing
  database connections and such until a meaningful request other than an
  SSL HELLO is received.

  Actual result
  =============

  The nova-novncproxy process quickly jumps to the top of the CPU%
  metrics of process analyzers like top and htop and if left unattended
  on a server with few cores will cause the server's overall performance
  to be degraded.

  Environment
  ===========

  We found this on the latest of the stable/rocky branch on SLES:

  # cat /etc/os-release
  NAME="SLES"
  VERSION="12-SP4"
  VERSION_ID="12.4"
  PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"

  # uname -a
  Linux d52-54-77-77-01-01 4.12.14-95.6-default #1 SMP Thu Jan 17 06:04:39 UTC 2019 (6af4ef8) x86_64 x86_64 x86_64 GNU/Linux

  # zypper info openstack-nova

  Information for package openstack-nova:
  ---------------------------------------
  Repository     : Cloud
  Name           : openstack-nova
  Version        : 18.1.1~dev47-749.1
  Arch           : noarch
  Vendor         : obs://build.suse.de/Devel:Cloud
  Support Level  : Level 3
  Installed Size : 444.7 KiB
  Installed      : Yes
  Status         : up-to-date
  Source package : openstack-nova-18.1.1~dev47-749.1.src
  Summary        : OpenStack Compute (Nova)

  # zypper info haproxy

  Information for package haproxy:
  --------------------------------
  Repository     : Cloud
  Name           : haproxy
  Version        : 1.6.11-10.2
  Arch           : x86_64
  Vendor         : SUSE LLC <https://www.suse.com/>
  Support Level  : Level 3
  Installed Size : 3.1 MiB
  Installed      : Yes
  Status         : up-to-date
  Source package : haproxy-1.6.11-10.2.src

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816727/+subscriptions