
yahoo-eng-team team mailing list archive

[Bug 1855919] Re: broken pipe errors cause neutron metadata agent to fail

 

Fix proposed to branch: master
Review: https://review.opendev.org/699372

** Changed in: neutron
       Status: Opinion => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855919

Title:
  broken pipe errors cause neutron metadata agent to fail

Status in neutron:
  In Progress

Bug description:
  After we increased the number of computes to 200, we started seeing
  "broken pipe" errors in neutron-metadata-agent.log on the controllers.
  After a neutron restart the errors drop off, then they increase again
  until the log is mostly errors, the neutron metadata service fails,
  and VMs cannot boot. Another symptom is that unacked RMQ messages
  build up in the q-plugin queue. This is the first error we see; it
  occurs as the server is starting:

  
  2019-12-10 10:56:01.942 1838536 INFO eventlet.wsgi.server [-] (1838536) wsgi starting up on http:/var/lib/neutron/metadata_proxy
  2019-12-10 10:56:01.943 1838538 INFO eventlet.wsgi.server [-] (1838538) wsgi starting up on http:/var/lib/neutron/metadata_proxy
  2019-12-10 10:56:01.945 1838539 INFO eventlet.wsgi.server [-] (1838539) wsgi starting up on http:/var/lib/neutron/metadata_proxy
  2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
      write(b''.join(towrite))
    File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
      wfile.flush()
    File "/usr/lib/python2.7/socket.py", line 307, in flush
      self._sock.sendall(view[write_offset:write_offset+buffer_size])
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
      tail = self.send(data, flags)
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
      return self._send_loop(self.fd.send, data, flags)
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
      return send_method(data, *args)
  error: [Errno 32] Broken pipe

  2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] 10.195.74.25,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 0 time: 19.0296111
  2019-12-10 10:56:25.059 1838516 INFO eventlet.wsgi.server [-] 10.195.74.28,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.2840948
  2019-12-10 10:56:25.181 1838529 INFO eventlet.wsgi.server [-] 10.195.74.68,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.2695429
  2019-12-10 10:56:25.259 1838518 INFO eventlet.wsgi.server [-] 10.195.74.28,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.1980510

  Then we see some "call queues" warnings, and the threshold increases
  to 20 and later to 40:

  2019-12-10 10:56:31.414 1838515 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20
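
  The warning text suggests a simple doubling rule: once the number of call queues exceeds the current threshold, a warning is emitted and the threshold is doubled. The sketch below only models what the log messages show (10 to 20, then 20 to 40). It is not taken from the oslo.messaging source, and the function name check_call_queues is hypothetical.

```python
def check_call_queues(queue_count, threshold):
    """Model the warn-and-double behaviour seen in the log messages.

    Returns (warning_message_or_None, threshold_to_use_next).
    """
    if queue_count > threshold:
        new_threshold = threshold * 2
        message = (
            "Number of call queues is %d, greater than warning threshold: %d. "
            "There could be a leak. Increasing threshold to: %d"
            % (queue_count, threshold, new_threshold)
        )
        return message, new_threshold
    return None, threshold
```

  Because the warnings for "threshold: 10" come from several different PIDs, the counter appears to be tracked per worker process, so the same warning can repeat once per worker.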

  Next we see RPC timeout errors:

  2019-12-10 10:57:02.043 1838520 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20
  2019-12-10 10:57:02.059 1838534 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 37 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188
  2019-12-10 10:57:02.059 1838534 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188
  2019-12-10 10:57:02.285 1838521 INFO eventlet.wsgi.server [-] 10.195.74.27,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.7959940

  2019-12-10 10:57:16.215 1838531 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40

  2019-12-10 10:57:17.339 1838539 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20

  2019-12-10 10:57:24.838 1838524 INFO eventlet.wsgi.server [-] 10.195.73.242,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.6842020
  2019-12-10 10:57:24.882 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 3 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
  2019-12-10 10:57:24.883 1838524 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
  2019-12-10 10:57:24.887 1838525 INFO eventlet.wsgi.server [-] 10.195.74.26,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.9827850
  2019-12-10 10:57:24.903 1838518 INFO eventlet.wsgi.server [-] 10.195.74.43,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.5630379
  2019-12-10 10:57:25.045 1838529 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 21 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID b38361bf9906482b8b24c5b534a6652b
  2019-12-10 10:57:25.046 1838529 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID b38361bf9906482b8b24c5b534a6652b
  2019-12-10 10:57:25.055 1838537 INFO eventlet.wsgi.server [-] 10.195.73.247,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.7542410
  2019-12-10 10:57:25.119 1838523 INFO eventlet.wsgi.server [-] 10.195.74.2,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.7057869
  2019-12-10 10:57:25.185 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 47 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID f1a268d937f94def97bd238916715744
  2019-12-10 10:57:25.261 1838529 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 26 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 93c31cf4f5d34bd1a5ba90165e89cb79
  2019-12-10 10:57:25.284 1838536 INFO eventlet.wsgi.server [-] 10.195.73.207,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.4315739
  2019-12-10 10:57:25.319 1838520 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 50 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 37c1b168536e4c70b522c330209b11ec
  2019-12-10 10:57:25.319 1838520 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 37c1b168536e4c70b522c330209b11ec
  2019-12-10 10:57:25.374 1838530 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID fb837fc73c664209bfbada0fb32886ad
  2019-12-10 10:57:25.375 1838530 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID fb837fc73c664209bfbada0fb32886ad
  2019-12-10 10:57:25.388 1838526 INFO eventlet.wsgi.server [-] 10.195.65.7,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.5798080
  2019-12-10 10:57:25.446 1838520 INFO eventlet.wsgi.server [-] 10.195.74.104,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.6868739
  2019-12-10 10:57:25.448 1838528 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.7513518
  2019-12-10 10:57:25.452 1838519 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40
  2019-12-10 10:57:25.504 1838535 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 15 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 7b677a7d40274b0ea22510dcf3865cf6
  2019-12-10 10:57:25.505 1838535 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 7b677a7d40274b0ea22510dcf3865cf6
  2019-12-10 10:57:25.609 1838539 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 20 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 378f11ce14334be38ffaa95ec3fc26f2
  2019-12-10 10:57:25.610 1838539 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 378f11ce14334be38ffaa95ec3fc26f2
  2019-12-10 10:57:25.661 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 28 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 0c911a0ac95f42209cfa8b265d4d5c3d
  2019-12-10 10:57:25.787 1838525 INFO eventlet.wsgi.server [-] 10.195.74.86,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.7191069
  2019-12-10 10:57:25.831 1838522 INFO eventlet.wsgi.server [-] 10.195.64.185,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.5980189
  2019-12-10 10:57:25.837 1838532 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 51 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 9a7cfd81ba714d2680aa223ba96798f0
  2019-12-10 10:57:25.837 1838532 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 9a7cfd81ba714d2680aa223ba96798f0
  2019-12-10 10:57:25.903 1838536 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 28 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID da2628209cde4562bf47dc0bdfecbf1d
  2019-12-10 10:57:25.904 1838536 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID da2628209cde4562bf47dc0bdfecbf1d
  2019-12-10 10:57:25.914 1838521 INFO eventlet.wsgi.server [-] 10.195.74.44,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.8841410
  2019-12-10 10:57:25.936 1838524 INFO eventlet.wsgi.server [-] 10.195.73.231,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.3305228
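
  The timeout messages above follow a recognizable pattern: after each get_ports timeout the agent waits a seemingly random number of seconds (3 to 51 in this excerpt) and raises the per-method timeout to 120 seconds. The sketch below models a randomized backoff with timeout doubling that would produce messages like these. It is an illustration only, not the code in neutron/common/rpc.py. The 60-second base (the usual default for rpc_response_timeout) and the 10x cap are assumptions, and BackoffState is a hypothetical name.

```python
import random


class BackoffState:
    """Track the timeout for one RPC method across repeated timeouts."""

    def __init__(self, base_timeout=60, max_factor=10):
        # base_timeout: assumed rpc_response_timeout in seconds.
        # max_factor: assumed cap on how far the timeout may grow.
        self.base_timeout = base_timeout
        self.max_timeout = base_timeout * max_factor
        self.timeout = base_timeout

    def on_timeout(self, rng=random):
        """Return how long to wait before retrying and double the timeout."""
        wait = rng.uniform(0, self.base_timeout)
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return wait
```

  With these assumptions the first timeout yields a wait between 0 and 60 seconds and a new timeout of 120 seconds, which matches the messages above. If the request handler sleeps for that wait while the HTTP client is still connected, the client may give up before the response is written, which would be consistent with the broken pipe errors.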

  A minute or two after starting the server we get more errors. At this
  point VMs are unable to build. If I try to pull metadata from a VM, I
  get a 503 or 504 and nothing is logged in neutron-metadata-agent.log.
  Haproxy logs the 503/504 response.

  $ curl -s http://169.254.169.254/2009-04-04/meta-data/hostname
  <html><body><h1>503 Service Unavailable</h1>
  No server is available to handle this request.
  </body></html>

  Now the log is almost all errors:

  
  2019-12-10 10:57:27.666 1838530 INFO eventlet.wsgi.server [-] 10.195.73.174,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.7101889
  2019-12-10 10:57:27.719 1838537 INFO eventlet.wsgi.server [-] 10.195.65.6,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.3497119
  2019-12-10 10:57:27.720 1838525 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 60 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 138877a7326e40f38de23b05fb97127a
  2019-12-10 10:57:27.720 1838525 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 138877a7326e40f38de23b05fb97127a
  2019-12-10 10:57:27.741 1838523 INFO eventlet.wsgi.server [-] 10.195.74.86,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 4.7329929
  2019-12-10 10:57:27.820 1838525 INFO eventlet.wsgi.server [-] 10.195.73.206,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 1.4146030
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent [-] Unexpected error.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 89, in __call__
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 162, in _get_instance_and_tenant_id
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     ports = self._get_ports(remote_address, network_id, router_id)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 155, in _get_ports
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self._get_ports_for_remote_address(remote_address, networks)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/cache_utils.py", line 116, in __call__
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 137, in _get_ports_for_remote_address
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     ip_address=remote_address)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 106, in _get_ports_from_server
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 72, in get_ports
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 173, in call
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     time.sleep(wait)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     self.force_reraise()
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     six.reraise(self.type_, self.value, self.tb)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 150, in call
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     retry=self.retry)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     retry=retry)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     call_monitor_timeout, retry=retry)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     call_monitor_timeout)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     'to message ID %s' % msg_id)
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
  2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent
  2019-12-10 10:57:27.828 1838524 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
      write(b''.join(towrite))
    File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
      wfile.flush()
    File "/usr/lib/python2.7/socket.py", line 307, in flush
      self._sock.sendall(view[write_offset:write_offset+buffer_size])
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
      tail = self.send(data, flags)
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
      return self._send_loop(self.fd.send, data, flags)
    File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
      return send_method(data, *args)
  error: [Errno 32] Broken pipe

  2019-12-10 10:57:27.828 1838524 INFO eventlet.wsgi.server [-] 10.195.73.248,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 500  len: 0 time: 63.0060959
  2019-12-10 10:57:27.873 1838528 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 41 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 7d73358e2fe841e4a4b818395e2e5b2d
  2019-12-10 10:57:27.877 1838524 INFO eventlet.wsgi.server [-] 10.195.73.238,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 4.4559531
  2019-12-10 10:57:27.921 1838538 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 6 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 297db0c16653413cabc868027f9e6abb
  2019-12-10 10:57:27.921 1838538 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 297db0c16653413cabc868027f9e6abb
  2019-12-10 10:57:27.967 1838520 INFO eventlet.wsgi.server [-] 10.195.74.29,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.4040241
  2019-12-10 10:57:28.006 1838517 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.6681471
  2019-12-10 10:57:28.026 1838522 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 0.3529530
  2019-12-10 10:57:28.058 1838519 INFO eventlet.wsgi.server [-] 10.195.74.121,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  len: 146 time: 3.5390451
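
  To put a number on "almost all errors", the log lines can be tallied by level and logger. This is a minimal sketch that assumes the line layout shown in this report (timestamp, PID, level, logger name). Traceback continuation lines that carry no timestamp are skipped, and the function name tally_levels is hypothetical.

```python
import re
from collections import Counter

# Timestamp, PID, level, logger name, as laid out in the log lines above.
LINE_RE = re.compile(
    r'^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \d+ '
    r'(?P<level>[A-Z]+) (?P<logger>\S+)'
)


def tally_levels(lines):
    """Count log lines per (level, logger) pair, ignoring unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group("level"), m.group("logger"))] += 1
    return counts
```

  Running tally_levels over the log in fixed time windows after a restart would show how quickly the ERROR lines from neutron.common.rpc overtake the INFO access lines.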

  
  To reproduce this issue:

  
  Build an OpenStack cluster on Rocky and add 200 computes. Our setup has 3 controllers with 48 CPUs (Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz) and 92G RAM.

  
  This bug seems severe to us. It is ruining our production cluster and we cannot build VMs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855919/+subscriptions

