yahoo-eng-team team mailing list archive

[Bug 1863858] [NEW] socket.timeout error in dvr CI jobs cause SSH issues

 

Public bug reported:

Random tests fail, mostly in the neutron-tempest-dvr job, because of
problems with SSH to the instance. The error always looks like this:

2020-02-18 18:24:34,987 22897 INFO     [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.96:22' as 'cirros' with public key authentication
2020-02-18 18:25:35,048 22897 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 1. Retry after 2 seconds.
2020-02-18 18:26:37,609 22897 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 2. Retry after 3 seconds.
2020-02-18 18:27:41,173 22897 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 3. Retry after 4 seconds.
2020-02-18 18:28:45,701 22897 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 4. Retry after 5 seconds.
2020-02-18 18:29:51,265 22897 ERROR    [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 after 4 attempts
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh Traceback (most recent call last):
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh   File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh     sock=proxy_chan)
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh   File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh     retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh   File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh     return function()
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh   File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh     retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh socket.timeout: timed out
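
The timestamps in the log excerpt above can be used to estimate how long each connection attempt blocked before failing. The following is only a rough sketch: the timestamps and retry delays are copied from the log, while the helper name attempt_durations and the idea of subtracting the reported retry delay are assumptions made for illustration, not part of tempest.

```python
from datetime import datetime

# Timestamps copied from the log excerpt: the initial "Creating ssh
# connection" message followed by each failure message.
LOG_TIMES = [
    "2020-02-18 18:24:34,987",  # connection attempt starts
    "2020-02-18 18:25:35,048",  # failure 1, "Retry after 2 seconds"
    "2020-02-18 18:26:37,609",  # failure 2, "Retry after 3 seconds"
    "2020-02-18 18:27:41,173",  # failure 3, "Retry after 4 seconds"
    "2020-02-18 18:28:45,701",  # failure 4, "Retry after 5 seconds"
    "2020-02-18 18:29:51,265",  # final error
]

# Retry delay reported before each attempt (none before the first one).
BACKOFF = [0, 2, 3, 4, 5]


def attempt_durations(times, backoff):
    """Return the seconds spent per attempt, excluding the reported delay.

    Hypothetical helper: it assumes the gap between two log messages is
    the retry delay plus the time the connect call blocked.
    """
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in times]
    gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
    return [round(gap - delay, 3) for gap, delay in zip(gaps, backoff)]


if __name__ == "__main__":
    for number, seconds in enumerate(attempt_durations(LOG_TIMES, BACKOFF), 1):
        print(f"attempt {number}: blocked for about {seconds} s")
```

Under these assumptions every attempt blocks for roughly 60 to 60.6 seconds, which would be consistent with a fixed connect timeout expiring each time rather than the connection being actively refused. The timeout setting itself is not shown in this report.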


And then at the end of the test:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
    sock=proxy_chan)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
    return function()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 229, in test_create_list_show_delete_interfaces_by_network_port
    server, ifs = self._create_server_get_interfaces()
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 88, in _create_server_get_interfaces
    self._wait_for_validation(server, validation_resources)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 73, in _wait_for_validation
    linux_client.validate_authentication()
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 60, in wrapper
    six.reraise(*original_exception)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 33, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 116, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 209, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.96 via SSH timed out.
User: cirros, Password: password


From the console log, it seems that the fixed IP was properly configured on the instance and that the metadata service worked fine too.
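
Since the traceback ends in socket.timeout inside sock.connect, the failure happens at the TCP level, before any SSH authentication. A minimal sketch of a probe that separates a timeout from a refused connection is shown below. The function name probe_tcp and the 10-second default timeout are assumptions for illustration; only standard library calls are used, and this is not how tempest itself checks connectivity.

```python
import socket


def probe_tcp(host, port=22, timeout=10.0):
    """Classify a single TCP connection attempt.

    Returns "open", "timeout", "refused", or "error: <reason>".
    Hypothetical helper for debugging, not part of tempest or paramiko.
    """
    try:
        # create_connection raises socket.timeout if no reply arrives
        # within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Floating IP taken from the log in this report.
    print(probe_tcp("172.24.5.96"))
```

Run from the test node, a "timeout" result would suggest that packets to the floating IP are being dropped somewhere on the path, while "refused" would suggest the address is reachable but nothing is listening on port 22. This is only a debugging aid and does not identify the root cause.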

** Affects: neutron
     Importance: Critical
         Status: Confirmed


** Tags: gate-failure l3-dvr-backlog tempest

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1863858

Title:
  socket.timeout error in dvr CI jobs cause SSH issues

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1863858/+subscriptions