
yahoo-eng-team team mailing list archive

[Bug 2022321] [NEW] Using isolated metadata + IPv6, haproxy metadata isn't working because the haproxy container isn't created on some controllers

 

Public bug reported:

SSH keys and metadata are not loaded in the VMs, so the tempest SSH check times out:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/ssh.py", line 136, in _get_ssh_connection
    ssh.connect(self.host, port=self.port, username=self.username,
  File "/usr/lib/python3.9/site-packages/paramiko/client.py", line 406, in connect
    t.start_client(timeout=timeout)
  File "/usr/lib/python3.9/site-packages/paramiko/transport.py", line 699, in start_client
    raise e
  File "/usr/lib/python3.9/site-packages/paramiko/transport.py", line 2110, in run
    ptype, m = self.packetizer.read_message()
  File "/usr/lib/python3.9/site-packages/paramiko/packet.py", line 459, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "/usr/lib/python3.9/site-packages/paramiko/packet.py", line 303, in read_all
    raise EOFError()
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_basic_ops.py", line 535, in test_hotplug_nic
    self._check_public_network_connectivity(should_connect=True)
  File "/usr/lib/python3.9/site-packages/tempest/scenario/test_network_basic_ops.py", line 212, in _check_public_network_connectivity
    self.check_vm_connectivity(
  File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 964, in check_vm_connectivity
    self.get_remote_client(ip_address, username, private_key,
  File "/usr/lib/python3.9/site-packages/tempest/scenario/manager.py", line 733, in get_remote_client
    linux_client.validate_authentication()
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 123, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/ssh.py", line 245, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python3.9/site-packages/tempest/lib/common/ssh.py", line 155, in _get_ssh_connection
    raise exceptions.SSHTimeout(host=self.host,
tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.190 via SSH timed out.
User: cirros, Password: None


The problem was triggered by this patch:

https://review.opendev.org/c/openstack/neutron/+/876566/13/neutron/agent/metadata/driver.py


When an IPv6 DAD (Duplicate Address Detection) failure is detected, haproxy is not started because of the return statement at line 269 of driver.py:


..........
                          'namespace': ns_name,
                          'network': network_id,
                          'exception': str(exc)})
                try:
                    ip_lib.delete_ip_address(bind_address_v6, bind_interface,
                                             namespace=ns_name)
                except Exception as exc:
                    # do not re-raise a delete failure, just log
                    LOG.info('Address deletion failure: %s', str(exc))
                return
        pm.enable()
.........
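The effect of that early return can be reduced to the control-flow sketch below. It is a simplified model, not neutron's code: DADFailedError, wait_until_address_ready, the spawn_proxy_* functions and the "started" list are hypothetical stand-ins for the DAD wait and for pm.enable(). The second function shows one possible alternative (an assumption, not necessarily the fix adopted upstream): drop only the IPv6 bind and still start haproxy for IPv4.

```python
METADATA_V4 = "169.254.169.254"
METADATA_V6 = "fe80::a9fe:a9fe"


class DADFailedError(Exception):
    """Hypothetical stand-in for the 'Duplicate address detected' failure."""


def wait_until_address_ready(address, duplicate_on_link):
    # Hypothetical: DAD fails when another host on the link already owns the address.
    if duplicate_on_link:
        raise DADFailedError(f"Duplicate address detected: {address}")


def spawn_proxy_current(duplicate_on_link, started):
    """Models the reported behaviour: a DAD failure returns before pm.enable()."""
    binds = [f"{METADATA_V4}:80", f"{METADATA_V6}:80"]
    try:
        wait_until_address_ready(METADATA_V6, duplicate_on_link)
    except DADFailedError:
        # In the driver the IPv6 address is deleted here and the function
        # returns, so the IPv4 listener is never started either.
        return None
    started.append(binds)  # stands in for pm.enable()
    return binds


def spawn_proxy_ipv4_fallback(duplicate_on_link, started):
    """Hypothetical alternative: drop only the IPv6 bind and still start haproxy."""
    binds = [f"{METADATA_V4}:80"]
    try:
        wait_until_address_ready(METADATA_V6, duplicate_on_link)
        binds.append(f"{METADATA_V6}:80")
    except DADFailedError:
        pass  # keep serving metadata over IPv4 only
    started.append(binds)  # stands in for pm.enable()
    return binds


if __name__ == "__main__":
    started = []
    print(spawn_proxy_current(True, started))        # None: haproxy not started
    print(spawn_proxy_ipv4_fallback(True, started))  # ['169.254.169.254:80']
```

With this model, the controller whose DAD failed ends up with no listener at all in the current flow, while the fallback variant would still answer on 169.254.169.254.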


The problem only shows up when the DAD failure is detected on the controller that the VM uses as its metadata source, because without haproxy on that controller the metadata service is unreachable:

DAD error:

2023-05-31 14:27:40.140 79551 INFO neutron.agent.metadata.driver
[req-a76cfcdd-887b-4c36-86d5-a5eb2b87615c - - - - -] DAD failed for
address fe80::a9fe:a9fe on interface tapb07b4b7c-3b in namespace qdhcp-
abd16487-68bb-4090-8ccb-b6ec8a77cc2c on network
abd16487-68bb-4090-8ccb-b6ec8a77cc2c, deleting it. Exception: Failure
waiting for address fe80::a9fe:a9fe to become ready: Duplicate address
detected


haproxy doesn't start on that controller:

2023-05-31 14:27:39.461 79551 DEBUG neutron.agent.linux.utils [req-a76cfcdd-887b-4c36-86d5-a5eb2b87615c - - - - -] Unable to access /var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy; Error: [Errno 2] No such file or directory: '/var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy' get_value_from_file /usr/lib/python3.9/site-packages/neutron/agent/linux/utils.py:252
2023-05-31 14:27:39.462 79551 DEBUG neutron.agent.linux.utils [req-a76cfcdd-887b-4c36-86d5-a5eb2b87615c - - - - -] Unable to access /var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy; Error: [Errno 2] No such file or directory: '/var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy' get_value_from_file /usr/lib/python3.9/site-packages/neutron/agent/linux/utils.py:252
2023-05-31 14:27:39.463 79551 DEBUG neutron.agent.linux.external_process [req-a76cfcdd-887b-4c36-86d5-a5eb2b87615c - - - - -] No haproxy process started for abd16487-68bb-4090-8ccb-b6ec8a77cc2c disable /usr/lib/python3.9/site-packages/neutron/agent/linux/external_process.py:125
2023-05-31 14:27:39.463 79551 DEBUG neutron.agent.linux.utils [req-a76cfcdd-887b-4c36-86d5-a5eb2b87615c - - - - -] Unable to access /var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy; Error: [Errno 2] No such file or directory: '/var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy' get_value_from_file /usr/lib/python3.9/site-packages/neutron/agent/linux/utils.py:252
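The repeated "Unable to access ... .pid.haproxy" messages mean that no pid file exists for the network, i.e. no haproxy was spawned for it on this controller. A quick host-side check can be sketched as below; the pid directory and file name pattern are taken from the log, while the helper name and the /proc/<pid> liveness test (Linux-specific) are assumptions.

```python
import os

PID_DIR = "/var/lib/neutron/external/pids"  # directory seen in the agent log


def haproxy_running(network_id, pid_dir=PID_DIR, proc_root="/proc"):
    """Return True if the metadata haproxy for network_id appears to be alive."""
    pid_file = os.path.join(pid_dir, f"{network_id}.pid.haproxy")
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Same situation as the log: missing, unreadable or empty pid file.
        return False
    # On Linux a live process has a /proc/<pid> directory.
    return os.path.isdir(os.path.join(proc_root, str(pid)))


if __name__ == "__main__":
    net = "abd16487-68bb-4090-8ccb-b6ec8a77cc2c"
    print(net, "haproxy running:", haproxy_running(net))
```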


Metadata IP on that controller (10.100.0.3 on tapb07b4b7c-3b, in the qdhcp namespace):


ent': 'RTM_NEWADDR'}, {'family': 2, 'prefixlen': 28, 'flags': 128, 'scope': 0, 'index': 490, 'attrs': [['IFA_ADDRESS', '10.100.0.3'], ['IFA_LOCAL', '10.100.0.3'], ['IFA_BROADCAST', '10.100.0.15'], ['IFA_LABEL', 'tapb07b4b7c-3b'], ['IFA_FLAGS', 128], ['IFA_CACHEINFO', {'ifa_preferred': 4294967295, 'ifa_valid': 4294967295, 'cstamp': 815201, 'tstamp': 815201}]], 'header': {'length': 96, 'type': 20, 'flags': 2, 'sequence_number': 255, 'pid': 699746, 'error': None, 'target': 'qdhcp-abd16487-68bb-4090-8ccb-b6ec8a77cc2c', 'stats': (0, 0, 0)},
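The (truncated) dump above looks like a pyroute2 RTM_NEWADDR message. Its flags/IFA_FLAGS value of 128 is IFA_F_PERMANENT; an address whose DAD failed would instead carry IFA_F_DADFAILED (0x08), which "ip addr" prints as "dadfailed". The helper below decodes these bits using the constants from the Linux <linux/if_addr.h> header; the function names are hypothetical.

```python
# Address flag bits from <linux/if_addr.h>.
IFA_FLAG_NAMES = {
    0x01: "secondary",    # IFA_F_SECONDARY (same value as IFA_F_TEMPORARY)
    0x02: "nodad",        # IFA_F_NODAD
    0x04: "optimistic",   # IFA_F_OPTIMISTIC
    0x08: "dadfailed",    # IFA_F_DADFAILED
    0x10: "homeaddress",  # IFA_F_HOMEADDRESS
    0x20: "deprecated",   # IFA_F_DEPRECATED
    0x40: "tentative",    # IFA_F_TENTATIVE
    0x80: "permanent",    # IFA_F_PERMANENT
}
IFA_F_DADFAILED = 0x08


def decode_ifa_flags(value):
    """Return the names of the flag bits set in an IFA_FLAGS value, lowest bit first."""
    return [name for bit, name in sorted(IFA_FLAG_NAMES.items()) if value & bit]


def dad_failed(value):
    """True if the DAD-failed bit is set."""
    return bool(value & IFA_F_DADFAILED)


if __name__ == "__main__":
    print(decode_ifa_flags(128))   # ['permanent'], the IPv4 10.100.0.3 entry above
    print(decode_ifa_flags(0x48))  # ['dadfailed', 'tentative']
```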


Error in the VM (its metadata route points at 10.100.0.3): "ip-route:169.254.169.254 via 10.100.0.3 dev eth0"

failed 14/20: up 34.50. request failed
failed 15/20: up 36.51. request failed
failed 16/20: up 38.53. request failed
failed 17/20: up 40.54. request failed
failed 18/20: up 42.56. request failed
failed 19/20: up 44.57. request failed
failed 20/20: up 46.59. request failed
failed to read iid from metadata. tried 20
failed to get instance-id of datasource
Top of dropbear init script
Starting dropbear sshd: failed to get instance-id of datasource
mkdir: can't create directory '/etc/dropbear': No such file or directory
WARN: generating key of type rsa failed!
WARN: generating key of type ecdsa failed!
OK
GROWROOT: CHANGED: partition=1 start=18432 old: size=210911 end=229343 new: size=2078687,end=2097119
/dev/root resized successfully [took 0.03s]
=== system information ===
Platform: Red Hat OpenStack Compute/RHEL
Container: none
Arch: x86_64
CPU(s): 1 @ 2199.996 MHz
Cores/Sockets/Threads: 1/1/1
Virt-type: VT-x
RAM Size: 100MB
Disks:
NAME  MAJ:MIN       SIZE LABEL         MOUNTPOINT
vda   252:0   1073741824               
vda1  252:1   1064287744 cirros-rootfs /
vda15 252:15     8388608               
=== sshd host keys ===
-----BEGIN SSH HOST KEY KEYS-----
Failed reading '/etc/dropbear/dropbear_rsa_host_key'
Failed reading '/etc/dropbear/dropbear_ecdsa_host_key'
-----END SSH HOST KEY KEYS-----
=== network info ===
if-info: lo,up,127.0.0.1,8,,
if-info: eth0,up,10.100.0.10,28,fe80::f816:3eff:fe6b:2f7a/64,
ip-route:default via 10.100.0.1 dev eth0 
ip-route:10.100.0.0/28 dev eth0 scope link  src 10.100.0.10 
ip-route:169.254.169.254 via 10.100.0.3 dev eth0 
ip-route6:fe80::/64 dev eth0  metric 256 
ip-route6:ff00::/8 dev eth0  metric 256 
=== datasource: None None ===
=== cirros: current=0.5.2 uptime=49.31 ===
  ____               ____  ____
 / __/ __ ____ ____ / __ \/ __/
/ /__ / // __// __// /_/ /\ \ 
\___//_//_/  /_/   \____/___/ 
   http://cirros-cloud.net
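In the console log the route for 169.254.169.254 points at 10.100.0.3, the address of tapb07b4b7c-3b in the qdhcp namespace on the controller where DAD failed and haproxy never started. Extracting that next hop from a CirrOS console log can be sketched as follows; the helper name is hypothetical and only the "ip-route:" line format shown above is assumed.

```python
import re

# Matches CirrOS lines such as "ip-route:169.254.169.254 via 10.100.0.3 dev eth0".
ROUTE_RE = re.compile(r"^ip-route:(\S+) via (\S+) dev (\S+)")


def metadata_next_hop(console_log, metadata_ip="169.254.169.254"):
    """Return (gateway, device) of the explicit route to the metadata IP, or None."""
    for line in console_log.splitlines():
        m = ROUTE_RE.match(line.strip())
        if m and m.group(1) == metadata_ip:
            return m.group(2), m.group(3)
    return None


if __name__ == "__main__":
    log = """ip-route:default via 10.100.0.1 dev eth0
ip-route:10.100.0.0/28 dev eth0 scope link  src 10.100.0.10
ip-route:169.254.169.254 via 10.100.0.3 dev eth0
"""
    print(metadata_next_hop(log))  # ('10.100.0.3', 'eth0')
```

Comparing the returned gateway with the addresses of the qdhcp ports on each controller tells which agent must be running haproxy for this VM.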


haproxy was started only on the other controller:


2023-05-31 14:27:38.844 81096 DEBUG neutron.agent.linux.utils [req-fbdd788d-0c45-4eaf-8f4c-43d3cf32c511 - - - - -] Unable to access /var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy; Error: [Errno 2] No such file or directory: '/var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy' get_value_from_file /usr/lib/python3.9/site-packages/neutron/agent/linux/utils.py:252
2023-05-31 14:27:38.846 81096 DEBUG neutron.agent.metadata.driver [req-fbdd788d-0c45-4eaf-8f4c-43d3cf32c511 - - - - -] haproxy_cfg = 
global
    log         /dev/log local0 debug
    log-tag     haproxy-metadata-proxy-abd16487-68bb-4090-8ccb-b6ec8a77cc2c
    user        neutron
    group       neutron
    maxconn     1024
    pidfile     /var/lib/neutron/external/pids/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.pid.haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor
    retries                 3
    timeout http-request    30s
    timeout connect         30s
    timeout client          32s
    timeout server          32s
    timeout http-keep-alive 30s

listen listener
    bind 169.254.169.254:80
    bind fe80::a9fe:a9fe:80 interface tap6834d3d5-02
    server metadata /var/lib/neutron/metadata_proxy
    http-request del-header X-Neutron-Router-ID
    http-request set-header X-Neutron-Network-ID abd16487-68bb-4090-8ccb-b6ec8a77cc2c
 create_config_file /usr/lib/python3.9/site-packages/neutron/agent/metadata/driver.py:162
2023-05-31 14:27:38.847 81096 DEBUG neutron.agent.linux.utils [req-fbdd788d-0c45-4eaf-8f4c-43d3cf32c511 - - - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-abd16487-68bb-4090-8ccb-b6ec8a77cc2c', 'haproxy', '-f', '/var/lib/neutron/ns-metadata-proxy/abd16487-68bb-4090-8ccb-b6ec8a77cc2c.conf'] execute_rootwrap_daemon /usr/lib/python3.9/site-packages/neutron/agent/linux/utils.py:108
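The listen section on the working controller binds both the IPv4 metadata address and fe80::a9fe:a9fe on that controller's tap interface. A rendering helper that omits the IPv6 bind line when no usable IPv6 address is available can be sketched as below; it is a hypothetical illustration built only from the lines in this log, not neutron's template code.

```python
def render_listen_section(network_id, v6_interface=None,
                          socket_path="/var/lib/neutron/metadata_proxy"):
    """Render the haproxy 'listen' block seen in the log.

    v6_interface=None models a controller whose fe80::a9fe:a9fe failed DAD:
    only the IPv4 bind line is emitted.
    """
    lines = ["listen listener", "    bind 169.254.169.254:80"]
    if v6_interface:
        lines.append(f"    bind fe80::a9fe:a9fe:80 interface {v6_interface}")
    lines += [
        f"    server metadata {socket_path}",
        "    http-request del-header X-Neutron-Router-ID",
        f"    http-request set-header X-Neutron-Network-ID {network_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    net = "abd16487-68bb-4090-8ccb-b6ec8a77cc2c"
    print(render_listen_section(net, "tap6834d3d5-02"))  # as on the working controller
    print(render_listen_section(net))                     # IPv4-only variant
```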

** Affects: neutron
     Importance: Undecided
     Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2022321

