
yahoo-eng-team team mailing list archive

[Bug 1980967] [NEW] get_hypervisor_hostname helper function is failing silently

 

Public bug reported:

get_hypervisor_hostname() raises an error, but the error is swallowed[1] with 'pass'. As a result, the agent does not get a fully qualified domain name (fqdn).
I have only seen this issue with the sriov-agent.

Steps to Reproduce:
1. Start a sriov-agent container with the following sriov_agent.ini

[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000

2. Observe the sriov-agent log and notice that the agent starts without an fqdn

INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-]
Resource provider hypervisors: {'enp7s0f3': 'computesriov-1',
'enp5s0f0': 'computesriov-1'}

Additional info:

I root-caused it by logging the traceback and the IOError.

2022-07-05 20:29:20.450 122133 DEBUG neutron.agent.common.utils [-] MIRO got error [Errno -2] Name or service not known get_hypervisor_hostname /usr/lib/python3.9/site-packages/neutron/agent/common/utils.py:104
2022-07-05 20:29:20.452 122133 DEBUG neutron.agent.common.utils [-] format_exc Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 440, in resolve_cname
    ans = resolver.query(host, dns.rdatatype.CNAME)
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 380, in query
    return end()
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 359, in end
    raise result[1]
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 340, in step
    a = fun(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/eventlet/dns/resolver.py", line 1002, in query
    raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
eventlet.dns.resolver.NXDOMAIN: The DNS query name does not exist: computesriov-0.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/neutron/agent/common/utils.py", line 91, in get_hypervisor_hostname
    addrinfo = socket.getaddrinfo(host=hypervisor_hostname,
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 540, in getaddrinfo
    qname = resolve_cname(qname).encode('ascii').decode('idna')
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 446, in resolve_cname
    raise EAI_NODATA_ERROR
socket.gaierror: [Errno -2] Name or service not known
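
The debug output above was obtained by logging the exception rather than passing over it. A minimal sketch of that idea follows. It is not the neutron code: resolve_canonical_name and its getaddrinfo parameter are hypothetical names used only for illustration, and the parameter exists so the failure path can be exercised without a DNS server.

```python
import logging
import socket

LOG = logging.getLogger(__name__)


def resolve_canonical_name(hostname, getaddrinfo=socket.getaddrinfo):
    """Return the canonical name for hostname, or hostname on failure.

    Hypothetical helper for illustration; not the neutron function.
    """
    try:
        addrinfo = getaddrinfo(hostname, None,
                               family=socket.AF_UNSPEC,
                               flags=socket.AI_CANONNAME)
        # Each entry is (family, type, proto, canonname, sockaddr).
        if addrinfo and addrinfo[0][3]:
            return addrinfo[0][3]
    except OSError as exc:
        # socket.gaierror is a subclass of OSError. Log it, with the
        # traceback, instead of silently passing.
        LOG.warning("Could not resolve canonical name for %s: %s",
                    hostname, exc, exc_info=True)
    return hostname
```

With logging like this, the fallback to the short hostname still happens, but the reason shows up in the agent log.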

What is happening is that the 'eventlet' module does some import
patching[2], causing socket.getaddrinfo() to actually call
greendns.py:getaddrinfo()[3] instead of the Python standard library
_socket implementation. greendns.py:getaddrinfo() appears to be buggy:
it seems to skip looking up the fqdn in /etc/hosts (which contains the
fqdn on this machine) and goes straight to querying the DNS server,
which might not have this record (as in this case).

This also explains why socket.getaddrinfo() works fine in a Python
terminal but fails when called from the sriov-agent Python code.
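
One quick way to tell which getaddrinfo implementation a process is using is to check the module that defines the function. The sketch below uses only the standard library; resolver_origin is a hypothetical helper name. In an unpatched CPython 3 interpreter it should report the socket module. Inside a process patched by eventlet I would expect it to name the greendns module, but that is an assumption I have not confirmed here.

```python
import socket


def resolver_origin(func=None):
    """Return the name of the module that defines the given function.

    Defaults to whatever socket.getaddrinfo currently points at, so the
    result reflects any patching done earlier in the process.
    """
    if func is None:
        func = socket.getaddrinfo
    return getattr(func, "__module__", None)


if __name__ == "__main__":
    print(resolver_origin())
```

Running this once from a plain Python terminal and once from inside the agent process would show whether the two are calling the same resolver.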


[1] https://github.com/openstack/neutron/blob/ae87995a0827c98502bfa29a9abf9e3f229aac72/neutron/agent/common/utils.py#L84-95
[2] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L58
[3] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L508

** Affects: neutron
     Importance: Undecided
     Assignee: Miro Tomaska (mtomaska)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Miro Tomaska (mtomaska)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1980967

Title:
  get_hypervisor_hostname helper function is failing silently

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1980967/+subscriptions


