
yahoo-eng-team team mailing list archive

[Bug 1980967] Re: get_hypervisor_hostname helper function is failing silently

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/849122
Committed: https://opendev.org/openstack/neutron/commit/ea223072841adc3fb88b840b5f8018bff60c8aa7
Submitter: "Zuul (22348)"
Branch:    master

commit ea223072841adc3fb88b840b5f8018bff60c8aa7
Author: Miro Tomaska <mtomaska@xxxxxxxxxx>
Date:   Fri Jul 8 09:56:23 2022 -0500

    Add workaround for eventlet.greendns bug
    
    Issue [1] workaround: a wrapper class that determines whether the socket
    module was patched by eventlet and, if so, requests the standard library
    socket module instead.
    Also add a LOG.warning to the exception block so we don't miss
    issues like this in the future.
    
    Closes-Bug: #1980967
    Related-Bug: #1926693
    
    [1]https://github.com/eventlet/eventlet/issues/764
    
    Change-Id: I41c4cbc1aaea95f7808e6c6dca47ecd0402351c9
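The commit message describes the workaround only in words. Below is a minimal sketch of such a wrapper. It assumes eventlet's `eventlet.patcher.is_monkey_patched()` and `eventlet.patcher.original()` helpers. The class name `StdlibSocketWrapper` is hypothetical, and this is not the code from the commit. If eventlet is not installed, the sketch simply uses the imported `socket` module.

```python
import socket

try:
    # eventlet is optional in this sketch so it also runs where eventlet
    # is not installed; in that case the socket module cannot be green.
    from eventlet import patcher as eventlet_patcher
except ImportError:
    eventlet_patcher = None


class StdlibSocketWrapper:
    """Hypothetical wrapper exposing the unpatched socket functions.

    If eventlet has monkey patched 'socket', request the original
    standard-library module via eventlet.patcher.original(); otherwise
    keep the socket module as imported.
    """

    def __init__(self):
        self._socket = socket
        if (eventlet_patcher is not None
                and eventlet_patcher.is_monkey_patched('socket')):
            self._socket = eventlet_patcher.original('socket')

    def gethostname(self):
        return self._socket.gethostname()

    def getaddrinfo(self, host, port, family=0, socktype=0, proto=0,
                    flags=0):
        # Positional call matching socket.getaddrinfo(host, port, family,
        # type, proto, flags).
        return self._socket.getaddrinfo(host, port, family, socktype,
                                        proto, flags)
```

Callers would then use `StdlibSocketWrapper().getaddrinfo(...)` in place of `socket.getaddrinfo(...)`. Inside a monkey-patched agent, resolution would then typically go through the C library resolver, and therefore /etc/hosts, rather than through greendns.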


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1980967

Title:
  get_hypervisor_hostname helper function is failing silently

Status in neutron:
  Fix Released

Bug description:
  get_hypervisor_hostname() raises an error, but the error is swallowed[1] by a bare 'pass'. As a result, the agent does not get a fully qualified domain name (FQDN).
  I have only seen this issue with the sriov-agent.

  Steps to Reproduce:
  1. Start an sriov-agent container with the following sriov_agent.ini:

  [sriov_nic]
  physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
  resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000

  2. Check the sriov-agent log and notice that the agent starts without
  the FQDN:

  INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-]
  Resource provider hypervisors: {'enp7s0f3': 'computesriov-1',
  'enp5s0f0': 'computesriov-1'}
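  The log line above maps each configured device to the hypervisor name used for its resource provider. Below is a simplified, hypothetical sketch of how the resource_provider_bandwidths value from step 1 could be turned into that mapping. parse_bandwidths() and resource_provider_hypervisors() are illustrative names, not neutron's actual config parsing. The sketch assumes every device falls back to the same hostname, as in the log.

```python
def parse_bandwidths(value):
    """Parse 'iface:egress:ingress,...' into {iface: (egress, ingress)}.

    Hypothetical, simplified parser for illustration; empty bandwidth
    fields are returned as None.
    """
    result = {}
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        iface, egress, ingress = item.split(':')
        result[iface] = (int(egress) if egress else None,
                         int(ingress) if ingress else None)
    return result


def resource_provider_hypervisors(bandwidths, hostname):
    """Map every configured device to the same hypervisor hostname."""
    return {device: hostname for device in bandwidths}
```

  With a working FQDN lookup, the hostname argument would be the fully qualified name rather than the short name computesriov-1.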

  Additional info:

  I root-caused it by logging the traceback and the IOError:

  2022-07-05 20:29:20.450 122133 DEBUG neutron.agent.common.utils [-] MIRO got error [Errno -2] Name or service not known get_hypervisor_hostname /usr/lib/python3.9/site-packages/neutron/agent/common/utils.py:104
  2022-07-05 20:29:20.452 122133 DEBUG neutron.agent.common.utils [-] format_exc Traceback (most recent call last):
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 440, in resolve_cname
      ans = resolver.query(host, dns.rdatatype.CNAME)
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 380, in query
      return end()
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 359, in end
      raise result[1]
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 340, in step
      a = fun(*args, **kwargs)
    File "/usr/lib/python3.9/site-packages/eventlet/dns/resolver.py", line 1002, in query
      raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
  eventlet.dns.resolver.NXDOMAIN: The DNS query name does not exist: computesriov-0.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/lib/python3.9/site-packages/neutron/agent/common/utils.py", line 91, in get_hypervisor_hostname
      addrinfo = socket.getaddrinfo(host=hypervisor_hostname,
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 540, in getaddrinfo
      qname = resolve_cname(qname).encode('ascii').decode('idna')
    File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 446, in resolve_cname
      raise EAI_NODATA_ERROR
  socket.gaierror: [Errno -2] Name or service not known
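  The second traceback shows the exception that the 'pass' in [1] hides. Below is a sketch of the same kind of canonical-name lookup that logs a warning instead of discarding the error, in the spirit of the LOG.warning added by the fix. lookup_fqdn() is hypothetical. Its injectable resolver parameter is an assumption, added only so the failure path can be exercised without DNS.

```python
import logging
import socket

LOG = logging.getLogger(__name__)


def lookup_fqdn(hostname, resolver=socket):
    """Return the canonical name for hostname, or hostname on failure.

    Hypothetical sketch: ask getaddrinfo() for the canonical name
    (AI_CANONNAME) and log a warning instead of silently passing.
    """
    if '.' in hostname:
        # Already looks fully qualified; nothing to resolve.
        return hostname
    try:
        addrinfo = resolver.getaddrinfo(hostname, None, socket.AF_UNSPEC,
                                        0, 0, socket.AI_CANONNAME)
        # Each entry is (family, type, proto, canonname, sockaddr).
        if (addrinfo and addrinfo[0][3]
                and not addrinfo[0][3].startswith('localhost')):
            return addrinfo[0][3]
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        LOG.warning('Could not resolve FQDN for %s: %s', hostname, exc)
    return hostname
```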

  What is happening is that the 'eventlet' module does import
  patching[2], which causes socket.getaddrinfo() to call into
  greendns.py:getaddrinfo()[3] instead of the Python standard library
  _socket. greendns.py:getaddrinfo() is buggy: it appears to skip the
  lookup of the FQDN in /etc/hosts (which contains the FQDN on this
  machine) and goes straight to querying the DNS server, which might not
  have this information (as in this case).
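  For comparison, here is a deliberately simplified illustration of the /etc/hosts step that greendns appears to skip. Each hosts entry has the form 'address canonical_name [aliases...]', and the first name on a matching line is what a canonical-name lookup would typically report. canonical_name_from_hosts() is hypothetical, and so are the example addresses and domain in the usage below. The real resolver also follows /etc/nsswitch.conf ordering, which this sketch ignores.

```python
def canonical_name_from_hosts(hosts_text, hostname):
    """Return the first name on the hosts line that lists hostname.

    Hypothetical, simplified hosts-file lookup (exact name match);
    returns None if no entry matches.
    """
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        fields = line.split()
        if len(fields) < 2:
            continue
        names = fields[1:]
        if hostname in names:
            return names[0]
    return None
```

  Usage, with a hypothetical hosts file:

```python
hosts = """127.0.0.1 localhost localhost.localdomain
192.0.2.10 computesriov-1.example.com computesriov-1
"""
print(canonical_name_from_hosts(hosts, 'computesriov-1'))
```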

  This also explains why socket.getaddrinfo() works fine in a plain
  Python interpreter but fails when called from the sriov-agent code.
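  One quick way to check which implementation a process is actually using is to look at where socket.getaddrinfo is defined. In a plain interpreter, the sketch below reports the standard-library 'socket' module. After eventlet.monkey_patch(), it is expected to point at eventlet's green DNS module instead. getaddrinfo_origin() is a hypothetical helper name.

```python
import socket


def getaddrinfo_origin(sock_module=socket):
    """Return the name of the module that defines sock_module.getaddrinfo."""
    return getattr(sock_module.getaddrinfo, '__module__', None)
```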

  [1] https://github.com/openstack/neutron/blob/ae87995a0827c98502bfa29a9abf9e3f229aac72/neutron/agent/common/utils.py#L84-L95
  [2] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L58
  [3] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L508
