yahoo-eng-team team mailing list archive
Message #89262
[Bug 1980967] [NEW] get_hypervisor_hostname helper function is failing silently
Public bug reported:
get_hypervisor_hostname() raises an error, but the error is swallowed[1] with 'pass'. As a result, the agent does not get a fully qualified domain name (FQDN).
I have only seen this issue with the sriov-agent.
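To illustrate the failure mode, here is a minimal sketch of a helper that falls back to the short hostname when resolution fails. It is not the neutron code: canonical_hostname, its resolve parameter, and the log_errors flag are hypothetical names used only for this example. The only difference from a bare 'pass' is that the error is logged before falling back.

```python
import logging
import socket

LOG = logging.getLogger(__name__)


def canonical_hostname(hostname, resolve=socket.getaddrinfo, log_errors=True):
    """Return the canonical name for hostname, or hostname itself on failure.

    Hypothetical helper for illustration only; 'resolve' is injectable so
    the failure path can be exercised without a real DNS lookup.
    """
    # Names that already look qualified (or are localhost) are returned as-is.
    if hostname.startswith('localhost') or '.' in hostname:
        return hostname
    try:
        addrinfo = resolve(hostname, None,
                           family=socket.AF_UNSPEC,
                           flags=socket.AI_CANONNAME)
        # Each entry is (family, type, proto, canonname, sockaddr).
        if (addrinfo and addrinfo[0][3] and
                not addrinfo[0][3].startswith('localhost')):
            return addrinfo[0][3]
    except OSError as exc:
        # socket.gaierror is a subclass of OSError. With a bare 'pass' here
        # the failure would leave no trace in the agent log.
        if log_errors:
            LOG.warning("Could not resolve FQDN for %s: %s", hostname, exc)
    return hostname
```

With log_errors=True, the gaierror shown in the traceback under "Additional info" would show up as a warning instead of disappearing.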
Steps to Reproduce:
1. Start a sriov-agent container with the following sriov_agent.ini:
[sriov_nic]
physical_device_mappings=datacentre:enp7s0f3,datacentre:enp5s0f0
resource_provider_bandwidths=enp7s0f3:10000000:10000000,enp5s0f0:10000000:10000000
2. Observe the sriov-agent log and notice that the agent starts without an FQDN:
INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-]
Resource provider hypervisors: {'enp7s0f3': 'computesriov-1',
'enp5s0f0': 'computesriov-1'}
Additional info:
I root-caused it by logging the traceback and the IOError:
2022-07-05 20:29:20.450 122133 DEBUG neutron.agent.common.utils [-] MIRO got error [Errno -2] Name or service not known get_hypervisor_hostname /usr/lib/python3.9/site-packages/neutron/agent/common/utils.py:104
2022-07-05 20:29:20.452 122133 DEBUG neutron.agent.common.utils [-] format_exc Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 440, in resolve_cname
    ans = resolver.query(host, dns.rdatatype.CNAME)
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 380, in query
    return end()
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 359, in end
    raise result[1]
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 340, in step
    a = fun(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/eventlet/dns/resolver.py", line 1002, in query
    raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
eventlet.dns.resolver.NXDOMAIN: The DNS query name does not exist: computesriov-0.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/neutron/agent/common/utils.py", line 91, in get_hypervisor_hostname
    addrinfo = socket.getaddrinfo(host=hypervisor_hostname,
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 540, in getaddrinfo
    qname = resolve_cname(qname).encode('ascii').decode('idna')
  File "/usr/lib/python3.9/site-packages/eventlet/support/greendns.py", line 446, in resolve_cname
    raise EAI_NODATA_ERROR
socket.gaierror: [Errno -2] Name or service not known
What is happening is that the 'eventlet' module does some import
patching[2], so socket.getaddrinfo() actually calls
greendns.py:getaddrinfo()[3] instead of the Python standard library
_socket implementation. greendns.py:getaddrinfo appears to be buggy:
it seems to skip looking up the FQDN in /etc/hosts (which contains the
FQDN on this machine) and goes straight to querying the DNS server,
which might not have this information (as in this case).
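For reference, this is roughly what "look in /etc/hosts first" means. The sketch below is not eventlet's or neutron's code and not a proposed patch; fqdn_from_hosts is a hypothetical helper, and the address and example.com domain in the usage note are made up.

```python
def fqdn_from_hosts(short_name, hosts_text):
    """Return the first dotted name in hosts_text that starts with short_name.

    hosts_text is the content of a hosts file: one entry per line in the
    form 'address name [alias ...]', with '#' starting a comment.
    Returns None when no matching fully qualified name is listed.
    """
    prefix = short_name + '.'
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the address; the rest are names and aliases.
        for name in fields[1:]:
            if name.startswith(prefix):
                return name
    return None
```

Given a line such as "192.0.2.10 computesriov-0.example.com computesriov-0", fqdn_from_hosts('computesriov-0', text) returns 'computesriov-0.example.com' without contacting a DNS server.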
This also explains why socket.getaddrinfo() works just fine in a
Python terminal but fails when called from within the sriov-agent
Python code.
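A quick way to tell the two environments apart is to check which module the active socket.getaddrinfo comes from. This is a small sketch: resolver_origin is a hypothetical helper, and it assumes the patched function reports an eventlet module in its __module__ attribute (which I would expect given the greendns.py path in the traceback, but have not verified).

```python
import socket


def resolver_origin(func=None):
    """Return (module_name, looks_patched) for a getaddrinfo callable.

    Defaults to whatever socket.getaddrinfo currently points at.
    """
    if func is None:
        func = socket.getaddrinfo
    module = getattr(func, '__module__', None) or 'unknown'
    # Assumption: an eventlet replacement is defined in an eventlet.* module.
    return module, module.startswith('eventlet.')


if __name__ == '__main__':
    module, patched = resolver_origin()
    print('socket.getaddrinfo comes from %s (eventlet-patched: %s)'
          % (module, patched))
```

Running this in a plain Python terminal and again from inside the agent process should show whether the agent is going through the eventlet resolver.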
[1] https://github.com/openstack/neutron/blob/ae87995a0827c98502bfa29a9abf9e3f229aac72/neutron/agent/common/utils.py#L84-95
[2] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L58
[3] https://github.com/eventlet/eventlet/blob/v0.30.3/eventlet/support/greendns.py#L508
** Affects: neutron
Importance: Undecided
Assignee: Miro Tomaska (mtomaska)
Status: New
** Changed in: neutron
Assignee: (unassigned) => Miro Tomaska (mtomaska)
https://bugs.launchpad.net/bugs/1980967