yahoo-eng-team team mailing list archive
Message #83857
[Bug 1895322] [NEW] Nova is not actually disabling greendns
Public bug reported:
Description
===========
In [1], we began disabling greendns in eventlet to fix bug 1164822. This
was done by setting the EVENTLET_NO_GREENDNS environment variable before
importing eventlet. At import time, eventlet uses this env variable to
enable/disable greendns [2]. Therefore, EVENTLET_NO_GREENDNS needs to be
set before importing eventlet. Patch [3] changed that, setting the env
var *after* importing eventlet, and thus re-enabling greendns in Nova.
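One way to guard against this ordering mistake is to refuse to set the
variable once eventlet is already loaded. The following is only a
minimal sketch, not what Nova does: the helper name
disable_greendns_before_import is hypothetical, and it assumes that
finding 'eventlet' in sys.modules is a good enough sign that the
import-time check has already run.

```python
import os
import sys


def disable_greendns_before_import():
    """Set EVENTLET_NO_GREENDNS, but only if eventlet is not yet imported.

    Hypothetical helper for illustration. eventlet reads the variable at
    import time, so setting it after the import has no effect.
    """
    if "eventlet" in sys.modules:
        # Too late: eventlet has already decided whether to use greendns.
        raise RuntimeError(
            "EVENTLET_NO_GREENDNS must be set before eventlet is imported"
        )
    os.environ["EVENTLET_NO_GREENDNS"] = "yes"
```

Calling this helper at the very top of an entry-point module, ahead of
any "import eventlet" line, would turn a silent re-enabling of greendns
into an immediate error.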
Steps to reproduce
==================
-------------
Demonstration
-------------
This is a bit of a hard one to reproduce, but there's a simple way to
observe the necessity of setting the env var before importing eventlet:
*** Setting the environment variable BEFORE the import ***
--- Test script ---
[artom@zoe scratchpad]$ cat eventlet-test.py
import os
import socket
os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
import eventlet
eventlet.monkey_patch()
socket.gethostbyname('fake.local')
--- Result: traceback DOES NOT include greendns.py (i.e., environment
variable worked) ---
[artom@zoe scratchpad]$ python eventlet-test.py
Traceback (most recent call last):
  File "eventlet-test.py", line 8, in <module>
    socket.gethostbyname('fake.local')
socket.gaierror: [Errno -2] Name or service not known
*** Setting the environment variable AFTER the import ***
--- Test script ---
[artom@zoe scratchpad]$ cat eventlet-test.py
import os
import socket
import eventlet
os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
eventlet.monkey_patch()
socket.gethostbyname('fake.local')
--- Result: traceback DOES include greendns.py (i.e., environment variable
DID NOT work) ---
[artom@zoe scratchpad]$ python eventlet-test.py
Traceback (most recent call last):
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 424, in resolve
    return _proxy.query(name, rdtype, raise_on_no_answer=raises,
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 380, in query
    return end()
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 359, in end
    raise result[1]
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 340, in step
    a = fun(*args, **kwargs)
  File "/home/artom/.local/lib/python3.8/site-packages/dns/resolver.py", line 1002, in query
    raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
dns.resolver.NXDOMAIN: None of DNS query names exist: fake.local., fake.local.redhat.com.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eventlet-test.py", line 8, in <module>
    socket.gethostbyname('fake.local')
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 550, in gethostbyname
    rrset = resolve(hostname)
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 434, in resolve
    raise EAI_NODATA_ERROR
socket.gaierror: [Errno -2] Name or service not known
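The comparison above is done by eye. As a rough sketch, the same check
could be automated by walking the exception chain and looking for frames
that come from greendns.py. The helper name chain_mentions_file is
hypothetical, and matching on the file name suffix is an assumption made
for illustration.

```python
import traceback


def chain_mentions_file(exc, suffix="greendns.py"):
    """Return True if any frame in exc's chain is from a file ending in suffix.

    Hypothetical helper for illustration only.
    """
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        for frame in traceback.extract_tb(exc.__traceback__):
            if frame.filename.endswith(suffix):
                return True
        # Follow an explicit cause first, then the implicit
        # "During handling of the above exception" context.
        exc = exc.__cause__ or exc.__context__
    return False
```

In the test scripts, the socket.gethostbyname('fake.local') call could
be wrapped in a try/except for socket.gaierror and the caught exception
passed to this helper. A True result would suggest that greendns is
still active.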
----------------
Real life impact
----------------
Downstream in our OpenStack product, the return of greendns has been
reported [4] as the cause of needless delays when failing over between
Rabbit URLs in case one of the Rabbit servers goes down. In our
deployment, DNS is not used, and IPv4 hostnames are just written to
/etc/hosts. It looks like greendns tries to do IPv6 resolution
regardless, needlessly using up 30 seconds in order to time out.
Expected result
===============
Almost-immediate failover to new Rabbit server.
Actual result
=============
greendns attempts IPv6 name resolution (though it is unclear for which
Rabbit server: the failed one or the next one) and only connects to the
next Rabbit server after 30 seconds.
Environment
===========
This has been reported on stable/train, but should be the same on
master.
References
==========
[1] https://review.opendev.org/#/c/26325/
[2] https://github.com/eventlet/eventlet/blob/v0.26.0/eventlet/green/socket.py#L20
[3] https://review.opendev.org/#/c/626952/
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1860818
** Affects: nova
Importance: Undecided
Assignee: Artom Lifshitz (notartom)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1895322
Title:
Nova is not actually disabling greendns
Status in OpenStack Compute (nova):
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1895322/+subscriptions