yahoo-eng-team team mailing list archive

[Bug 1895322] [NEW] Nova is not actually disabling greendns

 

Public bug reported:

Description
===========

In [1], we began disabling greendns in eventlet to fix bug 1164822. This
was done by setting the EVENTLET_NO_GREENDNS environment variable before
importing eventlet. At import time, eventlet uses this env variable to
enable/disable greendns [2]. Therefore, EVENTLET_NO_GREENDNS needs to be
set before importing eventlet. Patch [3] changed that, setting the env
var *after* importing eventlet, and thus re-enabling greendns in Nova.
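The ordering problem can be illustrated without eventlet installed. The sketch below is not eventlet code: `FAKE_LIB_SOURCE` and `import_fake_lib` are hypothetical stand-ins for a library that reads the variable once at import time, and the check is modeled on the one referenced in [2] (assumed here to be a case-insensitive comparison against 'yes').

```python
import os
import types

# Hypothetical stand-in for a library that, like the module referenced in [2],
# reads EVENTLET_NO_GREENDNS exactly once, when its module body executes.
FAKE_LIB_SOURCE = """
import os
GREENDNS_ENABLED = os.environ.get('EVENTLET_NO_GREENDNS', '').lower() != 'yes'
"""


def import_fake_lib():
    """Simulate a fresh import by executing the module body once."""
    module = types.ModuleType("fake_lib")
    exec(FAKE_LIB_SOURCE, module.__dict__)
    return module


def set_before_import():
    """Set the variable first, then 'import': the library sees it."""
    os.environ.pop('EVENTLET_NO_GREENDNS', None)
    os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
    try:
        return import_fake_lib().GREENDNS_ENABLED
    finally:
        os.environ.pop('EVENTLET_NO_GREENDNS', None)


def set_after_import():
    """'Import' first, then set the variable: the decision was already made."""
    os.environ.pop('EVENTLET_NO_GREENDNS', None)
    lib = import_fake_lib()
    os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
    try:
        return lib.GREENDNS_ENABLED
    finally:
        os.environ.pop('EVENTLET_NO_GREENDNS', None)


if __name__ == "__main__":
    print("set before import, greendns enabled:", set_before_import())
    print("set after import, greendns enabled:", set_after_import())
```

The module-level flag is computed once and never re-read, so changing os.environ afterwards has no effect on it. This is the same situation patch [3] created.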

Steps to reproduce
==================

-------------
Demonstration
-------------

This is a bit of a hard one to reproduce, but there's a simple way to
observe the necessity of setting the env var before importing eventlet:


*** Setting the environment variable BEFORE the import ***

--- Test script ---

[artom@zoe scratchpad]$ cat eventlet-test.py
import os
import socket

os.environ['EVENTLET_NO_GREENDNS'] = 'yes'
import eventlet

eventlet.monkey_patch()
socket.gethostbyname('fake.local')

--- Result: traceback DOES NOT include greendns.py (i.e., environment
variable worked) ---

[artom@zoe scratchpad]$ python eventlet-test.py 
Traceback (most recent call last):
  File "eventlet-test.py", line 8, in <module>
    socket.gethostbyname('fake.local')
socket.gaierror: [Errno -2] Name or service not known

*** Setting the environment variable AFTER the import ***

--- Test script ---

[artom@zoe scratchpad]$ cat eventlet-test.py 
import os
import socket

import eventlet
os.environ['EVENTLET_NO_GREENDNS'] = 'yes'

eventlet.monkey_patch()
socket.gethostbyname('fake.local')

--- Result: traceback DOES include greendns.py (i.e., environment variable
DID NOT work) ---

[artom@zoe scratchpad]$ python eventlet-test.py 
Traceback (most recent call last):
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 424, in resolve
    return _proxy.query(name, rdtype, raise_on_no_answer=raises,
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 380, in query
    return end()
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 359, in end
    raise result[1]
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 340, in step
    a = fun(*args, **kwargs)
  File "/home/artom/.local/lib/python3.8/site-packages/dns/resolver.py", line 1002, in query
    raise NXDOMAIN(qnames=qnames_to_try, responses=nxdomain_responses)
dns.resolver.NXDOMAIN: None of DNS query names exist: fake.local., fake.local.redhat.com.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eventlet-test.py", line 8, in <module>
    socket.gethostbyname('fake.local')
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 550, in gethostbyname
    rrset = resolve(hostname)
  File "/home/artom/.local/lib/python3.8/site-packages/eventlet/support/greendns.py", line 434, in resolve
    raise EAI_NODATA_ERROR
socket.gaierror: [Errno -2] Name or service not known
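
A possible guard against this kind of regression is to check, at the point where the variable is set, whether the module that reads it has already been imported. The helper below is only a sketch: `import_time_state` is a hypothetical name, and using 'eventlet.green.socket' as the module to look for is an assumption based on [2].

```python
import os
import sys


def import_time_state(environ=None, modules=None):
    """Report whether setting EVENTLET_NO_GREENDNS can still take effect.

    Returns one of:
      'will-disable'    - variable is 'yes' and the module is not imported yet
      'will-enable'     - variable is unset/other and the module is not imported
      'already-decided' - the module is imported; changing the variable now
                          has no effect on the choice it made at import time
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules

    # Assumption: this is the module that reads the variable at import time.
    if 'eventlet.green.socket' in modules:
        return 'already-decided'

    if environ.get('EVENTLET_NO_GREENDNS', '').lower() == 'yes':
        return 'will-disable'
    return 'will-enable'


if __name__ == "__main__":
    print(import_time_state())
```

Calling this right before the os.environ assignment and failing loudly on 'already-decided' would catch an import reordering like the one in [3].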

----------------
Real life impact
----------------

Downstream in our OpenStack product, the return of greendns has been
reported [4] as the cause of needless delays when failing over between
Rabbit URLs in case one of the Rabbit servers goes down. In our
deployment, DNS is not used, and IPv4 hostnames are just written to
/etc/hosts. It looks like greendns tries to do IPv6 resolution regardless,
needlessly using up 30 seconds in order to time out.

Expected result
===============

Almost-immediate failover to new Rabbit server.

Actual result
=============

greendns attempts IPv6 name resolution (it is unclear for which Rabbit
server: the failed one or the next one) before finally, after 30
seconds, connecting to the next Rabbit server.

Environment
===========

This has been reported on stable/train, but should be the same on
master.

References
==========

[1] https://review.opendev.org/#/c/26325/
[2] https://github.com/eventlet/eventlet/blob/v0.26.0/eventlet/green/socket.py#L20
[3] https://review.opendev.org/#/c/626952/
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1860818

** Affects: nova
     Importance: Undecided
     Assignee: Artom Lifshitz (notartom)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1895322
