yahoo-eng-team team mailing list archive

[Bug 1866937] Re: Requests to neutron API do not use retries

 

Reviewed:  https://review.opendev.org/712226
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0e34ed9733e3f23d162e3e428795807386abf1cb
Submitter: Zuul
Branch:    master

commit 0e34ed9733e3f23d162e3e428795807386abf1cb
Author: melanie witt <melwittt@xxxxxxxxx>
Date:   Wed Mar 11 02:26:52 2020 +0000

    Add config option for neutron client retries
    
    Nova can occasionally fail to carry out server actions that require
    calls to the neutron API when haproxy closes an idle connection at
    nearly the same time as an incoming request attempts to re-use that
    connection while it is being torn down.
    
    In order to be more resilient to this scenario, we can add a config
    option for neutron client to retry requests, similar to our existing
    CONF.cinder.http_retries and CONF.glance.num_retries options.
    
    Because we create our neutron client [1] using a keystoneauth1 session
    [2], we can set the 'connect_retries' keyword argument to let
    keystoneauth1 handle connection retries.
    
    Closes-Bug: #1866937
    
    [1] https://github.com/openstack/nova/blob/57459c3429ce62975cebd0cd70936785bdf2f3a4/nova/network/neutron.py#L226-L237
    [2] https://docs.openstack.org/keystoneauth/latest/api/keystoneauth1.session.html#keystoneauth1.session.Session
    
    Change-Id: Ifb3afb13aff7e103c2e80ade817d0e63b624604a
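
The retry behaviour the commit asks keystoneauth1 to provide can be
illustrated with a small standalone sketch. This is not keystoneauth1's
or nova's code: the names call_with_connect_retries, FlakyEndpoint and
ConnectionReset are hypothetical, and the fixed delay between attempts is
an assumption made only for illustration.

```python
import time


class ConnectionReset(ConnectionError):
    """Stand-in for the RST / "bad status line" failure from the bug report."""


def call_with_connect_retries(request, connect_retries=3, delay=0.0):
    """Call request(); on a connection error, retry up to connect_retries times.

    connect_retries=0 means the request is attempted exactly once.
    """
    attempt = 0
    while True:
        try:
            return request()
        except ConnectionError:
            if attempt >= connect_retries:
                # Retries exhausted: surface the original failure.
                raise
            attempt += 1
            time.sleep(delay)


class FlakyEndpoint:
    """Hypothetical endpoint whose first `failures` calls hit a dying connection."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionReset("bad status line")
        return "200 OK"
```

With one simulated failure and at least one retry allowed, the second
attempt succeeds on a fresh connection, which is the resilience the
config option is meant to provide.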


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1866937

Title:
  Requests to neutron API do not use retries

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We have a customer bug report downstream [1] where nova occasionally
  fails to carry out server actions requiring calls to neutron API if
  haproxy happens to close a connection after idle time of 10 seconds at
  nearly the same time as an incoming request that attempts to re-use
  the connection while it is being torn down. Here is an excerpt from
  [1]:

   According to our investigation, the cause is as follows.

   1. neutron-client in nova uses a connection pool (urllib3/requests)
      for HTTP.

   2. Sometimes an HTTP connection is reused for different requests.

   3. haproxy closes the connection between neutron-client and haproxy
      when it has been idle for 10 seconds.

   4. If the client reuses the connection at almost the same time as
      haproxy closes it, the client gets an RST and the request ends
      with "bad status line".
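
  The sequence in the excerpt can be modeled with a toy simulation. This
  is a simplified sketch, not haproxy or urllib3 behaviour: the class
  PooledConnection and its methods are hypothetical, time is passed in
  explicitly instead of read from a clock, and the real race window (a
  request arriving while the close is in flight) is collapsed into a
  simple idle-time check.

```python
IDLE_TIMEOUT = 10.0  # seconds; the haproxy idle timeout quoted in the report


class PooledConnection:
    """Toy model of a keep-alive connection held in a client-side pool."""

    def __init__(self, opened_at):
        self.last_used = opened_at

    def closed_by_proxy(self, now):
        # The proxy tears the connection down once it has been idle too long.
        return now - self.last_used >= IDLE_TIMEOUT

    def send(self, now):
        if self.closed_by_proxy(now):
            # The client only learns of the close when it reuses the connection.
            raise ConnectionResetError("bad status line")
        self.last_used = now
        return "200 OK"
```

  In this model, a request sent after 9.9 seconds of idle time still
  succeeds, while one sent after 10.1 seconds fails even though the
  client believed the pooled connection was usable.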

  To address this problem, we can add a new config option for neutron
  client (similar to the existing config options we have for cinder
  client and glance client retries) to be more resilient during such
  scenarios.

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1788853

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1866937/+subscriptions

