← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1410805] [NEW] NSX Plugin: UnAuthorizedRequest without retry

 

Public bug reported:

An occasional failure found in our production environment because of a
race condition in NSX plugin, which results in neutron port creation
failure. Here is the root cause.

When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie.
Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying.

However, there is a race condition that after connection is acquired but
before request is issued, there can be another thread failed to
authenticate and resetting the cookie (which is shared data between
connections to same controller) to None (so that during retry it will
know to re-login after acquiring the connection). So the first thread
will then abort retrying because of the logic checking the cookie and
aborting if the cookie is None.

This has high probability because there is a blocking operation after
connection is acquired: the sock.connect(). It is likely the cookie
being changed to None during this blocking operation.

The fix would be simply remove the abort logic, because even if the
cookie is None, it would then re-login and get a new cookie before
retry. Another fix could be moving the assignment of "cookie" before the
socket.connect() operation.

Related logs: 
(note that the log is from havana version, but the code logic is the same in upstream)
---------------------------------------------------------------------
2015-01-13 06:44:21,326 64499216     INFO [neutron.plugins.nicira.api_client.client] [5672] Connection https://xxx:443 idle for 1614.74 seconds; reconnecting.
2015-01-13 06:44:21,326 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,327 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
2015-01-13 06:44:21,389 64496976     INFO [neutron.plugins.nicira.api_client.client] [5673] Connection https://xxx:443 idle for 5833.87 seconds; reconnecting.
2015-01-13 06:44:21,389 64496976    DEBUG [neutron.plugins.nicira.api_client.client] [5673] Acquired connection https://xxx:443. 13 connection(s) available.
2015-01-13 06:44:21,390 64496976    DEBUG [neutron.plugins.nicira.api_client.request] [5673] Issuing - request GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus
2015-01-13 06:44:21,394 64499216    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,398 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 401 (0.07 seconds)
2015-01-13 06:44:21,399 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,399 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 13 connection(s) available.
2015-01-13 06:44:21,400 64496176    DEBUG [neutron.plugins.nicira.api_client.request] [5674] Issuing - request POST https://xxx:443//ws.v1/login
2015-01-13 06:44:21,400 64496176    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,403 64496976    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.request] [5673] Completed request 'GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 (0.05 seconds)
2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.client] [5673] Released connection https://xxx:443. 14 connection(s) available.
2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5673] Completed request 'GET /ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401
2015-01-13 06:44:21,439 57729424    ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port

2015-01-13 06:44:21,584 64496176    DEBUG [neutron.plugins.nicira.api_client.request] [5674] Completed request 'POST https://xxx:443//ws.v1/login': 200 (0.18 seconds)
2015-01-13 06:44:21,585 64496176    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5674] Completed request 'POST /ws.v1/login': 200 
2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.client_eventlet] Saving new authentication cookie 'nvp_sessionid=15d4602ff7fb3dc34c00901ef80a7cc4; Path=/; secure' 
2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 (0.11 seconds)
2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 15 connection(s) available.
2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5672] Completed request 'GET /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: vmware

** Tags added: vmware

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1410805

Title:
  NSX Plugin: UnAuthorizedRequest without retry

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  An occasional failure found in our production environment because of a
  race condition in NSX plugin, which results in neutron port creation
  failure. Here is the root cause.

  When authentication cookie of NSX controller expires, it is expected that NSX plugin client will retry with new authentication cookie after re-login. The code logic is: when a connection is acquired, if cookie is None then it initiate a new login request for new cookie.
  Otherwise, it continue to use old cookie to issue request, and if request failed because of authentication failure, it then checks if cookie is None then abort retrying.

  However, there is a race condition that after connection is acquired
  but before request is issued, there can be another thread failed to
  authenticate and resetting the cookie (which is shared data between
  connections to same controller) to None (so that during retry it will
  know to re-login after acquiring the connection). So the first thread
  will then abort retrying because of the logic checking the cookie and
  aborting if the cookie is None.

  This has high probability because there is a blocking operation after
  connection is acquired: the sock.connect(). It is likely the cookie
  being changed to None during this blocking operation.

  The fix would be simply remove the abort logic, because even if the
  cookie is None, it would then re-login and get a new cookie before
  retry. Another fix could be moving the assignment of "cookie" before
  the socket.connect() operation.

  Related logs: 
  (note that the log is from havana version, but the code logic is the same in upstream)
  ---------------------------------------------------------------------
  2015-01-13 06:44:21,326 64499216     INFO [neutron.plugins.nicira.api_client.client] [5672] Connection https://xxx:443 idle for 1614.74 seconds; reconnecting.
  2015-01-13 06:44:21,326 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 14 connection(s) available.
  2015-01-13 06:44:21,327 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
  2015-01-13 06:44:21,389 64496976     INFO [neutron.plugins.nicira.api_client.client] [5673] Connection https://xxx:443 idle for 5833.87 seconds; reconnecting.
  2015-01-13 06:44:21,389 64496976    DEBUG [neutron.plugins.nicira.api_client.client] [5673] Acquired connection https://xxx:443. 13 connection(s) available.
  2015-01-13 06:44:21,390 64496976    DEBUG [neutron.plugins.nicira.api_client.request] [5673] Issuing - request GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus
  2015-01-13 06:44:21,394 64499216    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
  2015-01-13 06:44:21,398 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 401 (0.07 seconds)
  2015-01-13 06:44:21,399 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 14 connection(s) available.
  2015-01-13 06:44:21,399 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Acquired connection https://xxx:443. 13 connection(s) available.
  2015-01-13 06:44:21,400 64496176    DEBUG [neutron.plugins.nicira.api_client.request] [5674] Issuing - request POST https://xxx:443//ws.v1/login
  2015-01-13 06:44:21,400 64496176    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
  2015-01-13 06:44:21,403 64496976    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
  2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.request] [5673] Completed request 'GET https://xxx:443//ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401 (0.05 seconds)
  2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.client] [5673] Released connection https://xxx:443. 14 connection(s) available.
  2015-01-13 06:44:21,439 64496976    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5673] Completed request 'GET /ws.v1/lswitch/3ac1df8b-cdcb-472c-8638-75f89b87bfa4?relations=LogicalSwitchStatus': 401
  2015-01-13 06:44:21,439 57729424    ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
  ...
  NvpPluginException: An unexpected error occurred in the NVP Plugin:An exception occured while selecting logical switch for the port

  2015-01-13 06:44:21,584 64496176    DEBUG [neutron.plugins.nicira.api_client.request] [5674] Completed request 'POST https://xxx:443//ws.v1/login': 200 (0.18 seconds)
  2015-01-13 06:44:21,585 64496176    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5674] Completed request 'POST /ws.v1/login': 200 
  2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.client_eventlet] Saving new authentication cookie 'nvp_sessionid=15d4602ff7fb3dc34c00901ef80a7cc4; Path=/; secure' 
  2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Issuing - request GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus
  2015-01-13 06:44:21,585 64499216    DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '425260708'
  2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.request] [5672] Completed request 'GET https://xxx:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200 (0.11 seconds)
  2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.client] [5672] Released connection https://xxx:443. 15 connection(s) available.
  2015-01-13 06:44:21,694 64499216    DEBUG [neutron.plugins.nicira.api_client.request_eventlet] [5672] Completed request 'GET /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552/lport/5589905e-e531-4676-b0a0-40fe5589e60f?relations=LogicalPortStatus': 200

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1410805/+subscriptions


Follow ups

References