yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #23726
[Bug 1381094] [NEW] NSX plugin request retry should delay in case of "[Errno 104] Connection reset by peer"
Public bug reported:
When sending request to NSX controller, the TCP connection is reset by
remote (in our case, by Load-Balancer), which is a transient failure
(could be caused by LB ratelimit). The return code is [Errno 104]
Connection reset by peer.
In this case, retry should work. However, error 104 is returned
immediately after sending the request, and the retry logic in NSX plugin
doesn't delay at all, so continuous retries in a row failed.
>From the log below we can see all the 3 requests were performed and
failed in 1 ms.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
Above log is from Havana. Below is my patch based on master branch to solve the issue:
diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py
index c55ccfa..46232ea 100644
--- a/neutron/plugins/vmware/api_client/eventlet_request.py
+++ b/neutron/plugins/vmware/api_client/eventlet_request.py
@@ -17,6 +17,8 @@
import eventlet
import httplib
import urllib
+import socket
+import errno
from neutron.openstack.common import jsonutils
from neutron.openstack.common import log as logging
@@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest):
self._request_error = None
response = req
else:
+ if (isinstance(req, socket.error) and
+ req.errno == errno.ECONNRESET):
+ timeout = 0.5
LOG.info(_('[%(rid)d] Error while handling request: %(req)s'),
{'rid': self._rid(), 'req': req})
self._request_error = req
** Affects: neutron
Importance: Undecided
Status: New
** Tags: vmware
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1381094
Title:
NSX plugin request retry should delay in case of "[Errno 104]
Connection reset by peer"
Status in OpenStack Neutron (virtual network service):
New
Bug description:
When sending request to NSX controller, the TCP connection is reset by
remote (in our case, by Load-Balancer), which is a transient failure
(could be caused by LB ratelimit). The return code is [Errno 104]
Connection reset by peer.
In this case, retry should work. However, error 104 is returned
immediately after sending the request, and the retry logic in NSX
plugin doesn't delay at all, so continuous retries in a row failed.
From the log below we can see all the 3 requests were performed and
failed in 1 ms.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available.
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645'
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds)
2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443
2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available.
2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer
2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus
2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port
...
Above log is from Havana. Below is my patch based on master branch to solve the issue:
diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py
index c55ccfa..46232ea 100644
--- a/neutron/plugins/vmware/api_client/eventlet_request.py
+++ b/neutron/plugins/vmware/api_client/eventlet_request.py
@@ -17,6 +17,8 @@
import eventlet
import httplib
import urllib
+import socket
+import errno
from neutron.openstack.common import jsonutils
from neutron.openstack.common import log as logging
@@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest):
self._request_error = None
response = req
else:
+ if (isinstance(req, socket.error) and
+ req.errno == errno.ECONNRESET):
+ timeout = 0.5
LOG.info(_('[%(rid)d] Error while handling request: %(req)s'),
{'rid': self._rid(), 'req': req})
self._request_error = req
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1381094/+subscriptions
Follow ups
References