yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #63229
[Bug 1680912] Re: Sometimes grenade job fails with NetworkNotFound because a network delete request took too long
** Changed in: neutron
Status: Confirmed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1680912
Title:
Sometimes grenade job fails with NetworkNotFound because a network
delete request took too long
Status in neutron:
Fix Released
Bug description:
http://logs.openstack.org/38/453838/2/gate/gate-grenade-dsvm-neutron-
ubuntu-xenial/84cf5aa/logs/grenade.sh.txt.gz#_2017-04-07_03_45_19_220
The DELETE request in question is:
2017-04-07 03:45:19.220 | 2017-04-07 03:41:31,143 18539 WARNING [urllib3.connectionpool] Retrying (Retry(total=9, connect=None, read=None, redirect=5)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='149.202.181.79', port=9696): Read timed out. (read timeout=60)",)': /v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188
2017-04-07 03:45:19.220 | 2017-04-07 03:41:34,053 18539 INFO [tempest.lib.common.rest_client] Request (NetworksIpV6Test:_run_cleanups): 404 DELETE http://149.202.181.79:9696/v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188 62.919s
2017-04-07 03:45:19.220 | 2017-04-07 03:41:34,053 18539 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
2017-04-07 03:45:19.220 | Body: None
2017-04-07 03:45:19.221 | Response - Headers: {u'content-length': '138', u'content-type': 'application/json', u'x-openstack-request-id': 'req-d1dfda92-9b9b-421f-b45f-7ef4746e43c6', u'date': 'Fri, 07 Apr 2017 03:41:34 GMT', u'connection': 'close', 'content-location': 'http://149.202.181.79:9696/v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188', 'status': '404'}
2017-04-07 03:45:19.221 | Body: {"NeutronError": {"message": "Network 46b0776a-3917-440d-9b90-4ab02a735188 could not be found.", "type": "NetworkNotFound", "detail": ""}}
What we see is first attempt to delete the network failed after 60
seconds, so we retry DELETE, at which point we see that the network is
no longer there.
In neutron-server log, we see that the first DELETE attempt was
received with req_id req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d at
03:40:31.344 but completed at 03:41:33.951 only (probably because
second DELETE attempt triggered something that unblocked the first
request).
In logs handling the first DELETE request, we see some looping de-
allocating ports:
2017-04-07 03:40:34.227 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports to auto-deallocate: set([u'b5ff06a6-4084-4961-b239-9f29aa42c68a']) delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093
2017-04-07 03:40:34.231 8785 DEBUG neutron.callbacks.manager [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Notify callbacks [] for subnet, before_delete _notify_loop /opt/stack/old/neutron/neutron/callbacks/manager.py:142
2017-04-07 03:40:34.338 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports to auto-deallocate: set([u'b5ff06a6-4084-4961-b239-9f29aa42c68a']) delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093
2017-04-07 03:40:34.340 8785 DEBUG neutron.callbacks.manager [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Notify callbacks [] for subnet, before_delete _notify_loop /opt/stack/old/neutron/neutron/callbacks/manager.py:142
It goes on like that on and on up until:
2017-04-07 03:41:32.644 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Deleting subnet record delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1145
2017-04-07 03:41:32.644 8785 DEBUG neutron.ipam.driver [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Loading ipam driver: internal get_instance /opt/stack/old/neutron/neutron/ipam/driver.py:51
2017-04-07 03:41:32.698 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Committing transaction delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1152
Right before the thread is unblocked and makes progress, we see there
is no longer a port to de-allocate:
2017-04-07 03:41:32.421 8785 DEBUG neutron.plugins.ml2.plugin [req-
933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-
NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports
to auto-deallocate: set([]) delete_subnet
/opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093
I think this was unblocked by another cleanup (DELETE) request for the
port that happened just before:
2017-04-07 03:41:31.994 8785 DEBUG neutron.plugins.ml2.plugin [req-
d1dfda92-9b9b-421f-b45f-7ef4746e43c6 tempest-
NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059]
Calling delete_port for b5ff06a6-4084-4961-b239-9f29aa42c68a owned by
network:dhcp delete_port
/opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1637
I suspect this is related to the following patch where we first caught
the situation but landed the patch nevertheless:
https://review.openstack.org/#/q/I924fa7e36ea9e45bf0ef3480972341a851bda86c,n,z
We may want to revert those. We may also want to release a new Newton
release because the patch got into 9.3.0.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1680912/+subscriptions
References