yahoo-eng-team team mailing list archive

[Bug 1680912] [NEW] Sometimes grenade job fails with NetworkNotFound because a network delete request took too long

Public bug reported:

http://logs.openstack.org/38/453838/2/gate/gate-grenade-dsvm-neutron-
ubuntu-xenial/84cf5aa/logs/grenade.sh.txt.gz#_2017-04-07_03_45_19_220

The DELETE request in question is:
2017-04-07 03:45:19.220 |     2017-04-07 03:41:31,143 18539 WARNING  [urllib3.connectionpool] Retrying (Retry(total=9, connect=None, read=None, redirect=5)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='149.202.181.79', port=9696): Read timed out. (read timeout=60)",)': /v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188
2017-04-07 03:45:19.220 |     2017-04-07 03:41:34,053 18539 INFO     [tempest.lib.common.rest_client] Request (NetworksIpV6Test:_run_cleanups): 404 DELETE http://149.202.181.79:9696/v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188 62.919s
2017-04-07 03:45:19.220 |     2017-04-07 03:41:34,053 18539 DEBUG    [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
2017-04-07 03:45:19.220 |             Body: None
2017-04-07 03:45:19.221 |         Response - Headers: {u'content-length': '138', u'content-type': 'application/json', u'x-openstack-request-id': 'req-d1dfda92-9b9b-421f-b45f-7ef4746e43c6', u'date': 'Fri, 07 Apr 2017 03:41:34 GMT', u'connection': 'close', 'content-location': 'http://149.202.181.79:9696/v2.0/networks/46b0776a-3917-440d-9b90-4ab02a735188', 'status': '404'}
2017-04-07 03:45:19.221 |             Body: {"NeutronError": {"message": "Network 46b0776a-3917-440d-9b90-4ab02a735188 could not be found.", "type": "NetworkNotFound", "detail": ""}}

What we see is that the first attempt to delete the network timed out
after 60 seconds, so the client retried the DELETE, at which point the
network was no longer there, hence the 404.
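
For illustration, here is a minimal client-side sketch of that pattern
(a hypothetical reconstruction using the requests library, not the
actual tempest/urllib3 code): a DELETE that times out is retried, and a
404 on the retry just means the slow first attempt eventually succeeded
server-side.

    import requests

    def delete_network(base_url, net_id, token, timeout=60):
        # Hypothetical helper; base_url would be e.g.
        # "http://149.202.181.79:9696" in the run above.
        url = "%s/v2.0/networks/%s" % (base_url, net_id)
        headers = {"X-Auth-Token": token, "Accept": "application/json"}
        try:
            resp = requests.delete(url, headers=headers, timeout=timeout)
        except requests.exceptions.ReadTimeout:
            # The server may still be processing the first DELETE;
            # retry once, as urllib3 does in the excerpt above.
            resp = requests.delete(url, headers=headers, timeout=timeout)
        if resp.status_code == 404:
            # The network is gone, i.e. the slow first DELETE completed
            # after all, so the 404 on the retry is really a success.
            return
        resp.raise_for_status()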

In the neutron-server log, we see that the first DELETE attempt was
received with req_id req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d at
03:40:31.344 but completed only at 03:41:33.951 (probably because the
second DELETE attempt triggered something that unblocked the first
request).

In the logs handling the first DELETE request, we see the request
looping while trying to de-allocate ports:

2017-04-07 03:40:34.227 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports to auto-deallocate: set([u'b5ff06a6-4084-4961-b239-9f29aa42c68a']) delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093
2017-04-07 03:40:34.231 8785 DEBUG neutron.callbacks.manager [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Notify callbacks [] for subnet, before_delete _notify_loop /opt/stack/old/neutron/neutron/callbacks/manager.py:142
2017-04-07 03:40:34.338 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports to auto-deallocate: set([u'b5ff06a6-4084-4961-b239-9f29aa42c68a']) delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093
2017-04-07 03:40:34.340 8785 DEBUG neutron.callbacks.manager [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Notify callbacks [] for subnet, before_delete _notify_loop /opt/stack/old/neutron/neutron/callbacks/manager.py:142

It keeps looping like that up until:

2017-04-07 03:41:32.644 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Deleting subnet record delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1145
2017-04-07 03:41:32.644 8785 DEBUG neutron.ipam.driver [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Loading ipam driver: internal get_instance /opt/stack/old/neutron/neutron/ipam/driver.py:51
2017-04-07 03:41:32.698 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Committing transaction delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1152

Right before the thread is unblocked and makes progress, we see that
there is no longer a port to de-allocate:

2017-04-07 03:41:32.421 8785 DEBUG neutron.plugins.ml2.plugin [req-933bd8b3-cd8f-4d2f-8f0c-505f85347b9d tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Ports to auto-deallocate: set([]) delete_subnet /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1093

I think this was unblocked by another cleanup (DELETE) request for the
port that happened just before:

2017-04-07 03:41:31.994 8785 DEBUG neutron.plugins.ml2.plugin [req-d1dfda92-9b9b-421f-b45f-7ef4746e43c6 tempest-NetworksIpV6Test-1590417059 tempest-NetworksIpV6Test-1590417059] Calling delete_port for b5ff06a6-4084-4961-b239-9f29aa42c68a owned by network:dhcp delete_port /opt/stack/old/neutron/neutron/plugins/ml2/plugin.py:1637
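
To make the interaction easier to follow, here is a self-contained toy
reproduction of the race (names and structure are invented for
illustration; this is not the actual ml2 code): delete_subnet re-reads
the set of ports to auto-deallocate on every iteration and only makes
progress once a concurrent delete_port empties it.

    import threading
    import time

    ports_on_subnet = {"b5ff06a6"}  # stands in for the dhcp port above
    lock = threading.Lock()

    def delete_subnet():
        # Mirrors the repeated "Ports to auto-deallocate" log lines:
        # the set is re-read on every pass and never shrinks through
        # this thread's own work, so the request spins until another
        # request intervenes.
        while True:
            with lock:
                to_deallocate = set(ports_on_subnet)
            print("Ports to auto-deallocate:", to_deallocate)
            if not to_deallocate:
                break
            time.sleep(0.1)
        print("Deleting subnet record")

    def delete_port():
        # Stands in for the retried cleanup request (req-d1dfda92...)
        # that deletes the dhcp port and thereby unblocks delete_subnet.
        time.sleep(0.5)
        print("Calling delete_port for dhcp port")
        with lock:
            ports_on_subnet.clear()

    workers = [threading.Thread(target=delete_subnet),
               threading.Thread(target=delete_port)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()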

I suspect this is related to the following patch, where we first caught
the situation but landed the change nevertheless:
https://review.openstack.org/#/q/I924fa7e36ea9e45bf0ef3480972341a851bda86c,n,z

We may want to revert those patches. We may also want to cut a new
Newton release, since the offending change got into 9.3.0.

** Affects: neutron
     Importance: Critical
     Assignee: Ihar Hrachyshka (ihar-hrachyshka)
         Status: Confirmed


** Tags: db gate-failure

** Changed in: neutron
   Importance: Undecided => Critical

** Tags added: gate-failure

-- 
https://bugs.launchpad.net/bugs/1680912