
yahoo-eng-team team mailing list archive

[Bug 1921085] Re: neutron-server ovsdbapp timeout exceptions after intermittent connectivity issues

 

Thanks for the info.

** Changed in: neutron
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1921085

Title:
  neutron-server ovsdbapp timeout exceptions after intermittent
  connectivity issues

Status in neutron:
  Invalid

Bug description:
  Cloud environment: bionic-ussuri with 3 neutron-server and 3
  ovn-central components, each running on a separate rack. (ovn-central
  runs the ovn-northd, ovsdb-nb, and ovsdb-sb services.)

  There was a network glitch between rack3 and the other racks for a minute, so neutron-server/2 was unable to communicate with ovn-central/0 and ovn-central/1. The ovsdb-nb and ovsdb-sb leaders are on one of ovn-central/0 or ovn-central/1.
  However, neutron-server/2 was able to connect to ovsdb-sb on ovn-central/2, but that server is not a leader.

  Logs from neutron-server on the neutron-server/2 unit:
  2021-02-15 14:20:08.119 15554 INFO ovsdbapp.backend.ovs_idl.vlog [req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: clustered database server is disconnected from cluster; trying another server
  2021-02-15 14:20:08.121 15554 INFO ovsdbapp.backend.ovs_idl.vlog [req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: connection closed by client
  2021-02-15 14:20:08.121 15554 INFO ovsdbapp.backend.ovs_idl.vlog [req-a3778f18-7b4d-4739-b20a-bff355fed9b0 - - - - -] ssl:10.216.241.118:6641: continuing to reconnect in the background but suppressing further logging
  2021-02-15 14:20:08.853 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.118:16642: connected
  2021-02-15 14:20:08.864 15563 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.251:16642: connecting...
  2021-02-15 14:20:08.869 15542 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.251:16642: connecting...
  2021-02-15 14:20:08.872 15558 INFO ovsdbapp.backend.ovs_idl.vlog [req-c047c84e-8fdc-404c-8284-bba80c34fe90 - - - - -] ssl:10.216.241.251:16642: connecting...
  2021-02-15 14:20:08.877 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.118:16642: clustered database server is disconnected from cluster; trying another server
  2021-02-15 14:20:08.879 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.118:16642: connection closed by client
  2021-02-15 14:20:08.879 15553 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.118:16642: continuing to reconnect in the background but suppressing further logging
  2021-02-15 14:20:09.093 15548 INFO ovsdbapp.backend.ovs_idl.vlog [req-b3fb3d36-3477-454e-97e0-11673e64eff5 - - - - -] ssl:10.216.241.251:6641: connecting...
  2021-02-15 14:20:09.126 15558 INFO ovsdbapp.backend.ovs_idl.vlog [req-3de7f22d-c26c-493b-9463-3140898e35f0 - - - - -] ssl:10.216.241.251:6641: connecting...
  2021-02-15 14:20:09.129 15557 INFO ovsdbapp.backend.ovs_idl.vlog [req-89da4e64-10f9-45c1-ba11-c0ff429961c9 - - - - -] ssl:10.216.241.251:6641: connecting...
  2021-02-15 14:20:09.129 15571 INFO ovsdbapp.backend.ovs_idl.vlog [req-68cd67e7-592c-4869-bc11-2d18fc070c12 - - - - -] ssl:10.216.241.251:6641: connecting...
  2021-02-15 14:20:09.132 15563 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.216.241.251:6641: connecting...
  2021-02-15 14:20:10.284 15546 ERROR ovsdbapp.backend.ovs_idl.connection [-] (113, 'EHOSTUNREACH'): OpenSSL.SSL.SysCallError: (113, 'EHOSTUNREACH')
  ... (and more EHOSTUNREACH messages probably from each thread) 

  I believe network connectivity was then restored, after which the
  Timeout exceptions to ovsdb started appearing. Any *_postcommit
  operations on neutron-server/2 timed out.

  2021-02-15 15:17:21.163 15554 ERROR neutron.api.v2.resource [req-6b3381c3-69ac-44fc-b71d-a3110714f32e 84fca387fca043b984358c34174e1070 24471fcdff7e4cac9f7fe7b4ec0d04e3 - cb47060fffe34ed0a8913db979e06523 cb47060fffe34ed0a8913db979e06523] index failed: No details.: ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.CheckLivenessCommand object at 0x7f52e6d58c18>] exceeded timeout 180 seconds
  2021-02-15 16:03:18.018 15554 ERROR neutron.plugins.ml2.managers [req-3c4f2b06-2be3-4ccc-a00e-a91bf61b8473 - 6e3dac6cf8f14582be2c8a6fdc0a7458 - - -] Mechanism driver 'ovn' failed in create_port_postcommit: ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.AddLSwitchPortCommand object at 0x7f52e68154a8>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e6815e10>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e6815c18>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e6815b00>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e5f33048>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e5f33b00>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e5f33d68>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e5f337f0>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e5f33780>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f52e6efc390>, <ovsdbapp.schema.ovn_northbound.commands.QoSDelCommand object at 0x7f52e68157f0>, <ovsdbapp.schema.ovn_northbound.commands.QoSDelCommand object at 0x7f52e63d0b70>] exceeded timeout 180 seconds
  ...
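
  When triaging logs like the two lines above, it can help to pull out the
  timeout value and the command classes from each TimeoutException message.
  The following is only a sketch: the helper name summarize_timeout and the
  regular expressions are assumptions based on the message format seen in
  this report, not part of neutron or ovsdbapp.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this report (an assumption,
# not a documented format).
_TIMEOUT_RE = re.compile(r"Commands \[(.*)\] exceeded timeout (\d+) seconds")
_COMMAND_RE = re.compile(r"<([\w.]+) object at 0x[0-9a-fA-F]+>")


def summarize_timeout(line):
    """Return (timeout_seconds, Counter of command class names), or None.

    Hypothetical helper: returns None when the line does not contain a
    TimeoutException message in the expected shape.
    """
    match = _TIMEOUT_RE.search(line)
    if match is None:
        return None
    commands, timeout = match.groups()
    # Keep only the class name, dropping the module path.
    names = [path.rsplit(".", 1)[-1] for path in _COMMAND_RE.findall(commands)]
    return int(timeout), Counter(names)
```

  Applied to the create_port_postcommit line above, this would group the
  repeated PgAddPortCommand entries under one name with a count, which makes
  long command lists easier to compare across requests.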

  One complete Timeout exception traceback for reference (it points to queue.Full):
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource [req-9fa36c11-fcaf-4716-8371-3d4e357b5154 2ae54808a32e4ba6baec08cbc3df6cec 64f175c521c847c5a7d31a7443a861f2 - 8b226be7ba0a4e62a16072c0c08c6d8f 8b226be7ba0a4e62a16072c0c08c6d8f] index failed: No details.: ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.CheckLivenessCommand object at 0x7f52e6080c18>] exceeded timeout 180 seconds
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource Traceback (most recent call last):
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 144, in queue_txn
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.txns.put(txn, timeout=self.timeout)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 50, in put
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     super(TransactionQueue, self).put(*args, **kwargs)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 264, in put
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     result = waiter.wait()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return get_hub().switch()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 298, in switch
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return self.greenlet.switch()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource queue.Full
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource During handling of the above exception, another exception occurred:
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource Traceback (most recent call last):
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/resource.py", line 98, in resource
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     result = method(request=request, **args)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     setattr(e, '_RETRY_EXCEEDED', True)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.force_reraise()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     raise value
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     ectxt.value = e.inner_exc
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.force_reraise()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     raise value
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return f(*args, **kwargs)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     LOG.debug("Retry wrapper got retriable exception: %s", e)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.force_reraise()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     six.reraise(self.type_, self.value, self.tb)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     raise value
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return f(*dup_args, **dup_kwargs)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 369, in index
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return self._items(request, True, parent_id)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/api/v2/base.py", line 304, in _items
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     obj_list = obj_getter(request.context, **kwargs)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1090, in fn
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     return op(results, new_method(*args, _driver=self, **kwargs))
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1157, in get_agents
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     update_db = _driver.ping_all_chassis()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     txn.add(self._nb_ovn.check_liveness())
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/ovsdbapp/api.py", line 69, in __exit__
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.result = self.commit()
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 52, in commit
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     self.ovsdb_connection.queue_txn(self)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource   File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 147, in queue_txn
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource     timeout=self.timeout)
  2021-02-15 14:58:44.610 15554 ERROR neutron.api.v2.resource ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.CheckLivenessCommand object at 0x7f52e6080c18>] exceeded timeout 180 seconds
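
  The traceback above suggests that queue_txn blocked on a full transaction
  queue until the timeout expired, and that queue.Full was then reported as
  a TimeoutException. The following is a minimal sketch of that failure
  mode using only the Python standard library. The Connection class, its
  connected event, and the local TimeoutException are hypothetical
  stand-ins, not the ovsdbapp implementation (which uses an eventlet queue).

```python
import queue
import threading


class TimeoutException(Exception):
    """Hypothetical stand-in for ovsdbapp.exceptions.TimeoutException."""


class Connection:
    """Toy connection: a bounded queue drained by one worker thread."""

    def __init__(self, timeout, maxsize=1):
        self.timeout = timeout
        self.txns = queue.Queue(maxsize=maxsize)
        # Set while the (imaginary) database server is reachable.
        self.connected = threading.Event()
        self.processed = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            # While the worker is stuck here, nothing drains the queue.
            self.connected.wait()
            self.processed.append(self.txns.get())

    def queue_txn(self, txn):
        try:
            # Blocks for up to self.timeout seconds if the queue is full.
            self.txns.put(txn, timeout=self.timeout)
        except queue.Full:
            raise TimeoutException(
                "Commands %s exceeded timeout %s seconds" % (txn, self.timeout)
            )


if __name__ == "__main__":
    conn = Connection(timeout=0.3)
    conn.queue_txn("txn-1")          # fills the queue; the worker is stalled
    try:
        conn.queue_txn("txn-2")      # no free slot appears within the timeout
    except TimeoutException as exc:
        print(exc)
```

  In this sketch every later caller keeps timing out for as long as the
  worker stays stalled, which is consistent with the behaviour reported
  here, although the actual cause inside neutron-server is not established
  by this report.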

  The expectation is that neutron-server/2 reconnects to ovsdb and handles any further transactions.

  The problem was rectified by restarting the neutron-server service on neutron-server/2.
  The neutron/ovn db inconsistencies were then cleared by running the neutron-ovn-db-sync-util utility.

  
  Package versions:
  neutron: 2:16.2.0-0ubuntu1~cloud0
  python3-ovsdbapp: 1.1.0-0ubuntu1~cloud0
  python3-openvswitch: 2.13.1-0ubuntu0.20.04.1~cloud0
  ovn: 20.03.0-0ubuntu1~cloud0
  openvswitch: 2.13.1-0ubuntu0.20.04.1~cloud0

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1921085/+subscriptions

