yahoo-eng-team team mailing list archive

[Bug 1531210] Re: Ovs agent loses OpenFlow rules if OVS gets restarted while Neutron is disconnected from SQL

 

*** This bug is a duplicate of bug 1439472 ***
    https://bugs.launchpad.net/bugs/1439472

** This bug has been marked a duplicate of bug 1439472
   OVS doesn't restart properly when Exception occurred

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1531210

Title:
  Ovs agent loses OpenFlow rules if OVS gets restarted while Neutron is
  disconnected from SQL

Status in neutron:
  Confirmed

Bug description:
  Steps to reproduce in Juno:

  1. Node X has neutron-ovs-agent running
  2. Neutron-server, running the ML2 plugin, loses its connection to the SQL server. At this point neutron-ovs-agent is not aware of this, since it doesn't query device properties.
  3. OVS is restarted in the background of the neutron-ovs-agent.
  4. The neutron-ovs-agent realizes that OVS was restarted since the CANARY VALUE it placed in OpenFlow table 23 is missing.
  5. The agent sets a local flag, ovs_restarted, and re-installs the CANARY value to signal that it handled the OVS restart in this iteration.
  6. It runs through the OVS restart flow, which erases the OpenFlow rules (again, this is Juno).
  7. When the agent contacts the Neutron server in process_network_ports(), it gets the following SQL error, which aborts this iteration:

  ########################################################################################
  2015-12-28 08:49:07,075.075 35862 ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-bea668e9-3c52-4535-a4f9-71a63dc538c4 None] process_network_ports - iteration:41940 - failure while retrieving port details from server
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1230, in process_network_ports
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     devices_added_updated, ovs_restarted)
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py", line 1103, in treat_devices_added_or_updated
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent     raise DeviceListRetrievalError(devices=devices, error=e)
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent DeviceListRetrievalError: Unable to retrieve port details for devices: set([u'918890c7-cbfd-4a3f-bb2c-030e0f5ded5b', u'9c8c6b21-4baa-4c7e-b2ac-9772a7653da9', u'dded408d-65e7-4adf-8490-3ba78e1496b0', u'9aec4ec1-5921-40f5-8db7-fec3635511ce', u'545ba077-e2ab-434b-a696-bf0bc8874dcb', u'9a03a23c-2ae9-422c-a8da-2578134001bb', u'b62aa4db-819c-4941-a457-8c19a9897e66', u'a47ff11b-0c57-435e-ac5e-4348dccd6f0f', u'55defa8f-016f-46b1-b240-5825bc282571']) because of error: Remote error: OperationalError (_mysql_exceptions.OperationalError) (1047, 'WSREP has not yet prepared node for application use')
  2015-12-28 08:49:07,075.075 35862 TRACE neutron.plugins.openvswitch.agent.ovs_neutron_agent [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply\n    incoming.message))\n', u'  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch\n    return self._do_dispatch(endpoint, method, ctxt, args)\n', u'  File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch\n    result = getattr(endpoint, method)(ctxt, **new_args)\n', u'  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 115, in get_devices_details_list\n    for device in kwargs.pop(\'devices\', [])\n', u'  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 92, in get_device_details\n    host)\n', u'  File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1127, in update_port_status\n    updated = True\n', u'  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__\n    self.gen.next()\n', u'  File "/usr/lib64/python2.7/contextlib.py", line 121, in nested\n    if exit(*exc):\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 502, in __exit__\n    self.rollback()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 63, in __exit__\n    compat.reraise(type_, value, traceback)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 502, in __exit__\n    self.rollback()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 423, in rollback\n    transaction._rollback_impl()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 461, in _rollback_impl\n    t[1].rollback()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1563, in rollback\n    self._do_rollback()\n', u'  File 
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1601, in _do_rollback\n    self.connection._rollback_impl()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 670, in _rollback_impl\n    self._handle_dbapi_exception(e, None, None, None, None)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1334, in _handle_dbapi_exception\n    self._autorollback()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 791, in _autorollback\n    self._root._rollback_impl()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 670, in _rollback_impl\n    self._handle_dbapi_exception(e, None, None, None, None)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1266, in _handle_dbapi_exception\n    exc_info\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause\n    reraise(type(exception), exception, tb=exc_tb)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 668, in _rollback_impl\n    self.engine.dialect.do_rollback(self.connection)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2524, in do_rollback\n    dbapi_connection.rollback()\n', u"OperationalError: (_mysql_exceptions.OperationalError) (1047, 'WSREP has not yet prepared node for application use')\n"].

  ########################################################################################

  8. The error described in step 7 recurs until Neutron-server restores its connection to the SQL server.
  9. Once SQL is restored, the next iteration manages to get the port data from the server, but the ovs_restarted flag is lost because it was local to an earlier iteration. The agent therefore skips provision_local_vlan() and the OpenFlow rules are never restored.
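
  The sequence above can be modelled with a small toy loop. This is only
  a sketch of the reported behaviour, not Neutron code: BuggyAgent,
  ServerUnavailable and the boolean fields are hypothetical names that
  stand in for the agent, DeviceListRetrievalError, and the state of the
  OVS flow tables.

```python
class ServerUnavailable(Exception):
    """Stands in for DeviceListRetrievalError in this toy model."""


class BuggyAgent:
    """Toy model of the agent loop described above (not Neutron code)."""

    def __init__(self):
        self.canary_present = True   # canary flow in OpenFlow table 23
        self.flows_installed = True  # the OpenFlow rules the agent manages

    def restart_ovs(self):
        # Step 3: an OVS restart wipes every flow, canary included.
        self.canary_present = False
        self.flows_installed = False

    def iteration(self, server_up):
        # Steps 4-5: detect the restart, then restore the canary right away.
        ovs_restarted = not self.canary_present
        if ovs_restarted:
            self.canary_present = True
            self.flows_installed = False  # step 6: restart flow erases rules
        # Step 7: fetching port details fails while SQL is unreachable.
        if not server_up:
            raise ServerUnavailable()
        # Step 9: rules are re-provisioned only if the local flag is set.
        if ovs_restarted:
            self.flows_installed = True


agent = BuggyAgent()
agent.restart_ovs()
try:
    agent.iteration(server_up=False)  # flag is set here, then lost
except ServerUnavailable:
    pass
agent.iteration(server_up=True)       # canary is back, so no restart is seen
print(agent.canary_present, agent.flows_installed)  # prints: True False
```

  The canary is present again after the first, failed iteration, so the
  later successful iteration has no way to tell that the rules are still
  missing.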

  A possible solution is to put the CANARY value back at the end of the
  iteration that found it missing, so that a failed iteration leaves it
  missing and the restart is detected again.
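
  A minimal sketch of that idea, under the same toy assumptions: the
  names FixedAgent and ServerUnavailable are hypothetical, and the real
  fix in Neutron may look different. The only point illustrated is the
  ordering, with the canary written back after the iteration has
  succeeded.

```python
class ServerUnavailable(Exception):
    """Stands in for DeviceListRetrievalError in this toy model."""


class FixedAgent:
    """Toy model with the canary restored at the end of the iteration."""

    def __init__(self):
        self.canary_present = True   # canary flow in OpenFlow table 23
        self.flows_installed = True  # the OpenFlow rules the agent manages

    def restart_ovs(self):
        # An OVS restart wipes every flow, canary included.
        self.canary_present = False
        self.flows_installed = False

    def iteration(self, server_up):
        ovs_restarted = not self.canary_present
        if ovs_restarted:
            self.flows_installed = False  # restart flow erases the rules
        if not server_up:
            # The canary is still missing, so the next iteration
            # will detect the restart again.
            raise ServerUnavailable()
        if ovs_restarted:
            self.flows_installed = True   # re-provision the rules
            self.canary_present = True    # restore the canary only now


agent = FixedAgent()
agent.restart_ovs()
try:
    agent.iteration(server_up=False)  # fails, canary stays missing
except ServerUnavailable:
    pass
agent.iteration(server_up=True)       # restart detected again, rules rebuilt
print(agent.canary_present, agent.flows_installed)  # prints: True True
```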

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1531210/+subscriptions

