yahoo-eng-team team mailing list archive

[Bug 2000378] [NEW] [OVN] orphaned virtual parent ports break new ports

 

Public bug reported:

Reproducible on stable/yoga.


If OVN port deletion fails because of a backend (MariaDB or OVN) connection failure, the logical switch ports are left orphaned in the OVN NB DB.

oslo_db.exception.DBDeadlock: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
[SQL: DELETE FROM securitygroupportbindings WHERE securitygroupportbindings.port_id = %(port_id)s AND securitygroupportbindings.security_group_id = %(security_group_id)s]
[parameters: {'port_id': '76ff3324-7326-412d-bdc9-df5db5adcf84', 'security_group_id': 'fe1f6c5c-4d49-4ccc-ac2e-20ef23041510'}]

neutron/neutron-server.log:78508:2022-12-12 16:39:15.309 691 ERROR
neutron.plugins.ml2.managers [... - default default] Mechanism driver
'ovn' failed in delete_port_postcommit:
ovsdbapp.exceptions.TimeoutException: Commands
[DelLSwitchPortCommand(lport=76ff3324-7326-412d-bdc9-df5db5adcf84...


Such ports are detected by the maintenance task, but are only reported as warnings in the logs:

neutron/neutron-server.log:76862:2022-12-12 16:35:11.420 712 WARNING
neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance
[req-4a4b33c2-85b3-48c1-8b15-ed6d65db3c2d - - - - -] Skip fixing
resource 76ff3324-7326-412d-bdc9-df5db5adcf84 (type: ports). Resource
does not exist in Neutron database anymore:
neutron_lib.exceptions.PortNotFound: Port
76ff3324-7326-412d-bdc9-df5db5adcf84 could not be found.
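
To make the orphan condition concrete, here is a minimal sketch of the comparison involved: any logical switch port name present in the OVN NB DB but absent from the Neutron database is an orphan. The function name and the in-memory lists are hypothetical and only model the data from this report; this is not how the maintenance task is implemented.

```python
def find_orphaned_lsps(ovn_lsp_names, neutron_port_ids):
    """Return OVN logical switch port names with no matching Neutron port."""
    return sorted(set(ovn_lsp_names) - set(neutron_port_ids))


# Hypothetical data modelled on the report: 76ff... was deleted from Neutron,
# but its logical switch port is still present in the OVN NB DB.
ovn_lsp_names = [
    "76ff3324-7326-412d-bdc9-df5db5adcf84",
    "79cba8eb-dd1a-455a-873c-0e04f398c8d0",
]
neutron_port_ids = ["79cba8eb-dd1a-455a-873c-0e04f398c8d0"]

orphans = find_orphaned_lsps(ovn_lsp_names, neutron_port_ids)
print(orphans)
```

In a real deployment the two inputs would come from the OVN NB DB and the Neutron ports table respectively.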


When Neutron then creates a new port for a Nova instance in the same network, and the new port's IP address matches the IP of the orphaned port, Neutron treats the orphan as a virtual parent: it sets the new port's switch port to type virtual with the orphan as its parent, but the binding never completes, leaving the port permanently DOWN.

For example, here is the OVN-side record of a new virtual port that
failed to bind to the compute node:

addresses           : ["fa:16:3e:44:8a:d5 10.0.0.29"]
enabled             : true
external_ids        : {"neutron:cidrs"="10.0.0.29/24", "neutron:device_id"="2098a135-d6a6-4221-a8e9-2584c170dade", "neutron:device_owner"="compute:nova", "neutron:network_name"=neutron-def3de91-2120-47b5-b9f1-6ed51cf0e604, "neutron:port_name"="", "neutron:project_id"="867ba703d19947629e01d800ecdc01c0", "neutron:revision_number"="3", "neutron:security_group_ids"="2ecda920-36a2-44ff-96fa-a652d1cbd6c1 fe1f6c5c-4d49-4ccc-ac2e-20ef23041510"}
name                : "79cba8eb-dd1a-455a-873c-0e04f398c8d0"
options             : {mcast_flood_reports="true", requested-chassis=cmpt-av-02, virtual-ip="10.0.0.29", virtual-parents="76ff3324-7326-412d-bdc9-df5db5adcf84"}
port_security       : ["fa:16:3e:44:8a:d5 10.0.0.29"]
type                : virtual
up                  : false


It was incorrectly bound to the orphaned parent 76ff3324-7326-412d-bdc9-df5db5adcf84:


addresses           : ["fa:16:3e:f6:cc:6a 10.0.0.29"]
enabled             : true
external_ids        : {"neutron:cidrs"="10.0.0.29/24", "neutron:device_id"="91e19b3e-1412-4519-b499-06ae794ee0a3", "neutron:device_owner"="", "neutron:network_name"=neutron-def3de91-2120-47b5-b9f1-6ed51cf0e604, "neutron:port_name"="", "neutron:project_id"="867ba703d19947629e01d800ecdc01c0", "neutron:revision_number"="1", "neutron:security_group_ids"="fe1f6c5c-4d49-4ccc-ac2e-20ef23041510"}
name                : "76ff3324-7326-412d-bdc9-df5db5adcf84"
options             : {mcast_flood_reports="true", requested-chassis=""}
port_security       : ["fa:16:3e:f6:cc:6a 10.0.0.29"]
type                : ""
up                  : false


As we can see, the only matching values are the (IP, network_id) pair, which may indicate that the problem lies in the usage of the

def get_virtual_port_parents(self, virtual_ip, port):

function in
neutron\plugins\ml2\drivers\ovn\mech_driver\ovsdb\ovn_client.py:303
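
The suspected failure mode can be illustrated with a self-contained sketch. It is not the code of get_virtual_port_parents; the helper names, the dict layout, and the filtering idea are all assumptions made for illustration. A lookup that matches only on IP and network picks up the orphan, while one that also requires the candidate to exist in Neutron does not.

```python
NETWORK = "neutron-def3de91-2120-47b5-b9f1-6ed51cf0e604"

# Simplified stand-ins for the two OVN records shown above (hypothetical layout).
lsps = [
    {"name": "79cba8eb-dd1a-455a-873c-0e04f398c8d0", "type": "virtual",
     "network_name": NETWORK, "addresses": ["fa:16:3e:44:8a:d5 10.0.0.29"]},
    {"name": "76ff3324-7326-412d-bdc9-df5db5adcf84", "type": "",
     "network_name": NETWORK, "addresses": ["fa:16:3e:f6:cc:6a 10.0.0.29"]},
]


def naive_virtual_parents(lsps, virtual_ip, network_name, self_name):
    """Match candidate parents on (IP, network) only."""
    parents = []
    for lsp in lsps:
        if lsp["name"] == self_name or lsp["type"] == "virtual":
            continue
        if lsp["network_name"] != network_name:
            continue
        # Each address entry is "MAC IP [IP ...]"; skip the MAC.
        ips = [ip for addr in lsp["addresses"] for ip in addr.split()[1:]]
        if virtual_ip in ips:
            parents.append(lsp["name"])
    return parents


def filtered_virtual_parents(lsps, virtual_ip, network_name, self_name,
                             neutron_port_ids):
    """Same match, but drop candidates that no longer exist in Neutron."""
    return [name for name in
            naive_virtual_parents(lsps, virtual_ip, network_name, self_name)
            if name in neutron_port_ids]


new_port = "79cba8eb-dd1a-455a-873c-0e04f398c8d0"
naive = naive_virtual_parents(lsps, "10.0.0.29", NETWORK, new_port)
filtered = filtered_virtual_parents(lsps, "10.0.0.29", NETWORK, new_port,
                                    neutron_port_ids={new_port})
print(naive, filtered)
```

Whether an existence check of this kind is the right fix, or whether the orphan should instead be cleaned up by the maintenance task, is left open here.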


Manual workaround:
manually delete the port from OVN NB (ovn-nbctl lsp-del) and its entry from the neutron ovn_revision_numbers table.
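
The two workaround steps can be sketched as a small helper that only builds the command and the SQL statement without running either. The helper name is hypothetical, and the column names resource_uuid and resource_type are assumptions about the ovn_revision_numbers schema that should be checked against the deployed database before use.

```python
def build_cleanup_steps(port_id):
    """Return the ovn-nbctl argv and a parameterized SQL delete for one port."""
    nbctl_cmd = ["ovn-nbctl", "lsp-del", port_id]
    # Assumed column names; verify against the actual table definition.
    sql = ("DELETE FROM ovn_revision_numbers "
           "WHERE resource_uuid = %(port_id)s AND resource_type = 'ports'")
    return nbctl_cmd, sql, {"port_id": port_id}


cmd, sql, params = build_cleanup_steps("76ff3324-7326-412d-bdc9-df5db5adcf84")
print(" ".join(cmd))
print(sql, params)
```

Keeping the port ID as a bound parameter rather than formatting it into the SQL string avoids quoting mistakes when the statement is eventually executed.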

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2000378
