← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1861509] [NEW] [OVN] GW rescheduling logic is broken

 

Public bug reported:

When a Chassis event happens in the SB database, we attempt to
reschedule any possible unhosted gateways [0] *always* due to a problem
with the existing logic:


    def get_unhosted_gateways(self, port_physnet_dict, chassis_physnets,
                              gw_chassis):
        unhosted_gateways = []
        for lrp in self._tables['Logical_Router_Port'].rows.values():
            if not lrp.name.startswith('lrp-'):
                continue
            physnet = port_physnet_dict.get(lrp.name[len('lrp-'):])
            chassis_list = self._get_logical_router_port_gateway_chassis(lrp)
            is_max_gw_reached = len(chassis_list) < ovn_const.MAX_GW_CHASSIS
            for chassis_name, prio in chassis_list:
                # TODO(azbiswas): Handle the case when a chassis is no
                # longer valid. This may involve moving conntrack states,
                # so it needs to discussed in the OVN community first.
                if is_max_gw_reached or utils.is_gateway_chassis_invalid(
                        chassis_name, gw_chassis, physnet, chassis_physnets):
                    unhosted_gateways.append(lrp.name)
        return unhosted_gateways


1) is_max_gw_reached is always going to be True (as normally the possible candidates are less than the maximum) 

2) unhosted_gateways.append(lrp.name) is executed inside a loop where
lrp doesn't change meaning that if there's 3 candidates in the
chassis_list,  lrp.name is added 3 times to the list!!!

3) Later on, in the caller, we're iterating over the returned list [1]
so as it has all the LRPs N times (N being the names of gw chassis), it
will do a lot of extra and unnecessary work.


This is almost harmless in the sense that it's not breaking any functionality but it creates unnecessary updates on the logical router port:


2020-01-31 15:54:04.669 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): UpdateLRouterPortCommand(name=lrp-93b49ece-2dbc-4fcc-84cb-e7afd482a12e, columns={'gateway_chassis': ['0444b1f1-e9a9-4a73-ba78-997c87e61795', '43d98571-ccd6-48ce-bf4f-08f24aeed522', 'fe8f9887-27ef-4724-8cfc-50ec6e3d4a98']}, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2020-01-31 15:54:04.670 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121


[0] https://github.com/openstack/neutron/blob/4689564fa29915b042547bdeb3dcb44bca54e20c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py#L449
[1] https://github.com/openstack/neutron/blob/858d7f33950a80c73501377a4b2cd36b915d0f40/neutron/services/ovn_l3/plugin.py#L324

** Affects: neutron
     Importance: Undecided
     Assignee: Maciej Jozefczyk (maciej.jozefczyk)
         Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1861509

Title:
  [OVN] GW rescheduling logic is broken

Status in neutron:
  Confirmed

Bug description:
  When a Chassis event happens in the SB database, we attempt to
  reschedule any possible unhosted gateways [0] *always* due to a
  problem with the existing logic:

  
      def get_unhosted_gateways(self, port_physnet_dict, chassis_physnets,
                                gw_chassis):
          unhosted_gateways = []
          for lrp in self._tables['Logical_Router_Port'].rows.values():
              if not lrp.name.startswith('lrp-'):
                  continue
              physnet = port_physnet_dict.get(lrp.name[len('lrp-'):])
              chassis_list = self._get_logical_router_port_gateway_chassis(lrp)
              is_max_gw_reached = len(chassis_list) < ovn_const.MAX_GW_CHASSIS
              for chassis_name, prio in chassis_list:
                  # TODO(azbiswas): Handle the case when a chassis is no
                  # longer valid. This may involve moving conntrack states,
                  # so it needs to discussed in the OVN community first.
                  if is_max_gw_reached or utils.is_gateway_chassis_invalid(
                          chassis_name, gw_chassis, physnet, chassis_physnets):
                      unhosted_gateways.append(lrp.name)
          return unhosted_gateways

  
  1) is_max_gw_reached is always going to be True (as normally the possible candidates are less than the maximum) 

  2) unhosted_gateways.append(lrp.name) is executed inside a loop where
  lrp doesn't change meaning that if there's 3 candidates in the
  chassis_list,  lrp.name is added 3 times to the list!!!

  3) Later on, in the caller, we're iterating over the returned list [1]
  so as it has all the LRPs N times (N being the names of gw chassis),
  it will do a lot of extra and unnecessary work.

  
  This is almost harmless in the sense that it's not breaking any functionality but it creates unnecessary updates on the logical router port:

  
  2020-01-31 15:54:04.669 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): UpdateLRouterPortCommand(name=lrp-93b49ece-2dbc-4fcc-84cb-e7afd482a12e, columns={'gateway_chassis': ['0444b1f1-e9a9-4a73-ba78-997c87e61795', '43d98571-ccd6-48ce-bf4f-08f24aeed522', 'fe8f9887-27ef-4724-8cfc-50ec6e3d4a98']}, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
  2020-01-31 15:54:04.670 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:121


  [0] https://github.com/openstack/neutron/blob/4689564fa29915b042547bdeb3dcb44bca54e20c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py#L449
  [1] https://github.com/openstack/neutron/blob/858d7f33950a80c73501377a4b2cd36b915d0f40/neutron/services/ovn_l3/plugin.py#L324

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1861509/+subscriptions


Follow ups