yahoo-eng-team team mailing list archive

[Bug 1939144] [NEW] [OVN] Router Availability Zones doesn't work with segmented networks

 

Public bug reported:

Hi,

Looking at the external networks from the edge environment I see that
these fields are None:
| provider:network_type     | None |
| provider:physical_network | None |

Instead we have this:

| segments                  | [{'provider:network_type': 'flat',
'provider:physical_network': 'leaf0', 'provider:segmentation_id':
None}, {'provider:network_type': 'flat', 'provider:physical_network':
'leaf1', 'provider:segmentation_id': None}, {'provider:network_type':
'flat', 'provider:physical_network': 'leaf2',
'provider:segmentation_id': None}] |

When building the list of candidate nodes on which to schedule the
gateway router ports, the ML2/OVN driver checks whether the nodes have
the network's physical network, see [0][1]. To do that, it reads the
network's "provider:network_type" and "provider:physical_network"
fields (see [1]).
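To illustrate why that lookup comes back empty, here is a minimal
sketch (not the Neutron code itself). The network dict is abbreviated
from the output above, and physnet_from_network() is a hypothetical
helper that, by assumption, only reads the top-level provider fields:

```python
# Abbreviated network dict modelled on the "openstack network show"
# output quoted above; only the fields relevant here are kept.
network = {
    "provider:network_type": None,
    "provider:physical_network": None,
    "segments": [
        {"provider:network_type": "flat",
         "provider:physical_network": "leaf0",
         "provider:segmentation_id": None},
        {"provider:network_type": "flat",
         "provider:physical_network": "leaf1",
         "provider:segmentation_id": None},
        {"provider:network_type": "flat",
         "provider:physical_network": "leaf2",
         "provider:segmentation_id": None},
    ],
}


def physnet_from_network(net):
    """Hypothetical helper: look only at the top-level provider fields."""
    if net.get("provider:network_type") is None:
        return None
    return net.get("provider:physical_network")


# The top-level lookup finds nothing on a segmented network...
physnet = physnet_from_network(network)

# ...although the physnets are available per segment.
segment_physnets = [s["provider:physical_network"]
                    for s in network["segments"]]

print(physnet)
print(segment_physnets)
```

The per-segment list is shown only to make the contrast visible; how
the driver should use the segments is outside the scope of this sketch.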

Since those fields are None for a segmented network, the physnet
attribute is also None (see [0]). When it reaches the
get_candidates_for_scheduling() method [2], the list of candidates
comes back empty because no gateway node matches this physnet. This is
also the method that filters the candidates based on the AZs.

The reason it does not fail, and the gw port still gets scheduled to
some gw node, is that once it reaches the scheduler code with an empty
list of candidates, the scheduler just fetches a list of gw chassis
without any consideration for the physnets [3] and uses that list as
the candidates.
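The behaviour described above can be modelled with a small standalone
sketch. It is a simplification, not the Neutron implementation: the
chassis dicts, the AZ names, and both function names are made up for
illustration.

```python
# Hypothetical gateway chassis: each one has physnets and AZs.
GW_CHASSIS = [
    {"name": "gw-0", "physnets": {"leaf0"}, "azs": {"az-0"}},
    {"name": "gw-1", "physnets": {"leaf1"}, "azs": {"az-1"}},
    {"name": "gw-2", "physnets": {"leaf2"}, "azs": {"az-2"}},
]


def get_candidates(chassis_list, physnet, router_azs):
    """Simplified model: match on physnet first, then filter by AZ."""
    matching = [c for c in chassis_list if physnet in c["physnets"]]
    return [c["name"] for c in matching if c["azs"] & router_azs]


def scheduler_candidates(candidates, chassis_list):
    """Simplified model of the fallback: an empty list means 'use all'."""
    if not candidates:
        return [c["name"] for c in chassis_list]
    return candidates


# With physnet=None no chassis matches, so the AZ filter has nothing
# to work on and the result is an empty list.
candidates = get_candidates(GW_CHASSIS, None, {"az-1"})

# The fallback then offers every gw chassis, whatever its AZ.
final = scheduler_candidates(candidates, GW_CHASSIS)

print(candidates)
print(final)
```

In this model the router asked for "az-1" only, yet all three chassis
end up as candidates, which matches the symptom in the bug title.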

As you can see, the code is messy and a future refactor may be needed.
For this problem specifically, I would recommend a simpler fix: when
the physnet is None, get_candidates_for_scheduling() would consider all
GW chassis regardless of physnet and then filter them based on their
AZ. That fix would also be backportable.
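A rough sketch of that idea, under stated assumptions: the chassis
dicts and get_candidates_fixed() are hypothetical stand-ins rather than
the real method signature, and the router is assumed to always carry AZ
hints.

```python
# Hypothetical gateway chassis: each one has physnets and AZs.
GW_CHASSIS = [
    {"name": "gw-0", "physnets": {"leaf0"}, "azs": {"az-0"}},
    {"name": "gw-1", "physnets": {"leaf1"}, "azs": {"az-1"}},
    {"name": "gw-2", "physnets": {"leaf2"}, "azs": {"az-2"}},
]


def get_candidates_fixed(chassis_list, physnet, router_azs):
    """Sketch of the proposed fix, not the actual patch."""
    if physnet is None:
        # No physnet to match on: start from every GW chassis.
        matching = list(chassis_list)
    else:
        matching = [c for c in chassis_list if physnet in c["physnets"]]
    # The AZ filter now runs in both cases.
    return [c["name"] for c in matching if c["azs"] & router_azs]


segmented = get_candidates_fixed(GW_CHASSIS, None, {"az-1"})
single_physnet = get_candidates_fixed(GW_CHASSIS, "leaf0", {"az-0"})

print(segmented)
print(single_physnet)
```

Because the AZ filter is applied even when the physnet is None, the
candidate list is no longer empty for segmented networks, so the
unfiltered fallback in the scheduler is not reached in that case.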

[0] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1370
[1] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1314-L1317
[2] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1291-L1296
[3] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/scheduler/l3_ovn_scheduler.py#L62

** Affects: neutron
     Importance: High
     Assignee: Lucas Alvares Gomes (lucasagomes)
         Status: Confirmed


** Tags: ovn

** Changed in: neutron
       Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
     Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)

** Tags added: ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1939144

Title:
  [OVN] Router Availability Zones doesn't work with segmented networks

Status in neutron:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1939144/+subscriptions