yahoo-eng-team team mailing list archive
Message #86829
[Bug 1939144] [NEW] [OVN] Router Availability Zones doesn't work with segmented networks
Public bug reported:
Hi,
Looking at the external networks from the edge environment I see that
these fields are None:
| provider:network_type | None |
| provider:physical_network | None |
Instead we have this:
| segments | [{'provider:network_type': 'flat', 'provider:physical_network': 'leaf0', 'provider:segmentation_id': None},
|          |  {'provider:network_type': 'flat', 'provider:physical_network': 'leaf1', 'provider:segmentation_id': None},
|          |  {'provider:network_type': 'flat', 'provider:physical_network': 'leaf2', 'provider:segmentation_id': None}] |
When building the list of candidate nodes on which to schedule the
gateway router ports, the ML2/OVN driver checks whether the nodes have
a matching physical network, see [0][1]. To do that it uses the
"provider:network_type" and "provider:physical_network" fields
(see [1]).
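To illustrate why the lookup comes back empty for a segmented network, here is a minimal sketch. The helper name physnet_from_network() and the sample dictionaries are hypothetical and only approximate what the code at [1] does; the assumption is that the physnet is read from the top-level provider fields and that the per-segment values are never consulted.

```python
# Hypothetical stand-in for the physnet lookup referenced at [1];
# this is NOT the actual Neutron code, only an approximation of it.
def physnet_from_network(network):
    """Return the physnet from the top-level provider fields, or None."""
    if network.get("provider:network_type") in ("flat", "vlan"):
        return network.get("provider:physical_network")
    return None


# A single-segment network carries the provider fields at the top level.
single = {
    "provider:network_type": "flat",
    "provider:physical_network": "leaf0",
}

# A segmented (routed provider) network only carries them per segment.
segmented = {
    "provider:network_type": None,
    "provider:physical_network": None,
    "segments": [
        {"provider:network_type": "flat",
         "provider:physical_network": "leaf0",
         "provider:segmentation_id": None},
        {"provider:network_type": "flat",
         "provider:physical_network": "leaf1",
         "provider:segmentation_id": None},
    ],
}

physnet_single = physnet_from_network(single)
physnet_segmented = physnet_from_network(segmented)
```

Under these assumptions the single-segment network resolves to a physnet, while the segmented one resolves to None even though every segment names one.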
Because those fields are None on a segmented network, the physnet
attribute is also None (see [0]). When it reaches the
get_candidates_for_scheduling() method [2], the list of candidates will
be empty because no gateway node matches this physnet. This method is
also where the candidates are filtered based on the AZs.
The reason it does not fail, and the gw port still gets scheduled to
some gw node, is that once it reaches the scheduler code with an empty
candidate list, the scheduler just fetches a list of gw chassis without
any consideration [3] of the physnets and uses that as the candidates.
Since the AZ filtering lives in get_candidates_for_scheduling(), this
fallback list is not filtered by AZ either.
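The interaction described above can be sketched roughly as follows. This is a simplified model, not the Neutron implementation: chassis are plain dictionaries, and get_candidates() and schedule() are hypothetical stand-ins for get_candidates_for_scheduling() [2] and the scheduler fallback [3].

```python
# Simplified, hypothetical model of the behaviour described in this report.
GW_CHASSIS = [
    {"name": "gw-0", "physnets": ["leaf0"], "azs": ["az0"]},
    {"name": "gw-1", "physnets": ["leaf1"], "azs": ["az1"]},
    {"name": "gw-2", "physnets": ["leaf2"], "azs": ["az1"]},
]


def get_candidates(chassis_list, physnet, availability_zones):
    """Match chassis on physnet first, then filter them by AZ."""
    candidates = [c for c in chassis_list if physnet in c["physnets"]]
    if availability_zones:
        wanted = set(availability_zones)
        candidates = [c for c in candidates if wanted & set(c["azs"])]
    return [c["name"] for c in candidates]


def schedule(chassis_list, candidates):
    """Fall back to every gateway chassis when no candidates were given."""
    return candidates or [c["name"] for c in chassis_list]


# physnet is None for a segmented network, so nothing matches ...
candidates = get_candidates(GW_CHASSIS, None, ["az1"])
# ... and the fallback picks from all chassis, including ones outside az1.
scheduled_on = schedule(GW_CHASSIS, candidates)
```

In this model the router asked for "az1", yet "gw-0" (in "az0") ends up in the pool, which mirrors the symptom in the bug title.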
As you can see, the code is messy and a future refactor may be needed.
For this specific problem I would recommend a simpler fix: when the
physnet is None, get_candidates_for_scheduling() would consider all GW
chassis regardless of physnet and then filter these chassis based on
their AZ. That fix would be backportable.
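A minimal sketch of that proposed behaviour, under the same kind of simplified model (chassis as plain dictionaries; get_candidates_fixed() is a hypothetical name, not the eventual patch), could look like this:

```python
# Hypothetical sketch of the proposed fix; not the actual Neutron patch.
GW_CHASSIS = [
    {"name": "gw-0", "physnets": ["leaf0"], "azs": ["az0"]},
    {"name": "gw-1", "physnets": ["leaf1"], "azs": ["az1"]},
    {"name": "gw-2", "physnets": ["leaf2"], "azs": ["az1"]},
]


def get_candidates_fixed(chassis_list, physnet, availability_zones):
    """Consider all GW chassis when physnet is None, then filter by AZ."""
    if physnet is None:
        # Segmented network: no single physnet to match, keep every chassis.
        candidates = list(chassis_list)
    else:
        candidates = [c for c in chassis_list if physnet in c["physnets"]]
    if availability_zones:
        wanted = set(availability_zones)
        candidates = [c for c in candidates if wanted & set(c["azs"])]
    return [c["name"] for c in candidates]


segmented_az1 = get_candidates_fixed(GW_CHASSIS, None, ["az1"])
flat_leaf0 = get_candidates_fixed(GW_CHASSIS, "leaf0", ["az0"])
```

The non-None physnet path is left unchanged here, which is what would keep such a change small enough to backport.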
[0] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1370
[1] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1314-L1317
[2] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1291-L1296
[3] https://github.com/openstack/neutron/blob/b7befc98118c270877b42e94f9cb6f7ccad0b072/neutron/scheduler/l3_ovn_scheduler.py#L62
** Affects: neutron
Importance: High
Assignee: Lucas Alvares Gomes (lucasagomes)
Status: Confirmed
** Tags: ovn
** Changed in: neutron
Status: New => Confirmed
** Changed in: neutron
Importance: Undecided => High
** Changed in: neutron
Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)
** Tags added: ovn
https://bugs.launchpad.net/bugs/1939144