
yahoo-eng-team team mailing list archive

[Bug 1875852] [NEW] [OVN] SRIOV routing on VLAN Tenant networks

 

Public bug reported:

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1826364

<snipet>

Right now, the SRIOV support with ML2/OVN is limited to:


1) SRIOV ports on provider networks with external DHCP
2) SRIOV ports on provider networks with OVN DHCP and OVN Metadata service
3) SRIOV ports on VLAN tenant networks and E/W Neutron routing


This BZ tracks the implementation of a fourth scenario:

4) SRIOV ports on VLAN tenant networks and N/S Neutron routing with and
without FIPs


There are at least two ways of achieving this, but first let me explain why it doesn't work right now.


SRIOV ports are mapped to OVN 'external' ports, which are all scheduled on one controller (or network node). Example:


CH1: compute node where SRIOV VM1 (192.168.1.10 - FIP: 10.0.0.10) is running
CH2: chassis to which the OVN external port is bound
CH3: chassis to which the gateway port is bound
CH4: external chassis on the provider network

PING from CH4 to VM1:
CH4 -> CH3 -> CH2 -> CH1
When an external node (CH4) pings the FIP of the VM, the traffic goes to CH3, which performs the NAT and routes the traffic towards CH1, where it is delivered to the VM's SRIOV NIC.
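The N/S leg above hinges on the FIP translation done on the gateway chassis. Below is a minimal Python sketch of that address rewrite only, assuming a one-entry FIP table taken from the example (10.0.0.10 <-> 192.168.1.10) and a hypothetical CH4 address of 10.0.0.100. Neutron FIPs are OVN NAT entries of type dnat_and_snat, but the `Packet`, `dnat` and `snat` names are illustrative, not OVN or Neutron APIs.

```python
from dataclasses import dataclass, replace

# Hypothetical FIP table from the example: external FIP -> VM fixed IP.
# This only mimics the address rewrite, not OVN's real NAT pipeline.
FIP_TABLE = {"10.0.0.10": "192.168.1.10"}


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    kind: str  # e.g. "icmp-echo-request"


def dnat(pkt: Packet) -> Packet:
    """Ingress on the gateway chassis: rewrite the FIP to the fixed IP."""
    fixed = FIP_TABLE.get(pkt.dst)
    return replace(pkt, dst=fixed) if fixed else pkt


def snat(pkt: Packet) -> Packet:
    """Egress on the gateway chassis: rewrite the fixed IP back to the FIP."""
    reverse = {fixed: fip for fip, fixed in FIP_TABLE.items()}
    fip = reverse.get(pkt.src)
    return replace(pkt, src=fip) if fip else pkt


# CH4's address (10.0.0.100) is a hypothetical value for illustration.
request = Packet(src="10.0.0.100", dst="10.0.0.10", kind="icmp-echo-request")
print(dnat(request))
```

Note that the echo reply can only reach `snat()` on CH3 if the VM can first resolve its gateway's MAC address, which is the step that currently fails.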


When the ICMP request is delivered to the VM, the VM has to send the echo reply through its default gateway, so it tries to resolve the router interface IP (e.g. 192.168.1.1) by sending an ARP broadcast request on the VLAN tenant network.

Right now, this ARP packet will be unanswered because:

* On all chassis except the one claiming the external port, there are flows that drop ARP requests for the router IP coming from the external port's VM, so ideally CH2 would reply. However,
* Router ports have the 'reside-on-redirect-chassis' option set, which makes the VLAN traffic centralized [0], meaning that only the chassis hosting the gateway port (CH3 in our example) would reply to it.

Since CH2 does not host the gateway port and CH3 does not claim the external port, no chassis replies.
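The two conditions above can be combined into a toy model of which chassis answers the VM's ARP request. This is a simplified Python sketch, not OVN's actual logical flows; `arp_responders` and its parameters are hypothetical names.

```python
def arp_responders(chassis, external_port_chassis, gateway_chassis,
                   reside_on_redirect_chassis=True):
    """Return the chassis that would answer the VM's ARP for the router IP.

    Simplified model of the two conditions described above:
      * every chassis except the one claiming the external port drops it;
      * with 'reside-on-redirect-chassis' set, only the chassis hosting
        the gateway port answers for the router port.
    """
    responders = set()
    for ch in chassis:
        if ch != external_port_chassis:
            continue  # ARP from the external port's VM is dropped here
        if reside_on_redirect_chassis and ch != gateway_chassis:
            continue  # router port is centralized on the gateway chassis
        responders.add(ch)
    return responders


CHASSIS = ["CH1", "CH2", "CH3", "CH4"]
# Current situation: external port bound to CH2, gateway port on CH3.
print(arp_responders(CHASSIS, "CH2", "CH3"))  # set(): nobody answers
```

Co-locating both ports on one chassis (`arp_responders(CHASSIS, "CH3", "CH3")`) or not centralizing the router port (`reside_on_redirect_chassis=False`) each yield exactly one responder.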

In this context, there are two possibilities to get this working:

1) Co-locating external and gateway ports. This is non-trivial, as it
may require moving things around in ways that cause dataplane disruption.

For example, when the external port is first created, it is scheduled on CH2 (no gateways are involved yet). However, if its network is later attached to a router with a gateway, the external port may have to be moved to co-locate it with the gateway port. Moving the external port can cause disruption, as DHCP/metadata will be unavailable for a window of time until everything settles.
The length of this window is unknown and depends on factors such as how many ports need to be moved.

In this scenario, the packet flow in the example above would be:

Echo request: CH4 -> CH3 (gateway & external port) -> CH1
Echo reply: CH1 -> CH3 (gateway & external port) -> CH4
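Option 1 roughly amounts to re-pinning a network's external ports whenever a gateway appears. Below is a minimal Python sketch, assuming a hypothetical `ExternalPort` record and `colocate_with_gateway` helper. In OVN, external ports are bound according to their HA chassis group, so a real implementation would presumably adjust that rather than assign a field. The returned list shows which ports would temporarily lose OVN DHCP/metadata.

```python
from dataclasses import dataclass


@dataclass
class ExternalPort:
    """Hypothetical record: an external port and the chassis claiming it."""
    name: str
    chassis: str


def colocate_with_gateway(ports, gateway_chassis):
    """Move every external port of a network to the gateway chassis.

    Hypothetical helper for option 1. Each moved port is briefly without
    OVN DHCP/metadata, so the result lists the ports affected by the move.
    """
    moved = []
    for port in ports:
        if port.chassis != gateway_chassis:
            port.chassis = gateway_chassis
            moved.append(port.name)
    return moved


# The network is attached to a router whose gateway port lives on CH3.
ports = [ExternalPort("sriov-vm1", "CH2"), ExternalPort("sriov-vm2", "CH3")]
print(colocate_with_gateway(ports, "CH3"))  # ['sriov-vm1']
```

The more external ports a network has on other chassis, the longer that list, which matches the concern above that the disruption window grows with the number of ports to move.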


2) Supporting distributed traffic on VLAN tenant networks, tracked in [1].
In this case, there's no need to co-locate anything, as routing can happen on the chassis where the external port is bound. This eliminates the burden explained in 1).
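Option 2 essentially means no longer forcing VLAN router ports onto the gateway chassis. Below is a hedged Python sketch of that decision, assuming the current rule is "VLAN tenant network => set reside-on-redirect-chassis" as suggested by [0]. `router_port_options` and the `distributed_vlan_routing` flag are hypothetical names, not the networking-ovn code; only the option key matches the OVN Logical_Router_Port option.

```python
# OVN NB Logical_Router_Port option that centralizes routing for the port.
RESIDE_REDIR_CH = "reside-on-redirect-chassis"


def router_port_options(network_type, distributed_vlan_routing=False):
    """Sketch of the options a router port on a tenant network would get.

    Assumption: VLAN router ports are centralized today; with distributed
    VLAN routing (hypothetical flag) the option is not set, so routing can
    happen where the external port is bound.
    """
    options = {}
    if network_type == "vlan" and not distributed_vlan_routing:
        options[RESIDE_REDIR_CH] = "true"
    return options


print(router_port_options("vlan"))  # {'reside-on-redirect-chassis': 'true'}
print(router_port_options("vlan", distributed_vlan_routing=True))  # {}
```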


Option 2) seems the more reasonable and efficient way of achieving N/S routing for SRIOV ports on ML2/OVN. Hence, I'm marking this bug as dependent on [1] and as TestOnly for validation.


[0] https://opendev.org/openstack/networking-ovn/src/tag/7.1.0/networking_ovn/common/ovn_client.py#L1406
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1766930

</snipet>

** Affects: neutron
     Importance: Undecided
         Status: Confirmed


** Tags: ovn rfe

** Changed in: neutron
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1875852

