yahoo-eng-team team mailing list archive

[Bug 2080492] [NEW] [OVN] VIP port does not come up when its virtual-parents are trunk sub-ports

 

Public bug reported:

OpenStack 2024.1
Neutron 25.0.0.0b2.dev189 (grabbed from master branch)
OVN 24.03.3

TL;DR: VIP port does not come up when its virtual-parents are trunk sub-ports


Let's say I have two instances in an internal network that form a high-availability cluster using a VRRP-like protocol.
Each instance has one port with its own fixed IP, and both ports have the same VIP configured in Allowed Address Pairs.
That's enough to make this VIP reachable inside the internal network.

Now we want to make this VIP reachable from the outside world using a FIP.
So we create a router with an external gateway and an internal interface.
We also create a dummy port in the internal network whose Fixed IPs contain the VIP from the Allowed Address Pairs of the other ports.
The FIP can then be associated with this port.
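
A rough sketch of this setup as Neutron API request bodies. The UUIDs, the port name, and the VIP address are placeholders invented for illustration and do not come from this report. Only the field names (`allowed_address_pairs`, `fixed_ips`, `floating_network_id`, `port_id`) are taken from the Neutron API.

```python
import json

# Placeholder values, not taken from the bug report.
NETWORK_ID = "11111111-1111-1111-1111-111111111111"
EXTERNAL_NETWORK_ID = "22222222-2222-2222-2222-222222222222"
VIP = "10.0.0.100"


def instance_port_update(vip):
    """Body for PUT /v2.0/ports/{id}: let an instance port use the VIP."""
    return {"port": {"allowed_address_pairs": [{"ip_address": vip}]}}


def vip_port_create(network_id, vip):
    """Body for POST /v2.0/ports: the dummy port holding the VIP as a fixed IP."""
    return {
        "port": {
            "network_id": network_id,
            "name": "vip-port",  # hypothetical name
            "fixed_ips": [{"ip_address": vip}],
        }
    }


def floating_ip_create(external_network_id, vip_port_id):
    """Body for POST /v2.0/floatingips: associate a FIP with the VIP port."""
    return {
        "floatingip": {
            "floating_network_id": external_network_id,
            "port_id": vip_port_id,
        }
    }


if __name__ == "__main__":
    # Print the three bodies in the order they would be sent.
    print(json.dumps(instance_port_update(VIP), indent=2))
    print(json.dumps(vip_port_create(NETWORK_ID, VIP), indent=2))
    print(json.dumps(floating_ip_create(EXTERNAL_NETWORK_ID, "<vip-port-id>"), indent=2))
```

The same `instance_port_update` body would be sent once per instance port, so that both ports carry the VIP in their Allowed Address Pairs.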

Looking at the OVN NB DB, specifically the Logical_Switch_Port table, there is an entry for each of these ports.
The VIP LSP is not associated with any instance directly, but either Neutron or OVN seems to be able to link it to the two instance LSPs, maybe because its Fixed IP is in their Allowed Address Pairs. Under options, we get virtual-ip and virtual-parents entries.
The virtual-parents are the two LSPs that are directly connected to the instances.
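
For illustration, the link between a VIP LSP and its parents can be read out of the `options` column. The sketch below works on hand-written dicts that mimic Logical_Switch_Port rows. The port names and the IP are hypothetical, and `virtual-parents` is assumed to hold a comma-separated list of LSP names.

```python
def virtual_port_info(lsp_row):
    """Return (virtual_ip, [parent LSP names]) or None if the row has no virtual-ip."""
    options = lsp_row.get("options", {})
    if "virtual-ip" not in options:
        return None
    raw_parents = options.get("virtual-parents", "")
    # Split the comma-separated parent list and drop empty entries.
    parents = [name.strip() for name in raw_parents.split(",") if name.strip()]
    return options["virtual-ip"], parents


# Hypothetical rows, not copied from the paste output.
VIP_LSP = {
    "name": "vip-lsp",
    "options": {
        "virtual-ip": "10.0.0.100",
        "virtual-parents": "instance-a-lsp,instance-b-lsp",
    },
}
INSTANCE_LSP = {"name": "instance-a-lsp", "options": {}}

if __name__ == "__main__":
    print(virtual_port_info(VIP_LSP))
    print(virtual_port_info(INSTANCE_LSP))
```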

We use DVR, so normally FIPs are exposed on the host where the instance is running.
Something seems to dynamically detect where the VIP is currently active and update neutron:host_id on the VIP LSP accordingly.
Once either of the two LSPs connected directly to the instances is up, the VIP LSP comes up as well.

So far, so good: even the FIP for the VIP is exposed on the host where
the instance that currently holds the VIP is running.

Things break once the virtual-parent of a VIP LSP is not an LSP directly connected to an instance but a trunk sub-port.
These VIP LSP objects still have virtual-ip and virtual-parents entries, and even neutron:host_id with the currently active host, but the VIP LSP stays down (up : false).
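
The symptom can be written down as a small check over Logical_Switch_Port rows: an LSP that has a virtual-ip and a neutron:host_id but is not up. This is only a sketch. The rows are hand-written stand-ins with hypothetical names and hosts, not the paste output, and placing `neutron:host_id` under `external_ids` is an assumption about the row layout.

```python
def stuck_vip_ports(rows):
    """Names of LSPs that have a virtual-ip and a host binding but are not up."""
    stuck = []
    for row in rows:
        has_vip = "virtual-ip" in row.get("options", {})
        host = row.get("external_ids", {}).get("neutron:host_id", "")
        if has_vip and host and not row.get("up", False):
            stuck.append(row["name"])
    return stuck


# Hypothetical rows: one healthy VIP LSP and one showing the reported symptom.
ROWS = [
    {
        "name": "vip-direct",
        "up": True,
        "options": {"virtual-ip": "10.0.0.100",
                    "virtual-parents": "instance-a-lsp,instance-b-lsp"},
        "external_ids": {"neutron:host_id": "compute-1"},
    },
    {
        "name": "vip-trunk",
        "up": False,
        "options": {"virtual-ip": "10.0.1.100",
                    "virtual-parents": "subport-a-lsp,subport-b-lsp"},
        "external_ids": {"neutron:host_id": "compute-2"},
    },
]

if __name__ == "__main__":
    print(stuck_vip_ports(ROWS))
```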

Yesterday I saw traffic being forwarded to the gateway chassis instead of being exposed locally. Today, without any further changes, traffic is indeed exposed locally on the compute node.
However, the VIP LSP is still down.

We're using ovn-bgp-agent, and it only exposes the FIP when the LSP is actually up.
The combination of the Neutron / OVN behavior and the ovn-bgp-agent behavior is what breaks this scenario for us and blocks further development.
I think communication would work even with the VIP LSP down, if ovn-bgp-agent did not require the port to be up.

Is this expected?
Shouldn't the VIP LSP come up like it does with regular LSPs even when used with a trunk sub-port?
I don't know enough about where this mechanism is triggered. Is that something in Neutron or in OVN code?

OVN output with virtual-parents directly attached to instances (VIP LSP up):
https://paste.openstack.org/show/bJALt4Bt9j628S8Ve9aH/

OVN output when virtual-parents are trunk sub-ports (VIP LSP down):
https://paste.openstack.org/show/bS4G0mIo4If9rPIoaPYU/

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: ovn-bgp-agent
     Importance: Undecided
         Status: New


** Tags: ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2080492

Title:
  [OVN] VIP port does not come up when its virtual-parents are trunk
  sub-ports

Status in neutron:
  New
Status in ovn-bgp-agent:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2080492/+subscriptions


