yahoo-eng-team team mailing list archive

[Bug 2125573] Re: Octavia LB FIP reachability broken when VRRP failover happens

 

Hi Maximilian,

Yes, there is a reason the VIP address is in "admin state" down. Sorry
this will be a bit of a long response.

TLDR: This port is always an "unbound" port (i.e. not assigned to any
host) that essentially reserves the VIP IP address such that it can't be
issued to another project during outage or failure scenarios. This is
the only way in neutron to "reserve" an IP address for a project. This
IP is then later used as an allowed_address_pair on the other ports.

The Octavia Amphora instances need an IP "alias" available on their base
VIP neutron ports that can be used for VRRP IP address failover. VRRP
allows the Amphora instances to fail over between themselves
autonomously and very quickly in the event of a failure. When VRRP moves
an IP address between the Amphora instances, it issues gratuitous ARP
packets to tell the switching infrastructure that the IP address is now
active on a new port.

In neutron, IP address "aliases" are implemented by adding an
"Allowed Address Pair" to the base port attached to the Amphora.

If you look at the ports created for a load balancer, you will see the
"octavia-lb-" port, which is disabled and not attached to a host. This
port reserves the IP address for the VIP that will float across the
Amphora instances. Then, for each Amphora instance, there will be an
"octavia-lb-vrrp" port (also known as the base port). If you run
"openstack port show" on these ports, you will see the VIP address
listed under "allowed_address_pairs" on each.
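
The port layout described above can be checked with a short script. This
is only a minimal sketch: it assumes port records shaped like the neutron
port API fields (name, admin_state_up, binding:host_id, fixed_ips,
allowed_address_pairs) and matches the name prefixes mentioned above. The
helper names, the sample port names, and the sample addresses are
hypothetical and not taken from Octavia.

```python
# Name prefixes as described above; the suffixes used in the sample are made up.
VIP_PREFIX = "octavia-lb-"
BASE_PREFIX = "octavia-lb-vrrp"

# Hypothetical sample data with documentation-range addresses.
SAMPLE_PORTS = [
    {"name": "octavia-lb-example", "admin_state_up": False,
     "binding:host_id": "",
     "fixed_ips": [{"ip_address": "192.0.2.10"}],
     "allowed_address_pairs": []},
    {"name": "octavia-lb-vrrp-amp1", "admin_state_up": True,
     "binding:host_id": "compute-1",
     "fixed_ips": [{"ip_address": "192.0.2.21"}],
     "allowed_address_pairs": [{"ip_address": "192.0.2.10"}]},
    {"name": "octavia-lb-vrrp-amp2", "admin_state_up": True,
     "binding:host_id": "compute-2",
     "fixed_ips": [{"ip_address": "192.0.2.22"}],
     "allowed_address_pairs": [{"ip_address": "192.0.2.10"}]},
]


def classify_lb_ports(ports):
    """Split port records into VIP reservation ports and Amphora base ports."""
    groups = {"vip": [], "base": []}
    for port in ports:
        name = port.get("name", "")
        # Test the longer prefix first: base port names also start
        # with "octavia-lb-".
        if name.startswith(BASE_PREFIX):
            groups["base"].append(port)
        elif name.startswith(VIP_PREFIX):
            groups["vip"].append(port)
    return groups


def base_ports_missing_vip(ports):
    """Return names of base ports whose allowed_address_pairs lack a VIP."""
    groups = classify_lb_ports(ports)
    vips = {
        ip["ip_address"]
        for port in groups["vip"]
        for ip in port.get("fixed_ips", [])
    }
    missing = []
    for port in groups["base"]:
        allowed = {
            pair["ip_address"]
            for pair in port.get("allowed_address_pairs", [])
        }
        if not vips <= allowed:
            missing.append(port["name"])
    return missing


if __name__ == "__main__":
    groups = classify_lb_ports(SAMPLE_PORTS)
    for port in groups["vip"]:
        print(port["name"], "admin_state_up:", port["admin_state_up"],
              "host:", repr(port["binding:host_id"]))
    print("base ports missing the VIP:", base_ports_missing_vip(SAMPLE_PORTS))
```

In a real deployment, the records could come from something like
"openstack port list --long -f json" instead of SAMPLE_PORTS, though the
column names in that output may differ from the API field names assumed
here.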

Historically, we could not set admin_state_up to true on the VIP port,
as neutron would attempt to bind it on a node (networker or compute). If
it were bound to a networker node, the traffic for that VIP would have
to flow through that networker node, creating a bottleneck. It would
also cause problems with packet routing, as the MAC and IP address would
be put into the switching tables for things like DVR. This may have
changed in neutron and/or OVN, as I know there has been a lot of
discussion around the proper handling of unbound ports.

So, I am going to add the neutron project to this bug as I think we need
someone more familiar with how neutron handles unbound ports to provide
feedback. It may be a bug in how neutron is handling these ports or it
might be that the behavior has changed in neutron and we need to change
that in Octavia.

** Also affects: neutron
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2125573

Title:
  Octavia LB FIP reachability broken when VRRP failover happens

Status in neutron:
  New
Status in octavia:
  New

Bug description:
  Hello,

  we're running OpenStack (2025.1) together with OVN 24.03.2 and
  ovn-bgp-agent.

  We recently discovered that VRRP failover between amphorae in our Octavia setup is not working correctly.
  After a VRRP failover, an attached FIP is no longer reachable, while internal connectivity works fine.
  Failing over the whole LB or a single amphora (which means recreating the instances) works fine in all cases.

  While debugging, we realized that when a VRRP failover is performed, the field neutron:host_id is removed from the OVN NB and SB DB.
  As we're using Neutron / OVN with enable_distributed_floating_ip = True, this field is required to allow ovn-bgp-agent to correctly locate the chassis where the FIP should be announced.

  Digging into Neutron code we quickly realized that this field is being
  removed upon an LSP update when the given port is disabled:

  
  ovn_client.py (update_lsp_host_info) L340 [1]

  is being called by

  mech_driver.py (set_port_status_down) L1378 [2]

  is being called by

  ovsdb_monitor.py (LogicalSwitchPortUpdateDownEvent) L541 [3]

  which checks the 'enabled' field in OVN NB DB
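
  The effect of the call chain above can be modelled in a few lines. This
  is a simplified illustration of the behaviour described in this report,
  not the neutron code: the function names and the dict standing in for
  the LSP external_ids column are hypothetical. Only the key
  neutron:host_id comes from the report.

```python
# Key named in the bug report; everything else here is a toy model.
HOST_ID_KEY = "neutron:host_id"


def update_host_info(external_ids, host, up):
    """Return a copy of external_ids with the host info set or removed."""
    updated = dict(external_ids)
    if up:
        updated[HOST_ID_KEY] = host
    else:
        # The "down" path drops the host info, which is what the report
        # observes for the disabled VIP port.
        updated.pop(HOST_ID_KEY, None)
    return updated


def on_lsp_update(external_ids, host, enabled):
    """Model of the update event: a disabled port takes the "down" path."""
    return update_host_info(external_ids, host, up=enabled)


if __name__ == "__main__":
    before = {HOST_ID_KEY: "compute-1", "neutron:port_name": "example"}
    print(on_lsp_update(before, "compute-2", enabled=False))
    print(on_lsp_update(before, "compute-2", enabled=True))
```

  With enabled=False, the host key disappears from the result, so a
  consumer that relies on it has nothing to look up.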

  
  Once we enabled the LB VIP port in Neutron, the FIP remained reachable after a VRRP failover.
  We also realized that this is the root cause of LP#2111254.
  When the VIP port of a distributed FIP is disabled, the external_mac field is not populated.

  We analyzed the Octavia code and found the following line which creates the VIP port with admin_state_up: false:
  allowed_address_pairs.py (allocate_vip) L582 [4]

  Is there a specific reason why admin_state_up is required to be false?

  BR
  Maximilian Sesterhenn

  [1] https://opendev.org/openstack/neutron/src/branch/stable/2025.1/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L340
  [2] https://opendev.org/openstack/neutron/src/branch/stable/2025.1/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1378
  [3] https://opendev.org/openstack/neutron/src/branch/stable/2025.1/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L541
  [4] https://opendev.org/openstack/octavia/src/branch/stable/2025.1/octavia/network/drivers/neutron/allowed_address_pairs.py#L582

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2125573/+subscriptions