
yahoo-eng-team team mailing list archive

[Bug 2030741] Re: [OVN] Lack of AZs awareness in L3 port scheduler

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/892604
Committed: https://opendev.org/openstack/neutron/commit/a29ea3724e1f6bb54b76d1b9915c13014272fdcd
Submitter: "Zuul (22348)"
Branch:    master

commit a29ea3724e1f6bb54b76d1b9915c13014272fdcd
Author: Yann Morice <yann.morice@xxxxxxxxx>
Date:   Thu Aug 24 15:56:48 2023 +0200

    [ovn] AZs distribution in L3 port scheduler
    
    Update l3 ovn schedulers (chance, leastloaded) to ensure that LRP gateways are distributed over chassis in the different eligible AZs.
    
    The previous version already ensured that LRP gateways were scheduled on chassis in eligible AZs. However, depending on the deployment characteristics, all of these chassis could be in the same AZ. In some use cases, LRP gateways need to be spread across different AZs to be resilient to failures.
    
    This patch reorders the list of eligible chassis to give priority to chassis in different AZs.
    
    This should provide a solution for users who need to have their router gateways scheduled on chassis from different AZs.
    
    Closes-Bug: #2030741
    Change-Id: I72973abbb8b0f9cc5848fd3b4f6463c38c6595f8
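The commit message does not spell out the reordering algorithm. The sketch below shows one way to give priority to chassis in different AZs: a round-robin interleave across AZs that keeps the scheduler's original order (random for chance, by load for leastloaded) within each AZ. The function name `az_interleave` and the `chassis_az` mapping are hypothetical. They are not taken from the Neutron patch.

```python
from itertools import chain, zip_longest
from typing import Dict, List

MAX_GW_CHASSIS = 5  # same value as the constant cited in the bug report


def az_interleave(ordered: List[str], chassis_az: Dict[str, str]) -> List[str]:
    """Rotate across AZs while keeping each AZ's internal order.

    `ordered` is the scheduler's preference order (random for "chance",
    ascending load for "leastloaded"). Hypothetical helper, not Neutron's code.
    """
    by_az: Dict[str, List[str]] = {}
    for name in ordered:
        # dicts keep insertion order, so AZs are visited in order of first appearance
        by_az.setdefault(chassis_az[name], []).append(name)
    # Take one chassis per AZ per round; shorter AZ lists are padded with None.
    rounds = zip_longest(*by_az.values())
    return [name for name in chain.from_iterable(rounds) if name is not None]


# Six chassis in az1 and two in az2 (invented example data).
chassis_az = {f"a{i}": "az1" for i in range(1, 7)}
chassis_az.update({"b1": "az2", "b2": "az2"})
ordered = sorted(chassis_az)  # stand-in for the scheduler's own ordering

selected = az_interleave(ordered, chassis_az)[:MAX_GW_CHASSIS]
print(selected)  # ['a1', 'b1', 'a2', 'b2', 'a3']
```

Truncating the interleaved list to `MAX_GW_CHASSIS` keeps the gateway count unchanged while ensuring that every eligible AZ appears among the first picks, as long as the number of AZs does not exceed that count.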


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2030741

Title:
  [OVN] Lack of AZs awareness in L3 port scheduler

Status in neutron:
  Fix Released

Bug description:
  The OVN L3 port scheduler assigns the router ports to gateway chassis.
  It retrieves the chassis list from nodes configured as gateways
  (external_ids:ovn-cms-options=enable-chassis-as-gw). This list can
  be filtered by availability zones. In that case, the scheduler
  filters out chassis from invalid AZs (scheduler/l3_ovn_scheduler.py).

  As a result, we have a list of all eligible chassis for gateway ports,
  in all AZs where it could be scheduled.
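The sketch below illustrates the filtering step described above. It makes two assumptions. First, each chassis is a plain mapping from chassis name to its `ovn-cms-options` string. Second, AZs are listed inside that string as `availability-zones=az1:az2`. The helper names are hypothetical. The real logic lives in scheduler/l3_ovn_scheduler.py and works on Neutron's own chassis objects.

```python
from typing import Dict, List, Set


def parse_cms_options(options: str) -> Dict[str, str]:
    """Split a comma-separated ovn-cms-options string into key/value pairs."""
    parsed: Dict[str, str] = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key] = value  # flags such as enable-chassis-as-gw get ""
    return parsed


def eligible_gateway_chassis(chassis: Dict[str, str], router_azs: Set[str]) -> List[str]:
    """Return gateway chassis whose AZs intersect the router's AZ hints.

    An empty `router_azs` means no AZ filtering (assumption for this sketch).
    """
    result = []
    for name, options in sorted(chassis.items()):
        opts = parse_cms_options(options)
        if "enable-chassis-as-gw" not in opts:
            continue  # not configured as a gateway node
        chassis_azs = set(filter(None, opts.get("availability-zones", "").split(":")))
        if router_azs and not (chassis_azs & router_azs):
            continue  # chassis is in an AZ the router may not use
        result.append(name)
    return result


chassis = {
    "gw1": "enable-chassis-as-gw,availability-zones=az1",
    "gw2": "enable-chassis-as-gw,availability-zones=az2",
    "gw3": "enable-chassis-as-gw,availability-zones=az3",
    "compute1": "availability-zones=az1",
}
print(eligible_gateway_chassis(chassis, {"az1", "az2"}))  # ['gw1', 'gw2']
```

At this stage, the result only says *which* chassis may host the gateway. It carries no information about how the selected chassis end up spread across those AZs.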

  Then, both the chance and leastloaded schedulers select up to 5 nodes
  from this list (hardcoded in common/ovn/constants.py:MAX_GW_CHASSIS = 5)
  regardless of AZ membership. This works in general, but when more than
  5 nodes are available in one of the AZs, all of a router's gateway
  chassis can end up in *only* one AZ.
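To illustrate the failure mode, the following hypothetical least-loaded style pick sorts candidates by load and keeps the first `MAX_GW_CHASSIS` entries without looking at AZs. The chassis names and load values are invented for the example. With six lightly loaded chassis in `az1`, all five selected gateways land in that single AZ.

```python
from typing import Dict, List

MAX_GW_CHASSIS = 5  # same value as the constant cited in the bug report


def naive_select(candidates: List[str], load: Dict[str, int]) -> List[str]:
    """Pick the least-loaded chassis, ignoring AZ membership (illustrative only)."""
    # Sort by load, then by name so ties are broken deterministically.
    return sorted(candidates, key=lambda c: (load[c], c))[:MAX_GW_CHASSIS]


chassis_az = {f"a{i}": "az1" for i in range(1, 7)}
chassis_az.update({"b1": "az2", "b2": "az2"})
# az1 chassis are idle, az2 chassis already host one gateway each.
load = {name: (0 if az == "az1" else 1) for name, az in chassis_az.items()}

selected = naive_select(list(chassis_az), load)
print(selected)                           # ['a1', 'a2', 'a3', 'a4', 'a5']
print({chassis_az[c] for c in selected})  # {'az1'}
```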

  In some use cases, where AZs are mapped to “failure domains”, this
  can be a problem. In OVS l3_ha mode, router instances were placed by
  the AZ-aware “neutron.scheduler.l3_agent_scheduler.AZ*Scheduler”
  schedulers, and their ports followed them. Right now, this does not
  seem to be feasible out of the box with OVN.
