yahoo-eng-team team mailing list archive

[Bug 1981113] Re: OVN metadata agent can be slow with large amount of subnets

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/861124
Committed: https://opendev.org/openstack/neutron/commit/edf48e46a1f0227f84b05ab39da005393e5fa73f
Submitter: "Zuul (22348)"
Branch:    master

commit edf48e46a1f0227f84b05ab39da005393e5fa73f
Author: Miro Tomaska <mtomaska@xxxxxxxxxx>
Date:   Wed Oct 12 08:42:18 2022 -0500

    Improve agent provision performance for large networks
    
    Before this patch, the metadata agent would provision the network
    namespace for all subnets under a network (datapath) as soon as the
    first VM (VIF port) was mounted on the chassis. This operation can
    take a very long time for networks with many subnets. See the linked
    bug for more details.
    This patch changes this mechanism to "lazy load", where the metadata
    agent provisions the metadata namespace with only the subnets
    belonging to the active ports on the chassis. This results in
    virtually constant throughput, not affected by the number of subnets.
    
    Closes-Bug: #1981113
    Change-Id: Ia2a66cfd3fd1380c5204109742d44f09160548d2


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1981113

Title:
  OVN metadata agent can be slow with large amount of subnets

Status in neutron:
  Fix Released

Bug description:
  The OVN metadata agent can take a very long time (observed ~40s) to add
  CIDRs to a metadata namespace tap interface when a network consists
  of many subnets (observed ~1700 subnets). The long processing time can
  result in ovn-metadata-agent not having haproxy ready by the time the
  first VM's cloud-init requests its metadata, leaving the VM without
  the metadata it needs for proper operation.

  Steps to reproduce:
  - Create a network with hundreds or thousands of subnets. The more subnets, the more obvious the problem is.
  - Create a VM connected to that network. Make sure this is the first VM on the deployed compute node (hypervisor).
  - Once the VM is created, observe that the VM's cloud-init requests time out due to no response from 169.254.169.254/openstack.
  - Inspect the ovn-metadata-agent log and notice this is due to ovn-metadata-agent taking a very long time to process [1].

  Possible solutions:
  1. (Low-hanging fruit?) See if there is a way to improve the execution time of the `ip.add` call. Perhaps passing a list of CIDRs instead of a single CIDR at a time can improve performance?
  2. (More involved) Refactor the code so that ovn-metadata-agent only adds the single CIDR that belongs to the VM being created, instead of unconditionally adding all CIDRs for the network when the first VM is created (the current implementation).
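
The difference between the current behaviour and solution 2 can be sketched roughly as follows. This is only an illustration and not the agent's actual code: the function name `cidrs_for_active_ports`, the generated subnet list, and the port addresses are all hypothetical, and the real agent reads this information from the OVN databases rather than from plain lists.

```python
import ipaddress


def cidrs_for_active_ports(subnet_cidrs, port_ips):
    """Return only the subnet CIDRs that contain at least one active port IP.

    Hypothetical helper: stands in for the "lazy" selection of solution 2.
    """
    needed = []
    for cidr in subnet_cidrs:
        net = ipaddress.ip_network(cidr)
        if any(ipaddress.ip_address(ip) in net for ip in port_ips):
            needed.append(str(net))
    return needed


# Assumed example data: one network with 1700 /24 subnets, two local ports.
subnets = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1700)]
active_port_ips = ["10.0.5.17", "10.3.2.9"]

eager = subnets  # current behaviour: every CIDR of the network is added
lazy = cidrs_for_active_ports(subnets, active_port_ips)

print(len(eager), "CIDRs added eagerly vs", len(lazy), "added lazily")
```

With this approach the number of addresses added to the tap interface depends on the ports present on the chassis rather than on the total number of subnets in the network.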

  [1]
  https://github.com/openstack/neutron/blob/41bf8054017c72815226d5df50fd321b30fcba13/neutron/agent/ovn/metadata/agent.py#L488-L495

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1981113/+subscriptions


