
yahoo-eng-team team mailing list archive

[Bug 1981113] [NEW] OVN metadata agent can be slow with large amount of subnets

 

Public bug reported:

The OVN metadata agent can take a very long time (observed ~40s) to add
CIDRs to a metadata namespace tap interface when a network consists of
many subnets (observed ~1700 subnets). The long processing time can
result in ovn-metadata-agent not having haproxy ready by the time the
first VM's cloud-init requests its metadata, leaving the VM without the
metadata it needs for proper operation.

Steps to reproduce:
- Create a network with thousands of subnets.
- Create a VM connected to that network. Make sure it is the first VM on the deployed compute node (hypervisor). Observe that the VM's cloud-init requests time out because there is no response from 169.254.169.254/openstack.
- Observe in the ovn-metadata-agent logs that the agent is probably still executing, or was executing, this code [1].
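
To make the first step easier to repeat, here is a rough sketch that
generates non-overlapping subnet ranges and the matching
`openstack subnet create` commands. The 10.0.0.0/8 supernet, the /28
prefix, and the `repro-net` / `repro-subnet-N` names are placeholders
chosen for illustration, not values from the affected deployment. The
project's subnet quota will likely need raising first.

```python
import ipaddress
from itertools import islice


def subnet_cidrs(count, supernet="10.0.0.0/8", prefix=28):
    """Return `count` non-overlapping IPv4 CIDRs carved out of `supernet`.

    The supernet and prefix defaults are arbitrary illustration values.
    """
    net = ipaddress.ip_network(supernet)
    cidrs = [str(s) for s in islice(net.subnets(new_prefix=prefix), count)]
    if len(cidrs) < count:
        raise ValueError("supernet too small for requested subnet count")
    return cidrs


def create_commands(network, cidrs):
    """Build one `openstack subnet create` command line per CIDR."""
    return [
        f"openstack subnet create --network {network} "
        f"--subnet-range {cidr} repro-subnet-{i}"
        for i, cidr in enumerate(cidrs)
    ]
```

Printing `create_commands("repro-net", subnet_cidrs(1700))` line by line
gives a shell script that approximates the ~1700-subnet case.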

Possible solutions:
1. (Low-hanging fruit?) See if there is a way to improve the execution time of the `ip.add` call. Perhaps passing a list of CIDRs instead of a single CIDR at a time would improve performance?
2. (More involved) Refactor the code so that ovn-metadata-agent adds only the single CIDR that belongs to the VM being created, instead of unconditionally adding all CIDRs for the network when the first VM is created (the current implementation).
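
A minimal sketch of what option 2 could look like, independent of the
actual agent code: pick only the metadata port CIDRs whose subnet
contains an IP of a VM on this hypervisor, then compute what to add and
remove. The function names, the `METADATA_CIDR` constant, and the
IPv4-only filter are assumptions made for illustration. This is not
neutron's API.

```python
import ipaddress

# Assumption: the link-local metadata address is always configured.
METADATA_CIDR = "169.254.169.254/32"


def cidrs_needed(metadata_port_cidrs, local_vm_ips):
    """Return the metadata CIDRs required by the VMs on this hypervisor.

    metadata_port_cidrs: addresses of the metadata port, one per subnet,
        written with their prefix length (e.g. "10.0.0.2/28").
    local_vm_ips: fixed IPs of the VMs bound to this hypervisor.
    """
    vm_addrs = [ipaddress.ip_address(ip) for ip in local_vm_ips]
    needed = {METADATA_CIDR}
    for cidr in metadata_port_cidrs:
        iface = ipaddress.ip_interface(cidr)
        if iface.version != 4:
            continue  # assumption: only IPv4 addresses are configured
        if any(a.version == 4 and a in iface.network for a in vm_addrs):
            needed.add(cidr)
    return needed


def plan_changes(current_cidrs, wanted_cidrs):
    """Return (to_add, to_delete) so only the differences are applied."""
    current, wanted = set(current_cidrs), set(wanted_cidrs)
    return wanted - current, current - wanted
```

With one VM at 10.0.0.5 on a network of ~1700 subnets, `cidrs_needed`
returns two entries instead of ~1700, so the number of address-add calls
no longer grows with the subnet count.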

[1]
https://github.com/openstack/neutron/blob/41bf8054017c72815226d5df50fd321b30fcba13/neutron/agent/ovn/metadata/agent.py#L488-L495

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1981113

Title:
  OVN metadata agent can be slow with large amount of subnets

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1981113/+subscriptions


