yahoo-eng-team team mailing list archive

[Bug 2126061] [NEW] [ovs-agent] slow start with large number of ports

 

Public bug reported:

When the OVS agent restarts on a node handling ~1000 VIFs, the
initialization step (treat_devices_added_or_updated) takes an
excessively long time: in the log below it needs 2326 s (about 39
minutes) for 851 devices, roughly 2.7 s per device. One possible
bottleneck is the sequential bulk_pull RPC call issued for each port to
populate the resource cache; batching these calls could reduce the cost.

Steps to Reproduce:

1. Deploy an OpenStack environment with an OVS agent managing ~1000 VIFs.
2. Restart the OVS agent.

Observed: the agent loops over all VIFs and calls get_device_details for
each port individually, so startup is slow due to the sequential RPC calls.

Logs:


2024-06-18 09:09:04.097 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 started
2024-06-18 09:09:05.145 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - starting polling. Elapsed:1.048
2024-06-18 09:09:05.855 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - port information retrieved. Elapsed:1.758
2024-06-18 09:47:52.094 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] process_network_ports - iteration:0 - treat_devices_added_or_updated completed. Skipped 0 and no activated binding devices 0 of 851 devices currently available. Time elapsed: 2326.236
2024-06-18 09:48:03.587 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] process_network_ports - iteration:0 - agent port security group processed in 2337.730
2024-06-18 09:55:22.481 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - ports processed. Elapsed:2778.384
2024-06-18 09:55:31.954 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - cleanup stale flows. Elapsed:2787.857
2024-06-18 09:55:31.955 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 completed. Processed ports statistics: {'regular': {'added': 851, 'updated': 0, 'removed': 0}}. Elapsed:2787.858
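
The per-device cost implied by the log above can be worked out from the treat_devices_added_or_updated line. The following is only a small sketch of that arithmetic; the text in LOG_LINE is copied from the log, while the helper name per_device_seconds is made up for illustration.

```python
import re

# Message part of the treat_devices_added_or_updated log line above.
LOG_LINE = (
    "treat_devices_added_or_updated completed. Skipped 0 and no activated "
    "binding devices 0 of 851 devices currently available. "
    "Time elapsed: 2326.236"
)


def per_device_seconds(line):
    """Return (device_count, elapsed_seconds, seconds_per_device)."""
    devices = int(re.search(r"of (\d+) devices", line).group(1))
    elapsed = float(re.search(r"Time elapsed: ([\d.]+)", line).group(1))
    return devices, elapsed, elapsed / devices


devices, elapsed, per_device = per_device_seconds(LOG_LINE)
print(f"{devices} devices in {elapsed:.0f} s = {per_device:.2f} s per device")
# prints: 851 devices in 2326 s = 2.73 s per device
```

At roughly 2.7 s per device, this single step accounts for most of the 2787.858 s the whole iteration took.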


Proposed improvement:

Batch bulk_pull RPC calls for port details to reduce the number of round trips.
Optimize the resource cache population process for large-scale deployments.
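
The batching idea can be sketched roughly as follows. This is not Neutron code: fetch_many, FakeServer, populate_cache and the batch size of 100 are hypothetical stand-ins, used only to show how grouping port IDs reduces the number of round trips. It assumes the server side can return details for several ports in one call.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def populate_cache(port_ids, fetch_many, batch_size=100):
    """Fill a local cache with one fetch_many call per batch of port IDs.

    `fetch_many` is a hypothetical callable standing in for a batched RPC:
    it takes a list of port IDs and returns a list of port dicts.
    """
    cache = {}
    for batch in chunked(port_ids, batch_size):
        for port in fetch_many(batch):
            cache[port["id"]] = port
    return cache


class FakeServer:
    """In-memory stand-in for the RPC server; counts round trips."""

    def __init__(self):
        self.calls = 0

    def fetch_many(self, port_ids):
        self.calls += 1
        return [{"id": port_id, "status": "ACTIVE"} for port_id in port_ids]


# 851 matches the device count in the log; the IDs themselves are made up.
port_ids = [f"port-{n}" for n in range(851)]
server = FakeServer()
cache = populate_cache(port_ids, server.fetch_many, batch_size=100)
print(len(cache), "ports cached in", server.calls, "round trips")
# prints: 851 ports cached in 9 round trips
```

With one call per port the same run would need 851 round trips. Whether the real gain is proportional depends on how much of the per-port time is RPC latency rather than server-side processing, which the log alone does not show.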

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovs-agent

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2126061

Title:
  [ovs-agent] slow start with large number of ports

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2126061/+subscriptions