yahoo-eng-team team mailing list archive
Message #96542
[Bug 2126061] [NEW] [ovs-agent] slow start with large number of ports
Public bug reported:
When the OVS agent restarts on nodes handling ~1000 VIFs, the
initialization step (treat_devices_added_or_updated) takes an
excessively long time: in the logs below it completes only after
about 2326 s (roughly 39 minutes) for 851 devices. One possible
bottleneck is that the agent issues a separate bulk_pull RPC call for
each port to populate the resource cache; batching these calls could
reduce the number of round trips.
Steps to Reproduce:
1. Deploy an OpenStack environment with the OVS agent managing ~1000 VIFs.
2. Restart the OVS agent.
3. The agent loops over all VIFs and calls get_device_details for each port individually.
4. The process is slow because these RPC calls are made sequentially.
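A rough way to see why per-port calls add up: if every RPC round trip costs a fixed latency, fetching N ports one at a time costs N round trips, while batches of B ports cost ceil(N/B). The sketch below models only that round-trip overhead, not server-side work (which batching does not remove). The function names are hypothetical, and the 0.05 s per-call latency is an assumed value, not a measurement from this deployment.

```python
import math


def rpc_round_trips(n_ports: int, batch_size: int = 1) -> int:
    """Round trips needed when each RPC carries at most batch_size ports (hypothetical model)."""
    if n_ports < 0 or batch_size < 1:
        raise ValueError("n_ports must be >= 0 and batch_size >= 1")
    return math.ceil(n_ports / batch_size)


def rpc_wait_seconds(n_ports: int, per_call_s: float, batch_size: int = 1) -> float:
    """Time spent waiting on round trips only; ignores per-port server-side processing."""
    return rpc_round_trips(n_ports, batch_size) * per_call_s


if __name__ == "__main__":
    # 851 is the device count from the logs; 0.05 s per call is an assumed latency.
    for batch in (1, 50, 200):
        trips = rpc_round_trips(851, batch)
        wait = rpc_wait_seconds(851, 0.05, batch)
        print(f"batch={batch:>3} round_trips={trips:>3} wait~{wait:.2f}s")
```

With one port per call this gives 851 round trips; batches of 50 need 18, and batches of 200 need 5.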
Logs:
2024-06-18 09:09:04.097 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 started
2024-06-18 09:09:05.145 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - starting polling. Elapsed:1.048
2024-06-18 09:09:05.855 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - port information retrieved. Elapsed:1.758
2024-06-18 09:47:52.094 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] process_network_ports - iteration:0 - treat_devices_added_or_updated completed. Skipped 0 and no activated binding devices 0 of 851 devices currently available. Time elapsed: 2326.236
2024-06-18 09:48:03.587 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] process_network_ports - iteration:0 - agent port security group processed in 2337.730
2024-06-18 09:55:22.481 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - ports processed. Elapsed:2778.384
2024-06-18 09:55:31.954 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 - cleanup stale flows. Elapsed:2787.857
2024-06-18 09:55:31.955 30704 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-c1f35b09-236a-4c60-9f98-426fd68676a9 - - - - -] Agent rpc_loop - iteration:0 completed. Processed ports statistics: {'regular': {'added': 851, 'updated': 0, 'removed': 0}}. Elapsed:2787.858
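The per-phase timings can be extracted from these log lines mechanically. The sketch below is a small, hypothetical helper (not part of neutron) that pulls the elapsed-seconds figure out of the three message shapes seen above ("Elapsed:", "Time elapsed:", "processed in") and derives an average per device from the treat_devices_added_or_updated summary line. For that line, 2326.236 s over 851 devices works out to roughly 2.7 s per device; this is a plain average and says nothing about how the time splits between RPC and local work.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Elapsed-seconds figure in the message shapes seen in the agent log above.
ELAPSED_RE = re.compile(r"(?:Elapsed:|Time elapsed:|processed in)\s*(\d+(?:\.\d+)?)")
# Device count in the treat_devices_added_or_updated summary line.
DEVICES_RE = re.compile(r"of (\d+) devices")


def parse_elapsed(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (message, seconds) for every line that reports an elapsed time."""
    results = []
    for line in lines:
        line = line.rstrip("\n")
        match = ELAPSED_RE.search(line)
        if match:
            # Drop timestamp, PID, logger name and request context; keep the message.
            message = line.split("] ", 1)[-1]
            results.append((message, float(match.group(1))))
    return results


def seconds_per_device(line: str) -> Optional[float]:
    """Average seconds per device for a line containing 'of N devices' and an elapsed time."""
    devices = DEVICES_RE.search(line)
    elapsed = ELAPSED_RE.search(line)
    if devices is None or elapsed is None or int(devices.group(1)) == 0:
        return None
    return float(elapsed.group(1)) / int(devices.group(1))
```

For example, `with open("ovs-agent.log") as f: parse_elapsed(f)` lists each timed phase; the file name is a placeholder for wherever the agent log lives.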
Proposed improvement:
- Batch the bulk_pull RPC calls for port details to reduce the number of round trips.
- Optimize the resource cache population process for large-scale deployments.
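A minimal sketch of the batching idea, under stated assumptions: bulk_pull here is a hypothetical callable that accepts a list of port IDs and returns a {port_id: details} mapping; it stands in for whatever RPC the agent would actually use, and BatchedPortCache is an illustrative name, not a neutron class. The cache is prefilled with one call per chunk of IDs, while get() falls back to a single-ID pull on a miss, which mirrors the per-port pattern described above. Capping the batch size keeps each RPC payload bounded; the default of 100 is arbitrary.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence

# Hypothetical RPC signature: list of port IDs -> {port_id: details}.
BulkPull = Callable[[List[str]], Dict[str, dict]]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


class BatchedPortCache:
    """Illustrative port-details cache filled with one bulk pull per chunk."""

    def __init__(self, bulk_pull: BulkPull, batch_size: int = 100) -> None:
        self._bulk_pull = bulk_pull
        self._batch_size = batch_size
        self._cache: Dict[str, dict] = {}
        self.rpc_calls = 0  # round trips issued, for comparison with the per-port loop

    def prefill(self, port_ids: Iterable[str]) -> None:
        # Deduplicate while keeping order, and skip IDs that are already cached.
        missing = [p for p in dict.fromkeys(port_ids) if p not in self._cache]
        for chunk in chunked(missing, self._batch_size):
            self.rpc_calls += 1
            self._cache.update(self._bulk_pull(list(chunk)))

    def get(self, port_id: str) -> Optional[dict]:
        # On a miss, fall back to a single-ID pull (one round trip per port).
        if port_id not in self._cache:
            self.rpc_calls += 1
            self._cache.update(self._bulk_pull([port_id]))
        return self._cache.get(port_id)


if __name__ == "__main__":
    def fake_bulk_pull(ids: List[str]) -> Dict[str, dict]:
        # Stand-in server that knows every port it is asked about.
        return {i: {"id": i} for i in ids}

    cache = BatchedPortCache(fake_bulk_pull, batch_size=100)
    ports = [f"port-{n}" for n in range(851)]
    cache.prefill(ports)
    for p in ports:
        cache.get(p)
    print("rpc calls:", cache.rpc_calls)
```

Prefilling 851 ports with batch_size=100 issues 9 calls instead of 851, and the subsequent get() calls are served from the cache without further round trips.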
** Affects: neutron
Importance: Undecided
Status: New
** Tags: ovs-agent
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2126061
Title:
[ovs-agent] slow start with large number of ports
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2126061/+subscriptions