yahoo-eng-team team mailing list archive
Message #91517
[Bug 2011513] Re: Neutron 18.6.0 - Wallaby on Ubuntu 20.04, neutron-dhcp-agent RPC unusually slow
This bug was fixed in the package neutron - 2:18.6.0-0ubuntu1~cloud2
---------------
neutron (2:18.6.0-0ubuntu1~cloud2) focal-wallaby; urgency=medium
.
* d/p/port-provisioning-retry-only-for-vm-ports.patch: Fix
performance regression introduced in 18.6.0 (LP: #2011513).
** Changed in: cloud-archive/wallaby
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2011513
Title:
Neutron 18.6.0 - Wallaby on Ubuntu 20.04, neutron-dhcp-agent RPC
unusually slow
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive wallaby series:
Fix Released
Status in neutron:
Fix Committed
Status in neutron package in Ubuntu:
Fix Released
Bug description:
Hi!
We're running OpenStack Wallaby on Ubuntu 20.04 on 3 high-performance
infra nodes with a RabbitMQ cluster. I updated the Neutron components
to version 18.6.0, which recently became available in the cloud
repository
(http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby main).
The exact package versions updated are as follows:
Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic)
Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1)
Installed Neutron packages:
ii neutron-common 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - common
ii neutron-dhcp-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - DHCP agent
Firewall-as-a-Service driver for OpenStack Neutron
ii neutron-l3-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - l3 agent
ii neutron-linuxbridge-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - linuxbridge agent
ii neutron-metadata-agent 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - metadata agent
ii neutron-plugin-ml2 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - ML2 plugin
ii neutron-server 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - server
ii python3-neutron 2:18.6.0-0ubuntu1~cloud1 all Neutron is a virtual network service for Openstack - Python library
ii python3-neutron-lib 2.10.1-0ubuntu1~cloud0 all Neutron shared routines and utilities - Python 3.x
ii python3-neutronclient 1:7.2.1-0ubuntu1~cloud0 all client API library for Neutron - Python 3.x
Normally this would be an easy update, but this time
neutron-dhcp-agent doesn't work properly:
2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state complete
2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534
2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for ports ... (A successful configuration here).
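The backoff visible in the agent log above (a wait before the next attempt, then a raised timeout of 120 and later 240 seconds) can be pulled out of a longer log with a small script. This is only a sketch: the regular expressions are derived from the message wording quoted in this report, and the function name `summarize_rpc_timeouts` is made up for illustration. It is not part of Neutron.

```python
import re

# Patterns modelled on the neutron_lib.rpc messages quoted in this report.
# They are an assumption about the message format, not an official schema.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Timeout in RPC method (?P<method>\w+)\. "
    r"Waiting for (?P<wait>\d+) seconds"
)
INCREASE_RE = re.compile(
    r"Increasing timeout for (?P<method>\w+) calls to (?P<timeout>\d+) seconds"
)


def summarize_rpc_timeouts(lines):
    """Collect RPC timeout events and timeout increases from agent log lines.

    Returns a pair (timeouts, increases):
      timeouts  -> list of (timestamp, rpc_method, wait_seconds)
      increases -> list of (rpc_method, new_timeout_seconds)
    """
    timeouts = []
    increases = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(
                (match.group("ts"), match.group("method"), int(match.group("wait")))
            )
            continue
        match = INCREASE_RE.search(line)
        if match:
            increases.append((match.group("method"), int(match.group("timeout"))))
    return timeouts, increases
```

To use it, pass an open log file, for example `summarize_rpc_timeouts(open(path))`, where `path` points at the DHCP agent log on your system (the location depends on the deployment and is not stated in this report).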
While neutron-dhcp-agent is waiting, the neutron-server log fills up
with:
neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76
...
neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76
This repeats for each port of each network neutron-dhcp-agent needs to
configure.
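To check how many ports hit the retry loop, the "Attempt N to provision port" messages can be tallied per port. The sketch below assumes the message wording shown in the neutron-server log excerpt above; the helper name `max_attempts_per_port` is hypothetical and exists only for this example.

```python
import re
from collections import defaultdict

# Pattern based on the neutron.plugins.ml2.plugin INFO lines quoted above.
# The 36-character group matches a UUID-style port ID.
ATTEMPT_RE = re.compile(
    r"Attempt (?P<attempt>\d+) to provision port (?P<port>[0-9a-f-]{36})"
)


def max_attempts_per_port(lines):
    """Map each port ID to the highest provisioning attempt number logged."""
    highest = defaultdict(int)
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            port = match.group("port")
            highest[port] = max(highest[port], int(match.group("attempt")))
    return dict(highest)
```

Ports that reach the highest attempt number are the ones for which the server went through the whole retry sequence before answering the agent.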
Each subsequent configuration for each network takes about 1-2
minutes, depending on the network size. With earlier Neutron versions
the whole process of configuring all networks would finish in under a
minute, i.e. DHCP configuration per port (and network) is several
orders of magnitude slower than it should be. Once neutron-dhcp-agent
finishes synchronization, it seems to work without issues. There
aren't many changes in our cloud, so it is hard to tell whether it is
fast or slow, but individual port updates seem to happen quickly.
All other services and the RabbitMQ cluster are working well, the
infra nodes are not overloaded, and there are no apparent issues other
than this one with Neutron. I am therefore inclined to think that the
issue is specific to version 18.6.0 of neutron-dhcp-agent or
neutron-server.
I would appreciate any advice!
Best regards,
Zakhar