yahoo-eng-team team mailing list archive

[Bug 2011513] Re: Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow

 

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: neutron
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2011513

Title:
  Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC
  unusually slow

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  New
Status in neutron package in Ubuntu:
  New

Bug description:
  Hi!

  We're running OpenStack Wallaby on Ubuntu 20.04 on 3 high-performance
  infra nodes with a RabbitMQ cluster. I updated the Neutron components
  to version 18.6.0, which recently became available in the cloud
  repository (http://ubuntu-cloud.archive.canonical.com/ubuntu
  focal-updates/wallaby main). The exact package versions updated are as
  follows:

  Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic), openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic)
  Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2, 0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64 (2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1)

  Installed Neutron packages:

  ii  neutron-common                        2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - common
  ii  neutron-dhcp-agent                    2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - DHCP agent
   Firewall-as-a-Service driver for OpenStack Neutron
  ii  neutron-l3-agent                      2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - l3 agent
  ii  neutron-linuxbridge-agent             2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - linuxbridge agent
  ii  neutron-metadata-agent                2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - metadata agent
  ii  neutron-plugin-ml2                    2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - ML2 plugin
  ii  neutron-server                        2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - server
  ii  python3-neutron                       2:18.6.0-0ubuntu1~cloud1                             all          Neutron is a virtual network service for Openstack - Python library
  ii  python3-neutron-lib                   2.10.1-0ubuntu1~cloud0                               all          Neutron shared routines and utilities - Python 3.x
  ii  python3-neutronclient                 1:7.2.1-0ubuntu1~cloud0                              all          client API library for Neutron - Python 3.x

  Normally this would be an easy update, but this time
  neutron-dhcp-agent doesn't work properly:

  2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent [req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state complete
  2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
  2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
  2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
  2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c
  2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
  2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
  2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-] No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534
  2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent [req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying server of ports ready. Retrying...: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
  2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent [req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for ports ... (A successful configuration here).
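
  To make the backoff pattern in the excerpt above easier to see, the
  timeout lines can be summarized per RPC method. This is a minimal
  sketch, assuming the log format shown above; the function name
  summarize_timeouts and the regular expression are illustrative and not
  part of Neutron.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the neutron_lib.rpc timeout lines quoted in this report.
# The pattern is an assumption based on that excerpt only.
TIMEOUT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ ERROR neutron_lib\.rpc "
    r".*?Timeout in RPC method (?P<method>\w+)\. "
    r"Waiting for (?P<wait>\d+) seconds"
)


def summarize_timeouts(lines):
    """Map each RPC method to a list of (timestamp, wait_seconds) tuples."""
    result = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a neutron_lib.rpc timeout line
        stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        result[match.group("method")].append((stamp, int(match.group("wait"))))
    return dict(result)
```

  Feeding it the lines of the agent log (for example, the result of
  open(path).readlines(), where path is wherever the agent log lives on
  the node) returns one entry per timed-out method with the wait the
  agent chose before each retry.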

  While neutron-dhcp-agent is waiting, the neutron-server log fills up
  with:

  neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76
  ...
  neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76

  This repeats for each port of each network neutron-dhcp-agent needs to
  configure.
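
  To check how many ports hit the repeated provisioning attempts, the
  neutron-server log can be reduced to the highest attempt number seen
  per port. This is a rough sketch, assuming the message format quoted
  above; the function name max_attempts_per_port is hypothetical.

```python
import re

# Matches the "Attempt N to provision port <uuid>" messages quoted above.
ATTEMPT_RE = re.compile(
    r"Attempt (?P<n>\d+) to provision port (?P<port>[0-9a-f-]{36})"
)


def max_attempts_per_port(lines):
    """Return {port_id: highest attempt number logged for that port}."""
    attempts = {}
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match is None:
            continue  # unrelated log line
        port = match.group("port")
        count = int(match.group("n"))
        attempts[port] = max(count, attempts.get(port, 0))
    return attempts
```

  Using search rather than match means the "neutron-server.log:" prefix
  added by grep does not get in the way.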

  Each subsequent configuration for each network takes about 1-2
  minutes, depending on the network size. With earlier Neutron versions
  the whole process of configuring all networks would finish in under a
  minute, i.e. DHCP configuration per port (and network) is several
  orders of magnitude slower than it should be. Once neutron-dhcp-agent
  finishes synchronization, it seems to work without issues: individual
  port updates seem to happen quickly, although there aren't enough
  changes in our cloud to tell for certain whether it's fast or slow.
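
  The per-network delay can be measured from the agent log by taking the
  time between consecutive "DHCP configuration for ports" lines. This is
  only a sketch, assuming each such line starts with the timestamp format
  shown in the excerpts above; config_intervals is a hypothetical helper
  name.

```python
from datetime import datetime

# Substring of the INFO line the agent logs after a successful configuration.
MARKER = "DHCP configuration for ports"


def config_intervals(lines):
    """Return seconds elapsed between consecutive successful configurations."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue
        # The first two whitespace-separated fields are the date and the time.
        text = " ".join(line.split()[:2])
        stamps.append(datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

  Intervals of roughly 60 to 120 seconds would correspond to the 1-2
  minutes per network described above.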

  All other services and the RabbitMQ cluster are working well, the
  infra nodes are not overloaded, and there are no apparent issues other
  than this one with Neutron. I am therefore inclined to think that the
  issue is specific to version 18.6.0 of neutron-dhcp-agent or
  neutron-server.

  I would appreciate any advice!

  Best regards, 
  Zakhar

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2011513/+subscriptions