yahoo-eng-team team mailing list archive

[Bug 1766812] [NEW] the machine running dhcp agent will have very high cpu load when start dhcp agent after the agent down more than 150 seconds

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Jiaping LI <1766812@xxxxxxxxxxxxxxxxxx>
Date: Wed, 25 Apr 2018 07:12:29 -0000
Reply-to: Bug 1766812 <1766812@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

This issue can be reproduced with the following steps.

Environment: OpenStack Ocata, CentOS 7.2

1. Set up two DHCP agent nodes.
2. On the neutron-server side, set allow_automatic_dhcp_failover to True and dhcp_agents_per_network to 2.
3. Create a large number of networks, each with one subnet; I created 200. The more networks there are, the higher the CPU load on the DHCP agent node and the longer the high load lasts.
4. Stop one DHCP agent and wait more than 150 seconds (agent_down_time * 2). Neutron-server starts removing networks from the dead DHCP agent after 150 seconds, so it is best to check how the networks are distributed across the two DHCP agent nodes and wait until all networks have been removed from the dead agent in the DB. With 200 networks, wait more than 5 minutes before moving on to the next step.
5. Start the DHCP agent that was stopped and watch CPU usage with top. After a while you will see very high CPU load.
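
Step 3 is tedious to do by hand. The following is a minimal sketch of one way to script it, assuming the openstacksdk package and a clouds.yaml entry. The cloud name "mycloud", the "bug1766812-net-" name prefix, and the 10.200.0.0/16 address plan are placeholders chosen for illustration and do not come from the original report.

```python
import ipaddress


def plan_networks(count, prefix="bug1766812-net-", base="10.200.0.0/16"):
    """Return (network_name, subnet_cidr) pairs, one /24 per network.

    The name prefix and the base range are illustrative placeholders.
    """
    subnets = ipaddress.ip_network(base).subnets(new_prefix=24)
    plan = []
    for index, subnet in zip(range(count), subnets):
        plan.append((f"{prefix}{index:03d}", str(subnet)))
    if len(plan) < count:
        raise ValueError(f"{base} cannot hold {count} /24 subnets")
    return plan


def create_networks(conn, plan):
    """Create each planned network with a single IPv4 subnet.

    `conn` is assumed to be an openstack.connection.Connection.
    """
    for name, cidr in plan:
        network = conn.network.create_network(name=name)
        conn.network.create_subnet(
            network_id=network.id,
            name=f"{name}-subnet",
            ip_version=4,
            cidr=cidr,
        )


def main():
    """Create the 200 networks from step 3 on the cloud named "mycloud"."""
    import openstack  # third-party: pip install openstacksdk

    # "mycloud" is a placeholder for an entry in clouds.yaml.
    connection = openstack.connect(cloud="mycloud")
    create_networks(connection, plan_networks(200))
```

Calling main() against a test cloud creates the 200 networks. The planning step is kept separate from the API calls so the names and CIDRs can be reviewed first.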

With the RabbitMQ web UI you can watch what happens after step 5. The
DHCP agent starts syncing its networks, and its consumer has not been
created yet. Neutron-server sees that the DHCP agent is active again and
reschedules networks to it, so messages pile up on the DHCP agent side.
Once the agent finishes syncing networks, the consumer is created and
takes the messages off the queue, but they are not handled right away.
As the agent works through the backlog of messages, the CPU load on the
DHCP agent node climbs higher and higher.
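
The backlog can also be watched without the web UI. The sketch below polls the RabbitMQ management HTTP API (GET /api/queues, which requires the management plugin) and prints the depth of the queues whose names start with "dhcp_agent". The URL, the guest credentials, and the queue-name prefix are assumptions about a typical deployment, not details taken from this report.

```python
import base64
import json
import time
import urllib.request


def summarize_dhcp_queues(queues, prefix="dhcp_agent"):
    """Pick the DHCP agent queues out of a decoded /api/queues response.

    The "dhcp_agent" prefix is an assumption about queue naming.
    Rows are sorted with the deepest queue first.
    """
    rows = []
    for queue in queues:
        if queue.get("name", "").startswith(prefix):
            rows.append({
                "name": queue["name"],
                "consumers": queue.get("consumers", 0),
                "ready": queue.get("messages_ready", 0),
                "unacked": queue.get("messages_unacknowledged", 0),
            })
    return sorted(rows, key=lambda row: row["ready"] + row["unacked"], reverse=True)


def fetch_queues(base_url="http://localhost:15672", user="guest", password="guest"):
    """Fetch all queues from the management API.

    The URL and credentials are placeholders; adjust them for the deployment.
    """
    request = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def watch(interval=5):
    """Print DHCP agent queue depth every `interval` seconds until interrupted."""
    while True:
        for row in summarize_dhcp_queues(fetch_queues()):
            print(row)
        time.sleep(interval)
```

If the description above holds, calling watch() right after step 5 should show a queue with no consumers and a growing "ready" count while the agent is still syncing.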

** Affects: neutron
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1766812

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1766812/+subscriptions

Follow ups

[Bug 1766812] Re: the machine running dhcp agent will have very high cpu load when start dhcp agent after the agent down more than 150 seconds
From: Brian Haley, 2023-01-23