
yahoo-eng-team team mailing list archive

[Bug 1766812] [NEW] the machine running dhcp agent will have very high cpu load when start dhcp agent after the agent down more than 150 seconds

 

Public bug reported:

This issue can be reproduced with the following steps:

Environment: OpenStack Ocata, CentOS 7.2

1. Set up two dhcp agent nodes.
2. On the neutron-server side, set allow_automatic_dhcp_failover to True and dhcp_agents_per_network to 2.
3. Create a large number of networks, each with one subnet (I created 200). The more networks there are, the higher the cpu load on the dhcp agent node and the longer the high load lasts.
4. Stop one dhcp agent and wait more than 150s (agent_down_time * 2). It is best to check the distribution of networks across the two dhcp agent nodes. Neutron-server starts removing the networks of the dead dhcp agent after 150s, so wait until all of its networks have been removed from it in the db. With 200 networks, wait more than 5 minutes before moving on to the next step.
5. Start the dhcp agent that was stopped and use top to watch the cpu usage. After a while, you will see very high cpu load.
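
The waiting condition in step 4 can be written down as a small check. This is only a sketch: the function names are hypothetical, and the value 75 for agent_down_time is an assumption inferred from the "150s (agent_down_time * 2)" figure above. It does not query neutron; the number of networks still bound to the dead agent has to be looked up separately.

```python
# Assumption: agent_down_time is 75 seconds, as implied by
# "150s (agent_down_time * 2)" in step 4.
AGENT_DOWN_TIME = 75


def failover_threshold(agent_down_time: int = AGENT_DOWN_TIME) -> int:
    """Seconds an agent must be down before its networks start being removed."""
    return agent_down_time * 2


def ready_for_step_5(seconds_down: float,
                     networks_on_dead_agent: int,
                     agent_down_time: int = AGENT_DOWN_TIME) -> bool:
    """True once the agent has been down longer than the threshold and
    no networks remain bound to it (hypothetical helper)."""
    past_threshold = seconds_down > failover_threshold(agent_down_time)
    return past_threshold and networks_on_dead_agent == 0


if __name__ == "__main__":
    print(failover_threshold())        # threshold in seconds
    print(ready_for_step_5(300, 12))   # networks still bound to the dead agent
    print(ready_for_step_5(301, 0))    # all networks removed
```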

If you have the rabbitmq web UI, you can observe the following after
step 5. The dhcp agent starts syncing the networks while its consumer
has not been created yet. Neutron-server sees that the dhcp agent is
active and reschedules networks to it, so messages pile up in the queue
on the dhcp agent side. After the dhcp agent finishes syncing networks,
its consumer is created and consumes the messages, but they are not
processed yet. Once the dhcp agent starts processing the backlog of
messages it has consumed, the cpu load of the dhcp agent node becomes
higher and higher.
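
The backlog described above can be illustrated with a toy model. This is a sketch under stated assumptions, not neutron or rabbitmq code: the function, its parameters, and the "ticks" are all hypothetical, and it only shows that messages published while no consumer exists all arrive at once when the consumer is finally created.

```python
from collections import deque


def simulate_restart(networks: int, sync_ticks: int, reschedules_per_tick: int):
    """Toy model: while the agent syncs, the server keeps publishing
    reschedule messages that nobody consumes, so they accumulate."""
    backlog = deque()
    published = 0
    for _ in range(sync_ticks):
        # No consumer exists during the sync, so every message is queued.
        batch = min(reschedules_per_tick, networks - published)
        backlog.extend(f"network-{published + i}" for i in range(batch))
        published += batch
    queued_when_sync_ends = len(backlog)

    # The consumer is created only now and receives the whole backlog at once.
    processed = 0
    while backlog:
        backlog.popleft()
        processed += 1
    return queued_when_sync_ends, processed


if __name__ == "__main__":
    queued, processed = simulate_restart(networks=200, sync_ticks=10,
                                         reschedules_per_tick=20)
    print(queued, processed)
```

In this model the burst size grows with the number of networks, which matches the observation in step 3 that more networks mean higher and longer cpu load.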

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1766812

Title:
  the machine running dhcp agent will have very high cpu load when start
  dhcp agent after the agent down more than 150 seconds

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1766812/+subscriptions

