yahoo-eng-team team mailing list archive

[Bug 1988281] Re: neutron dhcp agent state not consistent with real status

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Brian Haley <1988281@xxxxxxxxxxxxxxxxxx>
Date: Wed, 31 Aug 2022 19:14:18 -0000
Reply-to: Bug 1988281 <1988281@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx

I would agree with Rodolfo that this is more of an RFE, as there isn't
any fine-grained status info. In this case, UP only indicates that the
agent is running.
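
To illustrate why an agent can show as UP while it is still busy, here
is a minimal sketch that assumes liveness is derived only from the time
of the last heartbeat. The names is_agent_up and AGENT_DOWN_TIME are
hypothetical, and the 75-second value is illustrative only; this is not
the actual Neutron code.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, for illustration only.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_agent_up(last_heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True if the last heartbeat is recent enough.

    Nothing here looks at how much work the agent still has queued,
    which is why UP only means "the agent process is running".
    """
    return now - last_heartbeat < down_time


now = datetime(2022, 8, 31, 19, 0, 0)

# The agent is mid full-sync, but its heartbeat arrived 10 seconds ago.
busy_agent_up = is_agent_up(now - timedelta(seconds=10), now)

# The agent process died and has not reported for 2 minutes.
dead_agent_up = is_agent_up(now - timedelta(seconds=120), now)

print(busy_agent_up, dead_agent_up)
```

A finer-grained status (for example "syncing") would need extra
information in the state report, which is what makes this an RFE.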

As an FYI, the agent consumes messages off the queue while it is doing
a full-sync, and it should also receive other messages as instances are
created and destroyed. These "new" messages also carry a priority value
so that they should be processed sooner than some of the full-sync
ones, based on the code and comments in the notifier code:

neutron/api/rpc/agentnotifiers/dhcp_rpc_agent_api.py

# In order to improve port dhcp provisioning when nova concurrently create
# multiple vms, I classify the port_create_end message to two levels, the
# high-level message only cast to one agent, the low-level message cast to all
# other agent. In this way, When there are a large number of ports that need to
# be processed, we can dispatch the high priority message of port to different
# agent, so that the processed port will not block other port's processing in
# other dhcp agents.
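
As a rough illustration of the priority idea, here is a minimal sketch
of a queue in which a high-priority port message is handled before
full-sync work that was queued earlier. The class name, the priority
constants, and the message strings are all hypothetical; this is not
the agent's actual implementation.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is processed first.
PRIORITY_PORT_HIGH = 0
PRIORITY_PORT_LOW = 1
PRIORITY_FULL_SYNC = 2


class PriorityMessageQueue:
    """Tiny priority queue that keeps FIFO order within one priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker between equal priorities.
        self._counter = itertools.count()

    def put(self, priority, message):
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def get(self):
        _priority, _seq, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Pop every message and return them in processing order."""
    order = []
    while len(queue):
        order.append(queue.get())
    return order


q = PriorityMessageQueue()
q.put(PRIORITY_FULL_SYNC, "sync net-1")
q.put(PRIORITY_FULL_SYNC, "sync net-2")
# A new instance is created while the full-sync backlog is still queued.
q.put(PRIORITY_PORT_HIGH, "port_create_end port-a")
q.put(PRIORITY_FULL_SYNC, "sync net-3")

processed = drain(q)
print(processed)
```

In this sketch the port message jumps ahead of the pending sync items,
but anything the agent is already working on still has to finish first.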

It can take a long time for any agent to complete a full-sync operation
on a restart. We have tried to speed it up as best we can, though
there's probably always room for improvement. The other option is to go
to an OVN backend, which removes these agents completely...

** Changed in: neutron
   Importance: Undecided => Wishlist

** Changed in: neutron
       Status: New => Opinion

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1988281

Title:
  neutron  dhcp agent state not consistent with real status

Status in neutron:
  Opinion

Bug description:
  We are observing that neutron-dhcp-agent's reported state deviates
  from its "real state". By real state, I mean that all hosted dnsmasq
  processes are running and configured.

  For example, agent A is hosting 1,000 networks. If I reboot agent A,
  all dnsmasq processes are gone and the dhcp agent will try to restart
  every dnsmasq. This introduces a long delay between the agent
  starting and the agent handling new rabbitmq messages. But weirdly,
  openstack network agent list shows the agent as up and running, which
  IMO is inconsistent. I think in this situation, openstack network
  agent list should report the corresponding agent as down.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1988281/+subscriptions

References

[Bug 1988281] [NEW] neutron dhcp agent state not consistent with real status
From: norman shen, 2022-08-31