yahoo-eng-team team mailing list archive

[Bug 1795212] [NEW] [RFE] Prevent DHCP agent from processing stale RPC messages when restarting up

 

Public bug reported:

The neutron server triggers network rescheduling when it detects that
agents are down. Meanwhile, some bare metal and node management systems
may reboot those same nodes. When both actions happen together, the
server sends RPC notifications to agents that are in the middle of
rebooting, and those messages are stale by the time the DHCP agents
return to service: they were sent to the agent before the node was
rebooted but were not processed because the agent was shut down at the
time.
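One way to picture "stale" is to compare when a message was sent with when the agent (re)started. The sketch below is hypothetical and not neutron code: it assumes each notification payload carries a server-set "timestamp" field (seconds since the epoch), which neutron payloads are not guaranteed to include, and the StaleFilter class is an illustrative name.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StaleFilter:
    """Hypothetical filter that drops notifications sent before the agent started.

    Assumes every payload has a server-set 'timestamp' (epoch seconds);
    this field is an assumption for illustration, not part of neutron.
    """
    start_time: float = field(default_factory=time.time)
    dropped: List[Dict[str, Any]] = field(default_factory=list)

    def accept(self, method: str, payload: Dict[str, Any]) -> bool:
        sent_at: Optional[float] = payload.get("timestamp")
        if sent_at is None or sent_at < self.start_time:
            # Unknown age or queued before the restart: ignore it and rely on
            # a full resync with the server to establish the correct state.
            self.dropped.append({"method": method, **payload})
            return False
        return True


f = StaleFilter(start_time=1000.0)
print(f.accept("network_create_end", {"network_id": "net-a", "timestamp": 990.0}))   # False
print(f.accept("network_delete_end", {"network_id": "net-b", "timestamp": 1005.0}))  # True
```

Comparing server timestamps against the agent's local clock assumes the clocks are synchronized; a server-issued sequence or generation number would avoid that dependency.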

This has the following negative effects:
when an agent receives a stale network_create_end notification, it starts servicing the network even though the server may have already assigned that network to a different agent. Because the agent does not periodically audit the list of networks it is servicing, it could continue servicing a network that is not assigned to it indefinitely. Similarly, the agent may process a stale delete message and stop servicing a network that it is actually supposed to service.
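The missing periodic audit described above amounts to a set reconciliation between the networks the agent serves and the networks the server assigns to it. The following is a minimal sketch under stated assumptions: fetch_assigned, enable and disable are hypothetical callables (in a real agent the first would be an RPC to the server), and none of these names come from neutron.

```python
from typing import Callable, Iterable, Set, Tuple


def reconcile(serving: Iterable[str], assigned: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_enable, to_disable) for this agent.

    to_enable:  assigned to this agent but not served (e.g. after a stale delete)
    to_disable: served but not assigned (e.g. after a stale create)
    """
    serving_set, assigned_set = set(serving), set(assigned)
    return assigned_set - serving_set, serving_set - assigned_set


def audit_once(serving: Set[str],
               fetch_assigned: Callable[[], Iterable[str]],
               enable: Callable[[str], None],
               disable: Callable[[str], None]) -> None:
    """One audit pass; the hypothetical callables stand in for RPC and DHCP actions."""
    to_enable, to_disable = reconcile(serving, fetch_assigned())
    for net in sorted(to_disable):
        disable(net)
        serving.discard(net)
    for net in sorted(to_enable):
        enable(net)
        serving.add(net)


serving = {"net-a", "net-b"}
log = []
audit_once(serving, lambda: ["net-b", "net-c"],
           enable=lambda n: log.append(("enable", n)),
           disable=lambda n: log.append(("disable", n)))
print(log)              # [('disable', 'net-a'), ('enable', 'net-c')]
print(sorted(serving))  # ['net-b', 'net-c']
```

Running such a pass on a timer would bound how long a stale create or delete can leave the agent in the wrong state, at the cost of periodic load on the server.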

** Affects: neutron
     Importance: Undecided
     Assignee: Kailun Qin (kailun.qin)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Kailun Qin (kailun.qin)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1795212

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1795212/+subscriptions