yahoo-eng-team team mailing list archive
Message #39955
[Bug 1505166] [NEW] Resync OVS, L3, DHCP agents upon revival
Public bug reported:
On a loaded cloud where neutron runs over rabbitmq in clustered mode, one of the rabbitmq cluster members can get stuck replicating queues.
During that period, agents connected through that instance cannot communicate with the server or send heartbeats.
In that case neutron-server reschedules the resources away from those agents.
Later, when rabbitmq finishes syncing, the agents "revive", but they do
nothing to clean up the resources that were rescheduled during their
"sleep".
As a result, resources can be left in a failed or conflicting state (dhcp/router namespaces, ports with binding_failed).
Such resources should be either deleted or synchronized with the server state.
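
To illustrate the requested behaviour, here is a minimal sketch in Python. It is not neutron code: the Server and Agent classes, the ALIVE/REVIVED return values, and the 75-second threshold are all hypothetical, chosen only to show the idea that a heartbeat arriving after a long silence could tell the agent to do a full resync against the server state.

```python
# Illustrative sketch only; every name here is hypothetical, not neutron's API.

ALIVE = "alive"
REVIVED = "revived"


class Server:
    """Toy stand-in for the server side of the heartbeat exchange."""

    def __init__(self, agent_down_time=75.0):
        # Assumed threshold (seconds) after which an agent counts as dead.
        self.agent_down_time = agent_down_time
        self.last_seen = {}    # agent_id -> time of the last heartbeat
        self.assignments = {}  # agent_id -> set of resource ids

    def report_state(self, agent_id, now):
        """Record a heartbeat and say whether the agent just came back."""
        previous = self.last_seen.get(agent_id)
        self.last_seen[agent_id] = now
        if previous is not None and now - previous > self.agent_down_time:
            return REVIVED
        return ALIVE

    def resources_for(self, agent_id):
        """Return a copy of the resources currently scheduled to the agent."""
        return set(self.assignments.get(agent_id, set()))


class Agent:
    """Toy agent that resyncs its local state when told it was revived."""

    def __init__(self, agent_id, server):
        self.agent_id = agent_id
        self.server = server
        self.local = set()  # resources the agent believes it hosts

    def heartbeat(self, now):
        """Send a heartbeat; on revival, reconcile with the server."""
        if self.server.report_state(self.agent_id, now) == REVIVED:
            return self.full_resync()
        return set(), set()

    def full_resync(self):
        """Replace local state with the server's view.

        Returns (stale, missing): resources to clean up locally and
        resources to set up locally.
        """
        desired = self.server.resources_for(self.agent_id)
        stale = self.local - desired
        missing = desired - self.local
        self.local = desired
        return stale, missing
```

In this sketch, an agent that hosted router-a and router-b, went silent for longer than the threshold, and had router-b rescheduled elsewhere in the meantime would get router-b back in the stale set on its next heartbeat, which is the point where namespaces or ports could be cleaned up.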
** Affects: neutron
Importance: Undecided
Assignee: Eugene Nikanorov (enikanorov)
Status: In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1505166
Title:
Resync OVS, L3, DHCP agents upon revival
Status in neutron:
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1505166/+subscriptions