Message #78020
[Bug 1824856] [NEW] [RFE] Add new state "ERROR" for HA routers
Public bug reported:
Bug originally found by Jakub Libosvar and Assaf Muller:
When a router replica transitions from standby to active (and in other
cases as well), keepalived-state-change-monitor can hit an error. In the
case we saw, the cause was a permissions issue in /var/lib/neutron, but
the same applies to any error condition. We think that
keepalived-state-change-monitor should notify the L3 agent when that
happens. The L3 agent would then put that router replica in the 'ERROR'
state and update neutron-server, which would update the DB and the API
responses. This would let the operator know that an error happened for
that particular router replica and that they should investigate. Bonus
points if keepalived-state-change-monitor also sends the actual error
message to the agent. We would then update the RPC format between the
agent and the server and add a DB field such as 'error_message' that we
could display to the operator.
I'm proposing this as an RFE because it would add a new router state on the L3 agent, which is a user-visible change.
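To make the proposed flow easier to discuss, here is a rough, self-contained sketch of it in Python. It is not neutron code. Every name in it (HaState, StateReport, L3AgentSketch, run_transition, and the callback that stands in for the agent-to-server RPC) is hypothetical; only the 'ERROR' state and the 'error_message' field come from the description above.

```python
import enum
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class HaState(enum.Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    ERROR = "error"  # the new state proposed in this RFE


@dataclass
class StateReport:
    """Hypothetical agent-to-server payload for one router replica."""
    router_id: str
    state: HaState
    error_message: Optional[str] = None  # proposed extra field


class L3AgentSketch:
    """Stand-in for the L3 agent; not the real neutron class."""

    def __init__(self, send_to_server: Callable[[StateReport], None]):
        # send_to_server stands in for the agent-to-server RPC call.
        self._send_to_server = send_to_server
        self.states: Dict[str, HaState] = {}

    def on_state_change(self, router_id: str, state: HaState) -> None:
        # Normal path: record the new state and report it.
        self.states[router_id] = state
        self._send_to_server(StateReport(router_id, state))

    def on_monitor_error(self, router_id: str, message: str) -> None:
        # Proposed path: mark the replica as ERROR and forward the message.
        self.states[router_id] = HaState.ERROR
        self._send_to_server(StateReport(router_id, HaState.ERROR, message))


def run_transition(agent: L3AgentSketch, router_id: str, new_state: HaState,
                   apply_transition: Callable[[], None]) -> None:
    """What the state-change monitor might do around a transition."""
    try:
        apply_transition()
    except Exception as exc:
        # Any error condition is reported, not only permission problems.
        agent.on_monitor_error(router_id, f"{type(exc).__name__}: {exc}")
    else:
        agent.on_state_change(router_id, new_state)


# Example: a transition that fails with a permissions problem.
reports = []
agent = L3AgentSketch(reports.append)


def failing_transition() -> None:
    raise PermissionError("cannot write under /var/lib/neutron")


run_transition(agent, "router-1", HaState.ACTIVE, failing_transition)
```

In this sketch the server side is just a list that collects reports. In a real change, the report would travel over the existing agent-to-server RPC and the message would end up in the proposed DB field.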
** Affects: neutron
Importance: Low
Status: Confirmed
** Tags: l3-dvr-backlog rfe
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1824856
Title:
[RFE] Add new state "ERROR" for HA routers
Status in neutron:
Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1824856/+subscriptions