yahoo-eng-team team mailing list archive

[Bug 1824856] [NEW] [RFE] Add new state "ERROR" for HA routers

 

Public bug reported:

Bug originally found by Jakub Libosvar and Assaf Muller:

When a router replica transitions from standby to active (and in other
cases as well), keepalived-state-change-monitor can encounter an error;
in this case it was caused by a permissions issue in /var/lib/neutron.
More generally, we think that under any error condition
keepalived-state-change-monitor should notify the L3 agent that an
error has occurred. The L3 agent would then put that router replica in
the 'ERROR' state and update neutron-server, which would update the DB
and API responses. This would let the operator know that an error
happened on that particular router replica and that it needs
investigating. Bonus points if keepalived-state-change-monitor also
sends the actual error message to the agent. We would then update the
RPC format between the agent and the server and add a DB field such as
'error_message' that could be displayed to the operator.
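
To make the proposed flow concrete, here is a minimal Python sketch.
Every name in it is hypothetical: ReplicaState, StateChangeEvent,
report_transition, L3AgentSketch and the payload layout are
illustrations only, not neutron's actual classes, RPC API or DB schema.
It shows the monitor catching any error raised during a transition and
reporting 'error' together with the message, and the agent recording
that state and building a payload carrying 'error_message' for
neutron-server.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class ReplicaState(str, Enum):
    """Hypothetical per-replica states; ERROR is the state proposed in this RFE."""
    ACTIVE = "active"
    STANDBY = "standby"
    ERROR = "error"


@dataclass
class StateChangeEvent:
    """Hypothetical message from keepalived-state-change-monitor to the L3 agent."""
    router_id: str
    state: ReplicaState
    error_message: Optional[str] = None


def report_transition(router_id: str, target: ReplicaState,
                      apply_transition: Callable[[], None]) -> StateChangeEvent:
    """Run the local transition work; on any exception, report ERROR plus the message."""
    try:
        apply_transition()
    except Exception as exc:  # "under any error condition"
        return StateChangeEvent(router_id, ReplicaState.ERROR, str(exc))
    return StateChangeEvent(router_id, target)


@dataclass
class L3AgentSketch:
    """Hypothetical stand-in for the L3 agent's view of its router replicas."""
    states: Dict[str, ReplicaState] = field(default_factory=dict)

    def handle_event(self, event: StateChangeEvent) -> dict:
        # Record the replica state locally ...
        self.states[event.router_id] = event.state
        # ... and build the (hypothetical) RPC payload for neutron-server,
        # which would store 'error_message' in a new DB field.
        return {
            "router_id": event.router_id,
            "state": event.state.value,
            "error_message": event.error_message,
        }


# Usage: simulate the permissions problem hit during standby -> active.
def write_state_file() -> None:
    raise PermissionError("[Errno 13] Permission denied: '/var/lib/neutron'")


agent = L3AgentSketch()
payload = agent.handle_event(
    report_transition("router-1", ReplicaState.ACTIVE, write_state_file))
print(payload)
# {'router_id': 'router-1', 'state': 'error', 'error_message': "[Errno 13] Permission denied: '/var/lib/neutron'"}
```

Catching a broad Exception in the monitor mirrors the "any error
condition" wording; the replica state and the message travel together
so the server can expose both to the operator.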


I'm proposing this as an RFE because it would add a new router state on the L3 agent, which is a user-visible change.

** Affects: neutron
     Importance: Low
         Status: Confirmed


** Tags: l3-dvr-backlog rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1824856
