yahoo-eng-team team mailing list archive

[Bug 1829449] [NEW] Implement consistency check and self-healing for SDN-managed fabrics

 

Public bug reported:

When an SDN mechanism driver is used in Neutron (at our site we use
mlnx_sdn_assist, but the issue is not limited to this driver; we have
heard of similar issues with at least three other SDN solutions), no
consistency checking is applied to the fabric after the initial port
configuration. If something goes wrong in the SDN layer after Neutron
sends its request to the SDN controller, and the requested configuration
is not applied correctly, Neutron has no way of knowing about it.
Ideally such scenarios should not happen, but feedback from operators
indicates that they occasionally do, for a variety of reasons. When they
happen the user impact is significant, because the state of Neutron and
the SDN has to be reconciled manually, which is generally non-trivial.

If SDN mechanism drivers are not used and standard openvswitch-based
networking is configured, neutron-openvswitch-agent periodically checks
the port configuration and enforces the desired state if needed.
Investigating whether and how this could be applied to SDN in the
general case would probably be a logical first step.

It would be very valuable for operators of SDN-based clouds to be able
to:

- Have Neutron poll the SDN to check the state of each port
- Have Neutron “push” the state of each port to make sure that the SDN
  state is consistent with the Neutron state
- Ensure that each SDN solution supported with OpenStack provides
  support for those actions
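
To make the poll and push actions above more concrete, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
PortState, SdnClient, get_port, push_port and InMemorySdn are
hypothetical names, not part of Neutron, ML2 or any real SDN driver, and
a real port carries far more state than the three fields shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class PortState:
    """Hypothetical minimal view of a port; real drivers track more fields."""
    port_id: str
    vlan_id: int
    admin_up: bool


class SdnClient(Protocol):
    """Hypothetical controller interface, not the API of any real driver."""

    def get_port(self, port_id: str) -> Optional[PortState]: ...

    def push_port(self, state: PortState) -> None: ...


def find_inconsistent(desired: Dict[str, PortState], sdn: SdnClient) -> List[str]:
    """Poll the controller and return IDs of ports that differ from Neutron."""
    return [pid for pid, want in desired.items() if sdn.get_port(pid) != want]


def heal(desired: Dict[str, PortState], sdn: SdnClient) -> List[str]:
    """Push Neutron's state for every inconsistent port; return the healed IDs."""
    drifted = find_inconsistent(desired, sdn)
    for pid in drifted:
        sdn.push_port(desired[pid])
    return drifted


class InMemorySdn:
    """Stand-in controller used only to exercise the sketch."""

    def __init__(self, ports: Dict[str, PortState]) -> None:
        self._ports = dict(ports)

    def get_port(self, port_id: str) -> Optional[PortState]:
        return self._ports.get(port_id)

    def push_port(self, state: PortState) -> None:
        self._ports[state.port_id] = state
```

Keeping the check (find_inconsistent) separate from the repair (heal)
matches the proposal: an operator or a monitoring system could run the
check alone and decide later whether to push.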

Initially these actions could be triggered manually (or from a
monitoring system); later on they would likely become a periodic task,
adding self-healing capabilities to SDN-based OpenStack installations.
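
The move from manual triggering to a periodic task could look roughly
like the sketch below. It uses only the Python standard library, and
run_periodically with its parameters is a hypothetical helper; an actual
Neutron implementation would presumably reuse the project's existing
periodic-task machinery rather than a hand-rolled loop.

```python
import threading
from typing import Callable, Optional


def run_periodically(check: Callable[[], None], interval_s: float,
                     stop: threading.Event,
                     max_runs: Optional[int] = None) -> int:
    """Call check() every interval_s seconds until stop is set.

    max_runs is an optional cap (handy for manual or test runs).
    Returns the number of times check() was attempted.
    """
    runs = 0
    while not stop.is_set():
        try:
            check()
        except Exception as exc:
            # One failed pass should not kill the self-healing loop.
            print(f"consistency check failed: {exc}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Wakes up early if stop is set while waiting.
        stop.wait(interval_s)
    return runs
```

With max_runs=1 the same function serves the manual or
monitoring-triggered case; leaving max_runs unset and running it in a
background thread gives the periodic behaviour.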

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1829449

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1829449/+subscriptions