yahoo-eng-team team mailing list archive

[Bug 1864822] Re: Openvswitch Agent - Connexion openvswitch DB Broken

 

Reviewed:  https://review.opendev.org/721554
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=91f0bf3c8511bf3b0cc63746f767d8d4dce601bd
Submitter: Zuul
Branch:    master

commit 91f0bf3c8511bf3b0cc63746f767d8d4dce601bd
Author: Slawek Kaplonski <skaplons@xxxxxxxxxx>
Date:   Tue Apr 21 10:30:52 2020 +0200

    [DVR] Reconfigure re-created physical bridges for dvr routers
    
    When a physical bridge is removed and created again, it
    is re-initialized by neutron-ovs-agent.
    But if the agent had distributed routing enabled, the DVR-related
    flows weren't configured again, and that led to connectivity issues
    for DVR routers.
    
    This patch fixes it by adding configuration of dvr related flows
    if distributed routing is enabled in agent's configuration.
    
    It also resets the list of phys_brs in dvr_agent. Without that,
    different objects were used in the ovs agent and dvr_agent classes, so
    e.g. 2 different cookie ids were set on flows in the physical bridge.
    The same issue occurred when openvswitch was restarted and
    all bridges were reconfigured.
    Now in such a case a new cookie_id is correctly configured for all
    flows.
    
    Change-Id: I710f00f0f542bcf7fa2fc60800797b90f9f77e14
    Closes-Bug: #1864822


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1864822

Title:
  Openvswitch Agent - Connexion openvswitch DB Broken

Status in neutron:
  Fix Released

Bug description:
  Hi all,

  We have deployed several OpenStack platforms in my company.
  We used kolla-ansible to deploy our platforms.
  Here is the configuration that we applied:
  kolla_base_distro: "centos"
  kolla_install_type : "binary"
  openstack_version : "stein"

  Neutron architecture:
  HA L3 enabled
  DVR enabled
  SNAT enabled
  multiple VLAN providers: True

  Note: Our platforms are multi-region

  Recently, we upgraded a master region from rocky to stein with the kolla-ansible upgrade procedure.
  Since the upgrade, the openvswitch agent sometimes loses its connection to ovsdb.
  We found this error in neutron-openvswitch-agent.log: "tcp:127.0.0.1:6640: send error: Broken pipe".
  And we found these errors in ovsdb-server.log:
  2020-02-24T23:13:22.644Z|00009|reconnect|ERR|tcp:127.0.0.1:50260: no response to inactivity probe after 5 seconds, disconnecting
  2020-02-25T04:10:55.893Z|00010|reconnect|ERR|tcp:127.0.0.1:58544: no response to inactivity probe after 5 seconds, disconnecting
  2020-02-25T07:21:12.301Z|00011|reconnect|ERR|tcp:127.0.0.1:34918: no response to inactivity probe after 5 seconds, disconnecting
  2020-02-25T09:21:45.533Z|00012|reconnect|ERR|tcp:127.0.0.1:37782: no response to inactivity probe after 5 seconds, disconnecting
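
To see how often these disconnects happen, the log lines above can be filtered with a small script. This is only a sketch: the regular expressions are derived from the four lines quoted in this report, and the names `parse_probe_disconnects`, `LINE_RE`, `PROBE_RE` and `SAMPLE` are hypothetical helpers, not part of Open vSwitch or neutron.

```python
import re
from datetime import datetime, timezone

# Line shape assumed from the log excerpt in this report:
# <timestamp>|<sequence>|<module>|<level>|<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|(?P<module>\w+)\|(?P<level>\w+)\|(?P<msg>.*)$"
)
PROBE_RE = re.compile(
    r"^(?P<peer>tcp:[\d.]+:\d+): no response to inactivity probe "
    r"after (?P<secs>\d+) seconds, disconnecting$"
)


def parse_probe_disconnects(lines):
    """Return (timestamp, peer, probe_seconds) for each probe disconnect."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("module") != "reconnect":
            continue
        p = PROBE_RE.match(m.group("msg"))
        if not p:
            continue
        ts = datetime.strptime(
            m.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=timezone.utc)
        events.append((ts, p.group("peer"), int(p.group("secs"))))
    return events


# The four lines quoted in the bug description.
SAMPLE = [
    "2020-02-24T23:13:22.644Z|00009|reconnect|ERR|tcp:127.0.0.1:50260: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T04:10:55.893Z|00010|reconnect|ERR|tcp:127.0.0.1:58544: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T07:21:12.301Z|00011|reconnect|ERR|tcp:127.0.0.1:34918: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T09:21:45.533Z|00012|reconnect|ERR|tcp:127.0.0.1:37782: no response to inactivity probe after 5 seconds, disconnecting",
]

events = parse_probe_disconnects(SAMPLE)
for ts, peer, secs in events:
    print(ts.isoformat(), peer, secs)
```

Each disconnect comes from a different local client port, which the script makes easy to spot when the log is long.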

  When we experience this issue, traffic handled by the "NORMAL" type flows inside br-ex does not get out.
  Example of stuck flows:
  (neutron-openvswitch-agent)[root@cnp69s12p07 /]# ovs-ofctl dump-flows br-ex | grep NORMAL
   cookie=0x7adbd675f988912b, duration=72705.077s, table=0, n_packets=185, n_bytes=16024, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
   cookie=0x7adbd675f988912b, duration=72695.007s, table=2, n_packets=11835702, n_bytes=5166123797, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=1 actions=mod_vlan_vid:12,NORMAL
   cookie=0x7adbd675f988912b, duration=72694.928s, table=2, n_packets=4133243, n_bytes=349654412, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=9 actions=mod_vlan_vid:18,NORMAL
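
The commit message above mentions that two different cookie ids could end up on flows in a physical bridge. A rough way to check a dump for that is to group flows by cookie. The sketch below only relies on the `cookie=` and `table=` fields visible in the output quoted here; `flows_by_cookie` and `SAMPLE` are hypothetical names, and in the pasted excerpt all three flows share one cookie.

```python
import re
from collections import defaultdict

COOKIE_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+)")
TABLE_RE = re.compile(r"table=(\d+)")


def flows_by_cookie(dump_text):
    """Map each cookie value to the list of table ids of its flows."""
    groups = defaultdict(list)
    for line in dump_text.splitlines():
        cookie = COOKIE_RE.search(line)
        table = TABLE_RE.search(line)
        if not cookie or not table:
            continue  # skip prompts, headers and blank lines
        groups[cookie.group(1)].append(int(table.group(1)))
    return dict(groups)


# Flow lines copied from the bug description.
SAMPLE = """\
 cookie=0x7adbd675f988912b, duration=72705.077s, table=0, n_packets=185, n_bytes=16024, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
 cookie=0x7adbd675f988912b, duration=72695.007s, table=2, n_packets=11835702, n_bytes=5166123797, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=1 actions=mod_vlan_vid:12,NORMAL
 cookie=0x7adbd675f988912b, duration=72694.928s, table=2, n_packets=4133243, n_bytes=349654412, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=9 actions=mod_vlan_vid:18,NORMAL
"""

groups = flows_by_cookie(SAMPLE)
for cookie, tables in groups.items():
    print(cookie, "flows:", len(tables), "tables:", sorted(set(tables)))
if len(groups) > 1:
    print("more than one cookie present on this bridge")
```

Running it against the full `ovs-ofctl dump-flows br-ex` output, rather than only the NORMAL flows, would give a more complete picture.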

  Workaround to solve this issue: 
  - stop openvswitch_db openvswitch_vswitchd neutron_openvswitch_agent neutron_l3_agent (containers)
  - start containers:  openvswitch_db openvswitch_vswitchd
  - start neutron_l3_agent neutron_openvswitch_agent
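
The restart order above can be scripted. The following is a minimal sketch, assuming the containers are managed with the Docker CLI (`docker stop` / `docker start`) and use exactly the container names listed in the workaround; `workaround_commands` and `run_workaround` are hypothetical helper names. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Container names taken from the workaround steps in this report.
OVS_CONTAINERS = ["openvswitch_db", "openvswitch_vswitchd"]
AGENT_CONTAINERS = ["neutron_openvswitch_agent", "neutron_l3_agent"]


def workaround_commands(runtime="docker"):
    """Build the three commands in the order given in the workaround."""
    return [
        # 1. stop the OVS containers and both neutron agents
        [runtime, "stop", *OVS_CONTAINERS, *AGENT_CONTAINERS],
        # 2. start the OVS containers first
        [runtime, "start", *OVS_CONTAINERS],
        # 3. then start the neutron agents
        [runtime, "start", "neutron_l3_agent", "neutron_openvswitch_agent"],
    ]


def run_workaround(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in workaround_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_workaround(dry_run=True)
```

Note that `docker start` returns once the containers are started, not once ovsdb-server is ready to accept connections, so a short wait between steps 2 and 3 may be needed in practice.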

  Note: we have kept the default OVS connection timeout options:
  - of_connect_timeout: 300
  - of_request_timeout: 300
  - of_inactivity_probe: 10
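
To confirm that a given agent really runs with these defaults, the agent's ini file can be compared against them. This is a sketch only: it assumes the three options live in an `[ovs]` section of the agent configuration file, and `non_default_timeouts`, `DEFAULTS` and `SAMPLE_INI` are hypothetical names. The default values are the ones listed in this report.

```python
import configparser

# Defaults as stated in the bug description.
DEFAULTS = {
    "of_connect_timeout": 300,
    "of_request_timeout": 300,
    "of_inactivity_probe": 10,
}


def non_default_timeouts(ini_text, section="ovs"):
    """Return {option: (default, configured)} for options that differ."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    changed = {}
    for key, default in DEFAULTS.items():
        # fallback covers both a missing section and a missing option
        value = parser.getint(section, key, fallback=default)
        if value != default:
            changed[key] = (default, value)
    return changed


# Illustrative input, not taken from the reporter's deployment.
SAMPLE_INI = """\
[ovs]
of_inactivity_probe = 30
"""

changed = non_default_timeouts(SAMPLE_INI)
print(changed or "all timeouts at their defaults")
```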

  
  Thank you in advance for your help.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1864822/+subscriptions

