yahoo-eng-team team mailing list archive

[Bug 1864822] [NEW] Openvswitch Agent - connexion openvswitch DB Broken

 

Public bug reported:

Hi all,

We have deployed several OpenStack platforms in my company.
We used kolla-ansible to deploy our platforms.
Here is the configuration that we applied:
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_version: "stein"

Neutron architecture:
L3 HA enabled
DVR enabled
SNAT enabled
Multiple VLAN providers: True

Note: Our platforms are multi-region.

Recently, we upgraded a master region from Rocky to Stein using the kolla-ansible upgrade procedure.
Since the upgrade, the Open vSwitch agent sometimes loses its connection to ovsdb.
We found this error in neutron-openvswitch-agent.log: "tcp:127.0.0.1:6640: send error: Broken pipe".
And we found these errors in ovsdb-server.log:
2020-02-24T23:13:22.644Z|00009|reconnect|ERR|tcp:127.0.0.1:50260: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T04:10:55.893Z|00010|reconnect|ERR|tcp:127.0.0.1:58544: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T07:21:12.301Z|00011|reconnect|ERR|tcp:127.0.0.1:34918: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T09:21:45.533Z|00012|reconnect|ERR|tcp:127.0.0.1:37782: no response to inactivity probe after 5 seconds, disconnecting
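
To see how often these disconnects happen, the ovsdb-server.log lines above can be filtered with a small script. This is only a sketch: the regular expression is derived from the four lines quoted in this report, and the function name `find_probe_timeouts` is made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this report (an assumption,
# not an official description of the ovsdb-server log format).
PROBE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\d+\|reconnect\|ERR\|"
    r"(?P<peer>tcp:[\d.]+:\d+): no response to inactivity probe "
    r"after (?P<secs>\d+) seconds, disconnecting$"
)


def find_probe_timeouts(lines):
    """Return (timestamp, peer, probe_seconds) for each probe-timeout line."""
    events = []
    for line in lines:
        m = PROBE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, m.group("peer"), int(m.group("secs"))))
    return events


# The four lines from ovsdb-server.log quoted above.
SAMPLE = [
    "2020-02-24T23:13:22.644Z|00009|reconnect|ERR|tcp:127.0.0.1:50260: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T04:10:55.893Z|00010|reconnect|ERR|tcp:127.0.0.1:58544: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T07:21:12.301Z|00011|reconnect|ERR|tcp:127.0.0.1:34918: no response to inactivity probe after 5 seconds, disconnecting",
    "2020-02-25T09:21:45.533Z|00012|reconnect|ERR|tcp:127.0.0.1:37782: no response to inactivity probe after 5 seconds, disconnecting",
]

events = find_probe_timeouts(SAMPLE)
for ts, peer, secs in events:
    print(ts.isoformat(), peer, secs)
```

On a real node, the lines would be read from the log file instead of `SAMPLE`; the path to that file depends on the deployment.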

When we experience this issue, traffic matching the "NORMAL" flows on br-ex does not get out.
Example of stuck flows:
(neutron-openvswitch-agent)[root@cnp69s12p07 /]# ovs-ofctl dump-flows br-ex | grep NORMAL
 cookie=0x7adbd675f988912b, duration=72705.077s, table=0, n_packets=185, n_bytes=16024, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
 cookie=0x7adbd675f988912b, duration=72695.007s, table=2, n_packets=11835702, n_bytes=5166123797, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=1 actions=mod_vlan_vid:12,NORMAL
 cookie=0x7adbd675f988912b, duration=72694.928s, table=2, n_packets=4133243, n_bytes=349654412, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=9 actions=mod_vlan_vid:18,NORMAL
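
The flow dump above can be turned into structured data to compare counters between two dumps more easily. The following is a rough sketch: `parse_flow` and `normal_flows` are hypothetical helper names, and the parsing is deliberately naive (it splits on commas, which is enough for the flows shown here but not for actions that contain commas inside parentheses).

```python
def parse_flow(line):
    """Parse one `ovs-ofctl dump-flows` line into a dict of string fields."""
    match_part, _, actions = line.strip().partition(" actions=")
    fields = {"actions": actions.split(",")}
    for token in match_part.split(","):
        key, sep, value = token.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def normal_flows(dump_text):
    """Return the parsed flows whose action list contains NORMAL."""
    flows = []
    for line in dump_text.splitlines():
        if "actions=" not in line:
            continue  # skip headers and blank lines
        flow = parse_flow(line)
        if "NORMAL" in flow["actions"]:
            flows.append(flow)
    return flows


# The br-ex flows quoted above.
DUMP = """
 cookie=0x7adbd675f988912b, duration=72705.077s, table=0, n_packets=185, n_bytes=16024, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
 cookie=0x7adbd675f988912b, duration=72695.007s, table=2, n_packets=11835702, n_bytes=5166123797, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=1 actions=mod_vlan_vid:12,NORMAL
 cookie=0x7adbd675f988912b, duration=72694.928s, table=2, n_packets=4133243, n_bytes=349654412, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=9 actions=mod_vlan_vid:18,NORMAL
"""

flows = normal_flows(DUMP)
for flow in flows:
    print(flow["table"], flow.get("dl_vlan", "-"), flow["n_packets"], flow["idle_age"])
```

Running this against two dumps taken a few seconds apart would show whether `n_packets` still increases on each flow while the issue is present.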

Workaround for this issue:
- stop the containers: openvswitch_db openvswitch_vswitchd neutron_openvswitch_agent neutron_l3_agent
- start the containers: openvswitch_db openvswitch_vswitchd
- start the containers: neutron_l3_agent neutron_openvswitch_agent
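
The workaround steps above could be scripted roughly as follows. This sketch assumes the kolla containers are managed with the `docker` CLI (the report does not say which container engine is used); the helper names are made up for illustration, and by default the script only prints the commands without running them.

```python
import subprocess

# Container names as listed in the workaround above.
OVS_CONTAINERS = ["openvswitch_db", "openvswitch_vswitchd"]
NEUTRON_CONTAINERS = ["neutron_l3_agent", "neutron_openvswitch_agent"]


def build_restart_commands():
    """Build the stop/start commands in the order given in the workaround."""
    stop_order = OVS_CONTAINERS + ["neutron_openvswitch_agent", "neutron_l3_agent"]
    commands = [["docker", "stop", name] for name in stop_order]
    # Bring Open vSwitch back first, then the Neutron agents.
    commands += [["docker", "start", name] for name in OVS_CONTAINERS]
    commands += [["docker", "start", name] for name in NEUTRON_CONTAINERS]
    return commands


def restart(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_restart_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


restart(dry_run=True)
```

Calling `restart(dry_run=False)` on the affected node would actually run the commands and stop at the first one that fails.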

Note: we have kept the OVS connection timeout options at their defaults:
- of_connect_timeout: 300
- of_request_timeout: 300
- of_inactivity_probe: 10


Thank you in advance for your help.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1864822

Title:
  Openvswitch Agent - connexion openvswitch DB Broken

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1864822/+subscriptions

