
yahoo-eng-team team mailing list archive

[Bug 2004041] Re: Missing flows with ovs dvr after openvswitch restart

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/872265
Committed: https://opendev.org/openstack/neutron/commit/7573fca58c147eddddbfff6eebc3554fcdd23306
Submitter: "Zuul (22348)"
Branch:    master

commit 7573fca58c147eddddbfff6eebc3554fcdd23306
Author: LIU Yulong <i@xxxxxxxxxxxx>
Date:   Tue Jan 31 16:08:34 2023 +0800

    Notify neutron-server ovs is restarted
    
    If openvswitch is restarted, try to notify neutron-server
    so that it refreshes the tunnel flows for every port.
    
    Closes-Bug: #2004041
    Change-Id: Iba0ae947e3595674e63b998826daae2582bb7668
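
On releases without this commit, the bug report below notes that connectivity returns only after restarting neutron-openvswitch-agent. A watchdog could automate that manual workaround, as in the following minimal sketch. It assumes a kolla-ansible deployment on Docker with the container names `openvswitch_vswitchd` (from the report) and `neutron_openvswitch_agent` (an assumption). It treats a change in the openvswitch container's start time as a restart. The helpers `ovs_started_at` and `check_and_heal` are hypothetical and are not part of neutron.

```sh
#!/bin/sh
# Hypothetical watchdog for releases without the fix: restart the agent
# whenever the openvswitch container has been restarted (the manual
# workaround from the report). Container names assume kolla-ansible
# defaults; override them through the environment if yours differ.
set -u

OVS_CONTAINER=${OVS_CONTAINER:-openvswitch_vswitchd}
AGENT_CONTAINER=${AGENT_CONTAINER:-neutron_openvswitch_agent}

# Print the OVS container's start timestamp, which changes on every restart.
ovs_started_at() {
  docker inspect -f '{{.State.StartedAt}}' "$OVS_CONTAINER"
}

# Given the previously seen start time ($1, empty on the first call), restart
# the agent container if OVS has restarted since, then print the current
# start time so the caller can pass it back in on the next poll.
check_and_heal() {
  local last now
  last=$1
  now=$(ovs_started_at) || return 1
  if [ -n "$last" ] && [ "$now" != "$last" ]; then
    docker restart "$AGENT_CONTAINER" > /dev/null || return 1
  fi
  printf '%s\n' "$now"
}

# Example loop (not run here):
#   last=$(ovs_started_at)
#   while sleep 30; do last=$(check_and_heal "$last") || break; done
```

Polling the container start time keeps the sketch independent of neutron internals. A real deployment should prefer upgrading to a release that contains the fix.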


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2004041

Title:
  Missing flows with ovs dvr after openvswitch restart

Status in neutron:
  Fix Released

Bug description:
  Certain flows are missing in a distributed OpenStack setup after openvswitch is restarted.
  I have tested this on OpenStack Ussuri deployed with kolla-ansible on Ubuntu Bionic, so there is a chance that this has either been fixed since or is caused by specifics of the deployment.

  ## Steps to reproduce

  There might be a simpler reproducer, but this is what I did:

  * Set up a distributed OpenStack deployment with at least one control node and two compute nodes
  * Configure neutron with OVS and DVR
  * Configure Octavia with the amphora driver
  * Set up an external network as the floating IP pool
  * Create an instance running an HTTP server
  * Create a load balancer with an HTTP listener and pool
  * Add the instance as a pool member of the load balancer
  * Attach a floating IP to the load balancer's virtual IP
  * Make sure that the load balancer's amphora and the instance are on different compute nodes
  * Ensure that you can make an HTTP request, e.g.:

    ```
    # curl -I http://${FLOATING_IP}
    HTTP/1.1 200 OK
    Server: nginx/1.18.0 (Ubuntu)
    Date: Fri, 27 Jan 2023 15:00:00 GMT
    Content-Type: text/html
    Content-Length: 612
    Last-Modified: Fri, 27 Jan 2023 13:45:11 GMT
    ETag: "63d3d567-264"
    Accept-Ranges: bytes
    
      0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    ```

  * Restart openvswitch

    ```
    # docker restart openvswitch_vswitchd
    openvswitch_vswitchd
    ```

  * Observe that the connection fails with, e.g.:

    ```
    # curl -I http://${FLOATING_IP}
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
    curl: (7) Failed to connect to ${FLOATING_IP} port 80: No route to host
    ```

  * Connections will re-establish only after restarting neutron-openvswitch-agent
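
  You can script the check for whether connectivity recovers on its own after the restart. The sketch below polls the floating IP with curl. `wait_for_http` is a hypothetical helper, and the attempt count, the 2 s timeout and the 2 s interval are arbitrary assumptions.

  ```sh
  #!/bin/sh
  # Sketch: after restarting openvswitch, poll the load balancer's floating IP
  # to see whether HTTP comes back without restarting neutron-openvswitch-agent.
  # The 2 s request timeout and 2 s interval are arbitrary choices.
  set -u

  # Return 0 as soon as an HTTP HEAD request to URL ($1) gets any response,
  # making up to ATTEMPTS ($2) tries, 2 s apart. Otherwise return curl's last
  # exit code (7 means "Failed to connect", as in the report).
  wait_for_http() {
    local url attempts i rc
    url=$1 attempts=$2 i=0 rc=1
    while [ "$i" -lt "$attempts" ]; do
      curl -sS -I --max-time 2 -o /dev/null "$url" && return 0
      rc=$?
      i=$((i + 1))
      sleep 2
    done
    return "$rc"
  }

  # Example (not run here):
  #   docker restart openvswitch_vswitchd
  #   wait_for_http "http://${FLOATING_IP}" 30 || echo "still unreachable (curl exit $?)"
  ```

  Without `-f`, curl exits 0 for any HTTP status, so the check tests reachability rather than application health. That matches the failure mode shown above.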

  
  ## Flows before and after restart of openvswitch

  Looking at the flows on the tunnel bridge (br-tun) of the controller node, one can see that flows are missing after restarting openvswitch:
  ```
  # docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > before_ovs_restart.log
  # docker restart openvswitch_vswitchd
  openvswitch_vswitchd
  # docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > after_ovs_restart.log
  # awk '{print $3" "$(NF)}' < before_ovs_restart.log > before_ovs_restart_cleaned.log
  # awk '{print $3" "$(NF)}' < after_ovs_restart.log > after_ovs_restart_cleaned.log
  # diff before_ovs_restart_cleaned.log after_ovs_restart_cleaned.log
  3,4d2
  < table=0, actions=resubmit(,4)
  < table=0, actions=resubmit(,4)
  6,7d3
  < table=1, actions=drop
  < table=1, actions=mod_dl_src:fa:16:3f:56:bb:5a,resubmit(,2)
  13d8
  < table=4, actions=mod_vlan_vid:53,resubmit(,9)
  20,22d14
  < table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22
  < table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:23
  < table=20, actions=load:0->NXM_OF_VLAN_TCI[],load:0x2ed->NXM_NX_TUN_ID[],output:22
  24,25d15
  < table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163eb4cf96->NXM_NX_ARP_SHA[],load:0xa000165->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:b4:cf:96,IN_PORT
  < table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e77e67e->NXM_NX_ARP_SHA[],load:0xa0000a3->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:77:e6:7e,IN_PORT
  27,28d16
  < table=22, actions=drop
  < table=22, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22,output:23
  ```
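
  The awk filter above keeps only the table and the actions of each flow. Flows that differ only in their match fields therefore collapse into identical lines, as with the two `table=0, actions=resubmit(,4)` entries. The following sketch keeps priority and match fields and drops only the volatile ones. It assumes the default `ovs-ofctl dump-flows` text format. The helper names `normalize_flows`, `missing_flows` and `annotate_tables` are hypothetical. The table names in `annotate_tables` are an assumption based on the table constants of neutron's OVS agent, so verify them against your release.

  ```sh
  #!/bin/sh
  # Sketch: list br-tun flows present in one `ovs-ofctl dump-flows` snapshot
  # but absent from a later one. Assumes the default dump-flows text format,
  # where volatile fields (cookie, duration, counters, ages) precede " actions=".
  set -u

  # Drop the reply header and volatile fields, keep table/priority/match/actions,
  # and sort bytewise so snapshots can be compared with comm(1).
  normalize_flows() {
    awk '
      /^(NXST|OFPST)_FLOW/ || NF == 0 { next }
      {
        i = index($0, " actions=")
        head = substr($0, 1, i); tail = substr($0, i + 1)
        gsub(/(cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,]*, ?/, "", head)
        sub(/^[ \t]+/, "", head)
        print head tail
      }' | LC_ALL=C sort
  }

  # Print flows found in file $1 (before) but not in file $2 (after).
  missing_flows() {
    local a b rc
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_flows < "$1" > "$a"
    normalize_flows < "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"; rc=$?
    rm -f "$a" "$b"
    return "$rc"
  }

  # Prefix each flow with the role of its br-tun table. The numbers mirror the
  # table constants of neutron's OVS agent (assumption: unchanged in your release).
  annotate_tables() {
    awk '
      BEGIN {
        name[1] = "DVR_PROCESS"; name[2] = "PATCH_LV_TO_TUN"
        name[3] = "GRE_TUN_TO_LV"; name[4] = "VXLAN_TUN_TO_LV"
        name[6] = "GENEVE_TUN_TO_LV"; name[9] = "DVR_NOT_LEARN"
        name[10] = "LEARN_FROM_TUN"; name[20] = "UCAST_TO_TUN"
        name[21] = "ARP_RESPONDER"; name[22] = "FLOOD_TO_TUN"
      }
      match($0, /table=[0-9]+/) {
        t = substr($0, RSTART + 6, RLENGTH - 6) + 0
        if (t in name) { printf "[%s] %s\n", name[t], $0; next }
      }
      { print }'
  }
  ```

  With the logs captured above, `missing_flows before_ovs_restart.log after_ovs_restart.log | annotate_tables` lists each vanished flow with its match fields. In the report's diff, 7 of the 12 lost entries are in tables 20 to 22. Under the assumed mapping, those tables hold the unicast-to-tunnel, ARP responder and flood-to-tunnel flows.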

  Please let me know if you need more information. I also have a Heat
  stack that automates the OpenStack resource part of the reproducer,
  in case this makes things easier.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2004041/+subscriptions


