yahoo-eng-team team mailing list archive

[Bug 2004041] [NEW] Missing flows with ovs dvr after openvswitch restart

 

Public bug reported:

Certain flows are missing in a distributed OpenStack setup after a restart of openvswitch.
I have tested this on OpenStack Ussuri deployed with kolla-ansible on Ubuntu Bionic, so there is a chance that this has either already been fixed or is caused by specifics of the deployment.

## Steps to reproduce

There might be a simpler reproducer, but this is what I did:

* Set up a distributed OpenStack with at least one control node and two compute nodes
* Configure neutron with OVS and DVR
* Configure octavia with the amphora driver
* Set up an external network as floating IP pool
* Create an instance with an HTTP server
* Create a loadbalancer with an HTTP listener/pool
* Add the instance as pool member to the loadbalancer
* Attach a floating IP to the loadbalancer's virtual IP
* Make sure that the loadbalancer amphora and the instance are on different compute nodes
* Ensure that you can make an http request, e.g.:

  ```
  # curl -I http://${FLOATING_IP}
  HTTP/1.1 200 OK
  Server: nginx/1.18.0 (Ubuntu)
  Date: Fri, 27 Jan 2023 15:00:00 GMT
  Content-Type: text/html
  Content-Length: 612
  Last-Modified: Fri, 27 Jan 2023 13:45:11 GMT
  ETag: "63d3d567-264"
  Accept-Ranges: bytes
  
    0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  ```

* Restart openvswitch

  ```
  # docker restart openvswitch_vswitchd
  openvswitch_vswitchd
  ```

* Observe that the connection fails with, e.g.:

  ```
  # curl -I http://${FLOATING_IP}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  curl: (7) Failed to connect to ${FLOATING_IP} port 80: No route to host
  ```

* Connections are re-established only after restarting neutron-openvswitch-agent
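
To pin down when connectivity drops and whether it ever comes back on its own, the curl check from the steps above can be run in a loop while openvswitch is restarted. This is only a minimal sketch: `probe_url` and `watch_probe` are hypothetical helper names (not part of any tool mentioned here), and the 5 second timeout is an arbitrary choice.

```shell
# probe_url URL: succeed if a HEAD request gets any HTTP response within 5 s.
# (hypothetical helper; the timeout value is an assumption)
probe_url() {
  curl --silent --head --max-time 5 --output /dev/null "$1"
}

# watch_probe COUNT INTERVAL CMD...: run CMD COUNT times, sleeping INTERVAL
# seconds in between, and print one timestamped "up" or "down" line per run.
watch_probe() {
  count=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    if "$@"; then state=up; else state=down; fi
    printf '%s %s\n' "$(date +%H:%M:%S)" "$state"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
```

For example, `watch_probe 60 1 probe_url "http://${FLOATING_IP}"` in one terminal, with the `docker restart` issued in another, would log roughly one line per second for a minute.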


## Flows before and after restart of openvswitch

Looking at the flows on the tunnel bridge (br-tun) of the controller node, one can see that flows are missing after restarting openvswitch:
```
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > before_ovs_restart.log
# docker restart openvswitch_vswitchd
openvswitch_vswitchd
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > after_ovs_restart.log
# awk '{print $3" "$(NF)}' < before_ovs_restart.log > before_ovs_restart_cleaned.log
# awk '{print $3" "$(NF)}' < after_ovs_restart.log > after_ovs_restart_cleaned.log
# diff before_ovs_restart_cleaned.log after_ovs_restart_cleaned.log
3,4d2
< table=0, actions=resubmit(,4)
< table=0, actions=resubmit(,4)
6,7d3
< table=1, actions=drop
< table=1, actions=mod_dl_src:fa:16:3f:56:bb:5a,resubmit(,2)
13d8
< table=4, actions=mod_vlan_vid:53,resubmit(,9)
20,22d14
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:23
< table=20, actions=load:0->NXM_OF_VLAN_TCI[],load:0x2ed->NXM_NX_TUN_ID[],output:22
24,25d15
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163eb4cf96->NXM_NX_ARP_SHA[],load:0xa000165->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:b4:cf:96,IN_PORT
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e77e67e->NXM_NX_ARP_SHA[],load:0xa0000a3->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:77:e6:7e,IN_PORT
27,28d16
< table=22, actions=drop
< table=22, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22,output:23
```
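
Note that the awk filter above keeps only the table and actions columns, so flows that differ only in their match fields look identical in the diff (see the two `table=0` lines). A possible alternative is to strip just the fields that change between dumps and keep priority and match. The sketch below assumes the usual `cookie=..., duration=..., table=..., n_packets=..., n_bytes=..., idle_age=...` prefix in the `ovs-ofctl dump-flows` output; `normalize_flows` is a hypothetical helper name, and dropping the cookie is my assumption (the agent may use a different cookie after a restart).

```shell
# normalize_flows: read an `ovs-ofctl dump-flows` dump on stdin and print one
# sorted line per flow with the volatile fields removed.
# Assumption: fields are separated by ", " as in the usual dump format.
normalize_flows() {
  grep 'actions=' |                       # drop the reply header line
    sed -E \
      -e 's/cookie=0x[0-9a-f]+, //' \
      -e 's/duration=[0-9.]+s, //' \
      -e 's/n_packets=[0-9]+, //' \
      -e 's/n_bytes=[0-9]+, //' \
      -e 's/(idle|hard)_age=[0-9]+, //g' \
      -e 's/^ +//' |
    LC_ALL=C sort                         # stable order for diff
}
```

With this, the two dumps could be piped through `normalize_flows` instead of the awk step (for example `normalize_flows < before_ovs_restart.log > before_ovs_restart_cleaned.log`), and the resulting diff would show which matches are gone, not only which actions.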

Please let me know if you need more information. I also have a heat
stack which automates the openstack resource part of the reproducer, in
case this makes things easier.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2004041
