yahoo-eng-team team mailing list archive

[Bug 1881041] [NEW] OVN Router sending ARP instead of sending traffic to the gateway

 

Public bug reported:

Summary:

When a VM has a Floating IP, any attempt to reach a routed network
results in an ARP request being sent for the destination itself instead
of the traffic being forwarded to the gateway.

Description:
I have two VMs:

$ openstack server list -f yaml
- Flavor: ''
  ID: f875fc7c-f743-4234-8ccb-c03f6ae66289
  Image: Fedora_32
  Name: fedora_no_fip
  Networks: infra_external=172.20.10.201
  Status: ACTIVE
- Flavor: ''
  ID: 4dd45015-9ad6-4388-b458-3128cbdd784b
  Image: Fedora_32
  Name: fedora_test
  Networks: infra_internal=192.168.10.102, 172.20.10.107
  Status: ACTIVE


The VM without the FIP can reach external addresses without problems. For example, pinging 1.1.1.1:
[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.201 -nevvv                                                                                                                                                                  
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:52:13.970470   P fa:16:3e:47:ee:dd ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 59289, offset 0, flags [DF], proto ICMP (1), length 84)                                                                                     
    172.20.10.201 > 1.1.1.1: ICMP echo request, id 1, seq 36, length 64
22:52:13.978619   P 00:e0:67:15:cc:2f ethertype 802.1Q (0x8100), length 104: vlan 4, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 38296, offset 0, flags [none], proto ICMP (1), length 84)                                                    
    1.1.1.1 > 172.20.10.201: ICMP echo reply, id 1, seq 36, length 64


But when I try the same from the VM with the Floating IP, I can see an ARP request being sent for 1.1.1.1 itself:
[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.107 -nevvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:55:42.779383   B fa:16:3e:d7:80:3a ethertype 802.1Q (0x8100), length 48: vlan 4, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779510 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
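
To spot this pattern in a longer capture, a short script can filter for ARP requests whose target lies outside the external subnet. This is a minimal sketch, assuming the `tcpdump -nevvv` line format shown above and the subnet CIDR 172.20.0.0/16 from the subnet output further down; the function name `off_subnet_arps` is hypothetical and only used for illustration.

```python
import ipaddress
import re

# Matches the "Request who-has <target> tell <sender>" part of a tcpdump ARP line.
ARP_RE = re.compile(r"Request who-has ([0-9.]+) tell ([0-9.]+)")


def off_subnet_arps(lines, cidr):
    """Return (sender, target) pairs for ARP requests whose target is outside cidr."""
    network = ipaddress.ip_network(cidr)
    hits = []
    for line in lines:
        match = ARP_RE.search(line)
        if match is None:
            continue  # not an ARP request line
        target, sender = match.group(1), match.group(2)
        # An on-link ARP is normal; an ARP for an off-subnet address is the symptom.
        if ipaddress.ip_address(target) not in network:
            hits.append((sender, target))
    return hits


if __name__ == "__main__":
    sample = [
        "22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: "
        "Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28",
    ]
    for sender, target in off_subnet_arps(sample, "172.20.0.0/16"):
        print(f"{sender} sent an ARP request for off-subnet address {target}")
```

Any pair this reports is a request that would normally have been resolved against the gateway address rather than the final destination.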


The router has the gateway network set:
$ openstack router show infra_r1 -f yaml
admin_state_up: true
availability_zone_hints: null
availability_zones: null
created_at: '2020-05-27T11:43:43Z'
description: ''
external_gateway_info:
  enable_snat: true
  external_fixed_ips:
  - ip_address: 172.20.10.118
    subnet_id: bf21b56a-65c4-49fb-b345-b804c0429167
  network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
flavor_id: null
id: 15c1b81d-b833-4d34-b622-4c6a0bd6c0d7
interfaces_info:
- ip_address: 192.168.10.1
  port_id: 65a28088-761c-461c-912c-7d0a3781ab6b
  subnet_id: 27382151-dbcc-4356-a080-47e181414e0b
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
name: infra_r1
project_id: 0e446e02e899455193635c877772fae7
revision_number: 3
routes: []
status: ACTIVE
tags: []
updated_at: '2020-05-27T11:44:05Z'


Steps to reproduce:
1. Deploy OpenStack with OVN DVR (Using TripleO, so the settings by default here: https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/environments/services/neutron-ovn-dvr-ha.yaml)
2. Create an external network that is a VLAN:
$ openstack network show infra_external -f yaml
admin_state_up: true
availability_zone_hints: []
availability_zones: []
created_at: '2020-05-27T11:43:24Z'
description: ''
dns_domain: ''
id: 2561f8db-e1c8-4185-9056-0883686a8a53
ipv4_address_scope: null
ipv6_address_scope: null
is_default: false
is_vlan_transparent: null
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
mtu: 9000
name: infra_external
port_security_enabled: true
project_id: 0e446e02e899455193635c877772fae7
provider:network_type: vlan
provider:physical_network: datacentre
provider:segmentation_id: 4
qos_policy_id: null
revision_number: 2
router:external: true
segments: null
shared: false
status: ACTIVE
subnets:
- bf21b56a-65c4-49fb-b345-b804c0429167
tags: []
updated_at: '2020-05-27T11:43:30Z'

3. Create a subnet on that network with the following details:
$ openstack subnet show infra_external_subnet -f yaml
allocation_pools:
- end: 172.20.10.250
  start: 172.20.10.70
cidr: 172.20.0.0/16
created_at: '2020-05-27T11:43:30Z'
description: ''
dns_nameservers:
- 8.8.8.8
dns_publish_fixed_ip: null
enable_dhcp: true
gateway_ip: 172.20.0.254
host_routes: []
id: bf21b56a-65c4-49fb-b345-b804c0429167
ip_version: 4
ipv6_address_mode: null
ipv6_ra_mode: null
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
name: infra_external_subnet
network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
prefix_length: null
project_id: 0e446e02e899455193635c877772fae7
revision_number: 0
segment_id: null
service_types: []
subnetpool_id: null
tags: []
updated_at: '2020-05-27T11:43:30Z'


4. Create an internal network and a router, with the infra_external network set as the router's gateway (output provided earlier)

5. Create two VMs, one with a FIP and one directly attached to
infra_external

6. Try to ping anything that would need to be routed by the gateway for infra_external_subnet:
gateway_ip: 172.20.0.254

I can ping that gateway fine; the issue only appears when the traffic
needs to be routed by 172.20.0.254.
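
The forwarding decision expected here can be sketched in a few lines of Python. This is only an illustration built from the subnet values above (cidr 172.20.0.0/16, gateway_ip 172.20.0.254); the function name `arp_target` is made up for the example and does not correspond to any OVN or Neutron code.

```python
import ipaddress


def arp_target(dst, cidr, gateway):
    """Return the address that should be resolved via ARP in order to reach dst.

    On-link destinations are resolved directly; anything outside the subnet
    should be handed to the gateway, so the gateway is what gets resolved.
    """
    network = ipaddress.ip_network(cidr)
    if ipaddress.ip_address(dst) in network:
        return dst
    return gateway


if __name__ == "__main__":
    cidr = "172.20.0.0/16"
    gateway = "172.20.0.254"
    for dst in ("172.20.0.254", "172.20.10.201", "1.1.1.1"):
        print(dst, "-> ARP for", arp_target(dst, cidr, gateway))
```

With these values, 1.1.1.1 falls outside 172.20.0.0/16, so the ARP request should be for 172.20.0.254, not for 1.1.1.1 as seen in the capture from the VM with the FIP.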

Versions:
$ cat /etc/redhat-release 
CentOS Linux release 8.1.1911 (Core)

# rpm -qa | grep ovn
ovn-20.03.0-2.el8.x86_64
puppet-ovn-17.0.0-0.20200515234945.1d4c0ad.el8.noarch
ovn-host-20.03.0-2.el8.x86_64

$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-12.2.1-0.20200504123937.29a7fb8.el8.noarch

For the containers, I'm just using current-tripleo, but let me know if there is something else specific that I can get for you:
# podman image list | egrep 'ovn|neutron'
docker.io/tripleomaster/centos-binary-nova-novncproxy      current-tripleo   544acd4346da   9 days ago   1.22 GB
docker.io/tripleomaster/centos-binary-neutron-server       current-tripleo   f19e459a94fd   9 days ago   1.19 GB
docker.io/tripleomaster/centos-binary-ovn-northd           current-tripleo   8291433d7448   9 days ago   852 MB
docker.io/tripleomaster/centos-binary-ovn-northd           pcmklatest        8291433d7448   9 days ago   852 MB
docker.io/tripleomaster/centos-binary-ovn-controller       current-tripleo   e8efc9a55bb2   9 days ago   734 MB


I'll share some ovn-trace outputs in the comments. This is getting a bit lengthy. 


Expected Results:
OVN shouldn't send an ARP request for an address on a routed network; it should forward the traffic to the gateway.

Severity for me is not very high. It's just a home lab, but if there is
a wider issue it could be a problem.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovn

** Attachment added: "Logic Flows"
   https://bugs.launchpad.net/bugs/1881041/+attachment/5377591/+files/logic_flows

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1881041
