Message #82788
[Bug 1881041] [NEW] OVN Router sending ARP instead of sending traffic to the gateway
Public bug reported:
Summary:
When a VM has a Floating IP, any attempt to reach a routed network
results in an ARP request for the destination being sent instead of the
traffic being sent to the gateway.
Description:
I have two VMs:
$ openstack server list -f yaml
- Flavor: ''
  ID: f875fc7c-f743-4234-8ccb-c03f6ae66289
  Image: Fedora_32
  Name: fedora_no_fip
  Networks: infra_external=172.20.10.201
  Status: ACTIVE
- Flavor: ''
  ID: 4dd45015-9ad6-4388-b458-3128cbdd784b
  Image: Fedora_32
  Name: fedora_test
  Networks: infra_internal=192.168.10.102, 172.20.10.107
  Status: ACTIVE
The one without the FIP can reach anything fine. For example, ping 1.1.1.1:
[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.201 -nevvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:52:13.970470 P fa:16:3e:47:ee:dd ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 59289, offset 0, flags [DF], proto ICMP (1), length 84)
172.20.10.201 > 1.1.1.1: ICMP echo request, id 1, seq 36, length 64
22:52:13.978619 P 00:e0:67:15:cc:2f ethertype 802.1Q (0x8100), length 104: vlan 4, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 38296, offset 0, flags [none], proto ICMP (1), length 84)
1.1.1.1 > 172.20.10.201: ICMP echo reply, id 1, seq 36, length 64
But when I try the same from the VM with the Floating IP, I can see an ARP request being sent for 1.1.1.1:
[root@overcloud-novacompute-1 ~]# tcpdump -i any host 172.20.10.107 -nevvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:55:42.779383 B fa:16:3e:d7:80:3a ethertype 802.1Q (0x8100), length 48: vlan 4, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
22:55:42.779510 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 172.20.10.107, length 28
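To make the capture above easier to scan, here is a minimal Python sketch that pulls the target and sender addresses out of a tcpdump ARP request line. The regular expression and the function name parse_arp_request are illustrative assumptions, not part of tcpdump or OVN; the pattern only covers the "Request who-has X tell Y" form seen in this capture.

```python
import re

# Matches the ARP request summary as printed in the capture above, e.g.
# "Request who-has 1.1.1.1 tell 172.20.10.107". Hypothetical helper pattern.
ARP_REQUEST = re.compile(r"Request who-has (\S+) tell ([^\s,]+)")


def parse_arp_request(line):
    """Return (target_ip, sender_ip) for an ARP request line, else None."""
    match = ARP_REQUEST.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# One of the lines from the capture taken on overcloud-novacompute-1.
sample = (
    "22:55:42.779476 Out fa:16:3e:d7:80:3a ethertype ARP (0x0806), length 44: "
    "Ethernet (len 6), IPv4 (len 4), "
    "Request who-has 1.1.1.1 tell 172.20.10.107, length 28"
)

print(parse_arp_request(sample))
```

For the sample line the target is 1.1.1.1 and the sender is the Floating IP 172.20.10.107, which is the behaviour this report is about.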
The router has the gateway network set:
$ openstack router show infra_r1 -f yaml
admin_state_up: true
availability_zone_hints: null
availability_zones: null
created_at: '2020-05-27T11:43:43Z'
description: ''
external_gateway_info:
  enable_snat: true
  external_fixed_ips:
  - ip_address: 172.20.10.118
    subnet_id: bf21b56a-65c4-49fb-b345-b804c0429167
  network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
flavor_id: null
id: 15c1b81d-b833-4d34-b622-4c6a0bd6c0d7
interfaces_info:
- ip_address: 192.168.10.1
  port_id: 65a28088-761c-461c-912c-7d0a3781ab6b
  subnet_id: 27382151-dbcc-4356-a080-47e181414e0b
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
name: infra_r1
project_id: 0e446e02e899455193635c877772fae7
revision_number: 3
routes: []
status: ACTIVE
tags: []
updated_at: '2020-05-27T11:44:05Z'
Reproducer for me has been:
1. Deploy OpenStack with OVN DVR (using TripleO with the default settings from https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/environments/services/neutron-ovn-dvr-ha.yaml)
2. Create an external network that is a VLAN:
$ openstack network show infra_external -f yaml
admin_state_up: true
availability_zone_hints: []
availability_zones: []
created_at: '2020-05-27T11:43:24Z'
description: ''
dns_domain: ''
id: 2561f8db-e1c8-4185-9056-0883686a8a53
ipv4_address_scope: null
ipv6_address_scope: null
is_default: false
is_vlan_transparent: null
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
mtu: 9000
name: infra_external
port_security_enabled: true
project_id: 0e446e02e899455193635c877772fae7
provider:network_type: vlan
provider:physical_network: datacentre
provider:segmentation_id: 4
qos_policy_id: null
revision_number: 2
router:external: true
segments: null
shared: false
status: ACTIVE
subnets:
- bf21b56a-65c4-49fb-b345-b804c0429167
tags: []
updated_at: '2020-05-27T11:43:30Z'
3. Create a subnet on that network with the following details:
$ openstack subnet show infra_external_subnet -f yaml
allocation_pools:
- end: 172.20.10.250
  start: 172.20.10.70
cidr: 172.20.0.0/16
created_at: '2020-05-27T11:43:30Z'
description: ''
dns_nameservers:
- 8.8.8.8
dns_publish_fixed_ip: null
enable_dhcp: true
gateway_ip: 172.20.0.254
host_routes: []
id: bf21b56a-65c4-49fb-b345-b804c0429167
ip_version: 4
ipv6_address_mode: null
ipv6_ra_mode: null
location:
  cloud: ''
  project:
    domain_id: null
    domain_name: Default
    id: 0e446e02e899455193635c877772fae7
    name: admin
  region_name: regionOne
  zone: null
name: infra_external_subnet
network_id: 2561f8db-e1c8-4185-9056-0883686a8a53
prefix_length: null
project_id: 0e446e02e899455193635c877772fae7
revision_number: 0
segment_id: null
service_types: []
subnetpool_id: null
tags: []
updated_at: '2020-05-27T11:43:30Z'
4. Create an internal network and a router, with the infra_external network set as the gateway (output provided earlier)
5. Create two VMs, one with a FIP and one directly attached to
infra_external
6. Try to ping anything that would need to be routed by the gateway for infra_external_subnet:
gateway_ip: 172.20.0.254
I can ping that gateway fine; the issue only appears when the traffic
needs to be routed by 172.20.0.254.
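The expectation behind step 6 can be sketched with ordinary IP forwarding logic: a destination inside the subnet is resolved directly, while anything else should be handed to the subnet gateway. The function next_hop below is a hypothetical illustration using the subnet values from this report; it is not how OVN implements routing.

```python
import ipaddress


def next_hop(destination, subnet_cidr, gateway_ip):
    """Pick the address that should be ARPed for when sending to destination.

    On-link destinations are resolved directly; anything outside the
    subnet goes through the gateway. Illustrative only, not OVN code.
    """
    dest = ipaddress.ip_address(destination)
    subnet = ipaddress.ip_network(subnet_cidr)
    if dest in subnet:
        return dest
    return ipaddress.ip_address(gateway_ip)


# Values taken from infra_external_subnet in this report.
CIDR = "172.20.0.0/16"
GATEWAY = "172.20.0.254"

for target in ("172.20.0.254", "172.20.10.201", "1.1.1.1"):
    print(target, "->", next_hop(target, CIDR, GATEWAY))
```

Under this logic an ARP request for 172.20.0.254 would be expected when pinging 1.1.1.1, rather than an ARP request for 1.1.1.1 itself as seen in the capture from the VM with the Floating IP.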
Versions:
$ cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)
# rpm -qa | grep ovn
ovn-20.03.0-2.el8.x86_64
puppet-ovn-17.0.0-0.20200515234945.1d4c0ad.el8.noarch
ovn-host-20.03.0-2.el8.x86_64
$ rpm -qa | grep tripleo-heat-templates
openstack-tripleo-heat-templates-12.2.1-0.20200504123937.29a7fb8.el8.noarch
For the containers, I'm just using current-tripleo, but let me know if there is something else specific that I can get for you:
# podman image list | egrep 'ovn|neutron'
docker.io/tripleomaster/centos-binary-nova-novncproxy current-tripleo 544acd4346da 9 days ago 1.22 GB
docker.io/tripleomaster/centos-binary-neutron-server current-tripleo f19e459a94fd 9 days ago 1.19 GB
docker.io/tripleomaster/centos-binary-ovn-northd current-tripleo 8291433d7448 9 days ago 852 MB
docker.io/tripleomaster/centos-binary-ovn-northd pcmklatest 8291433d7448 9 days ago 852 MB
docker.io/tripleomaster/centos-binary-ovn-controller current-tripleo e8efc9a55bb2 9 days ago 734 MB
I'll share some ovn-trace outputs in the comments. This is getting a bit lengthy.
Expected Results:
OVN shouldn't send an ARP request for a destination on a routed network; the traffic should be sent to the gateway instead.
Severity for me is not very high. It's just a home lab, but if there is
a wider issue it could be a problem.
** Affects: neutron
Importance: Undecided
Status: New
** Tags: ovn
** Attachment added: "Logic Flows"
https://bugs.launchpad.net/bugs/1881041/+attachment/5377591/+files/logic_flows
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1881041
Title:
OVN Router sending ARP instead of sending traffic to the gateway
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1881041/+subscriptions