yahoo-eng-team team mailing list archive
Message #43438
[Bug 1526818] [NEW] Incorrect and excess ARP responses in tenant subnets
Public bug reported:
We are seeing very strange ARP behaviour in tenant networks, which
causes Windows guests to incorrectly decline DHCP addresses. These VMs
apparently send an ARP request for the address they have been offered
and discard the offer if a different MAC already claims to own that
IP.
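The check the guests appear to perform can be sketched roughly as follows. This is only an illustration of the behaviour described above, not the actual Windows DHCP client logic; the function name `should_decline` and the client MAC in the usage lines are hypothetical.

```python
def should_decline(offered_ip, own_mac, arp_replies):
    """Return True if some other MAC already answers ARP for offered_ip.

    arp_replies is an iterable of (ip, mac) pairs observed while probing
    the offered address. Hypothetical helper, for illustration only.
    """
    own = own_mac.lower()
    return any(
        ip == offered_ip and mac.lower() != own
        for ip, mac in arp_replies
    )


# Hypothetical client MAC; the reply MAC is taken from the report.
replies = [("192.168.1.14", "0e:bf:04:b7:ed:52")]
print(should_decline("192.168.1.14", "fa:16:3e:00:00:01", replies))  # True
print(should_decline("192.168.1.14", "fa:16:3e:00:00:01", []))       # False
```

With a bogus reply like the ones shown later in this report, such a client would decline an address that is in fact free.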
We are using openvswitch-agent with ml2 plugin.
We investigated this issue using Linux guests. Consider the
following example. A VM with the fixed IP 192.168.1.15 reports the
following ARP cache:
root@michael-test2:~# arp
Address                  HWtype  HWaddress           Flags Mask  Iface
host-192-168-1-2.openst  ether   fa:16:3e:de:ab:ea   C           eth0
192.168.1.13             ether   a6:b2:dc:d8:39:c1   C           eth0
192.168.1.119                    (incomplete)                    eth0
host-192-168-1-20.opens  ether   fa:16:3e:76:43:ce   C           eth0
host-192-168-1-19.opens  ether   fa:16:3e:0d:a6:0b   C           eth0
host-192-168-1-1.openst  ether   fa:16:3e:2a:81:ff   C           eth0
192.168.1.14             ether   0e:bf:04:b7:ed:52   C           eth0
Neither 192.168.1.13 nor 192.168.1.14 exists in this subnet, and their MAC addresses, a6:b2:dc:d8:39:c1 and 0e:bf:04:b7:ed:52, actually belong to the qbr* and qvb* devices of other instances, living on their respective hypervisor hosts!
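One quick way to spot such entries is to flag every ARP cache entry whose MAC does not start with fa:16:3e. The sketch below assumes Neutron's base_mac is left at its default of fa:16:3e:00:00:00, which matches the port MACs in this report; the names `ARP_CACHE` and `suspicious_entries` are made up for the example.

```python
# Assumption: Neutron's base_mac is at its default, so every real port
# MAC on the subnet starts with this prefix.
NEUTRON_DEFAULT_PREFIX = "fa:16:3e"

# Complete entries from the guest's ARP cache in this report
# (the incomplete entry for 192.168.1.119 has no MAC and is left out).
ARP_CACHE = [
    ("host-192-168-1-2.openst", "fa:16:3e:de:ab:ea"),
    ("192.168.1.13", "a6:b2:dc:d8:39:c1"),
    ("host-192-168-1-20.opens", "fa:16:3e:76:43:ce"),
    ("host-192-168-1-19.opens", "fa:16:3e:0d:a6:0b"),
    ("host-192-168-1-1.openst", "fa:16:3e:2a:81:ff"),
    ("192.168.1.14", "0e:bf:04:b7:ed:52"),
]


def suspicious_entries(entries, prefix=NEUTRON_DEFAULT_PREFIX):
    """Return the (address, mac) pairs whose MAC lacks the expected prefix."""
    return [(addr, mac) for addr, mac in entries
            if not mac.lower().startswith(prefix)]


for addr, mac in suspicious_entries(ARP_CACHE):
    print(addr, mac)
# 192.168.1.13 a6:b2:dc:d8:39:c1
# 192.168.1.14 0e:bf:04:b7:ed:52
```

A deployment with a customised base_mac would need a different prefix, so this is a heuristic rather than a definitive test.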
Looking at 0e:bf:04:b7:ed:52, for example, yields
# ip link list | grep -C1 -e 0e:bf:04:b7:ed:52
59: qbr9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
60: qvo9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
--
61: qvb9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UP mode DEFAULT group default qlen 1000
    link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
62: tap9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UNKNOWN mode DEFAULT group default qlen 500
on the compute node. Using tcpdump on qbr9ac24ac1-e1 on the host and
triggering a fresh ARP lookup from the guest results in
# tcpdump -i qbr9ac24ac1-e1 -vv -l | grep ARP
tcpdump: WARNING: qbr9ac24ac1-e1: no IPv4 address assigned
tcpdump: listening on qbr9ac24ac1-e1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
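To make the conflict easier to see in longer captures, the replies can be grouped by IP address. The following is a rough sketch that parses tcpdump text output like the lines above; the regular expression and the names `macs_per_ip` and `CAPTURE` are assumptions for this example and only cover the "Reply ... is-at ..." format shown here.

```python
import re
from collections import defaultdict

# Matches the "Reply <ip> is-at <mac>" part of tcpdump's ARP output.
REPLY_RE = re.compile(
    r"Reply (\d+\.\d+\.\d+\.\d+) is-at ([0-9a-f]{2}(?::[0-9a-f]{2}){5})"
)

CAPTURE = """\
14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
"""


def macs_per_ip(lines):
    """Map each IP to the distinct MACs that replied for it, in order seen."""
    claims = defaultdict(list)
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            ip, mac = match.groups()
            if mac not in claims[ip]:
                claims[ip].append(mac)
    return dict(claims)


# Keep only IPs claimed by more than one MAC.
conflicts = {ip: macs
             for ip, macs in macs_per_ip(CAPTURE.splitlines()).items()
             if len(macs) > 1}
for ip, macs in conflicts.items():
    print(ip, len(macs), macs)
```

For this capture the script reports 192.168.1.14 with four distinct MACs.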
Four different devices claim to own the nonexistent IP address!
Looking them up in neutron shows that each is related to a different
existing port on the subnet:
# neutron port-list | grep -e 47fbb8b5-55 -e 46647cca-32 -e e9e2d7c3-7e -e 9ac24ac1-e1
| 46647cca-3293-42ea-8ec2-0834e19422fa | | fa:16:3e:7d:9c:45 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.8"} |
| 47fbb8b5-5549-46e4-850e-bd382375e0f8 | | fa:16:3e:fa:df:32 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.7"} |
| 9ac24ac1-e157-484e-b6a2-a1dded4731ac | | fa:16:3e:2a:80:6b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.15"} |
| e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56 | | fa:16:3e:0d:a6:0b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.19"} |
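The lookup from a host device to its neutron port relies on the device name carrying the first 11 characters of the port UUID after a three-letter prefix, as qbr9ac24ac1-e1 and port 9ac24ac1-e157-... show. A small sketch of that mapping follows; the helper names `port_id_prefix` and `find_ports` are made up, and the `PORTS` list simply repeats the listing above.

```python
# Device name prefixes seen on the compute node in this report.
DEVICE_PREFIXES = ("qbr", "qvb", "qvo", "tap")

# Ports copied from the neutron port-list output above.
PORTS = [
    {"id": "46647cca-3293-42ea-8ec2-0834e19422fa",
     "mac": "fa:16:3e:7d:9c:45", "ip": "192.168.1.8"},
    {"id": "47fbb8b5-5549-46e4-850e-bd382375e0f8",
     "mac": "fa:16:3e:fa:df:32", "ip": "192.168.1.7"},
    {"id": "9ac24ac1-e157-484e-b6a2-a1dded4731ac",
     "mac": "fa:16:3e:2a:80:6b", "ip": "192.168.1.15"},
    {"id": "e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56",
     "mac": "fa:16:3e:0d:a6:0b", "ip": "192.168.1.19"},
]


def port_id_prefix(ifname):
    """Strip the device prefix, leaving the leading part of the port UUID."""
    for prefix in DEVICE_PREFIXES:
        if ifname.startswith(prefix):
            return ifname[len(prefix):]
    raise ValueError("unexpected device name: %s" % ifname)


def find_ports(ifname, ports):
    """Return the ports whose UUID starts with the device's ID fragment."""
    fragment = port_id_prefix(ifname)
    return [p for p in ports if p["id"].startswith(fragment)]


for port in find_ports("qbr9ac24ac1-e1", PORTS):
    print(port["id"], port["ip"])
# 9ac24ac1-e157-484e-b6a2-a1dded4731ac 192.168.1.15
```

So the bridge answering first for 192.168.1.14 belongs to the very port of the VM that sent the request, 192.168.1.15.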
Environment:
Host: Ubuntu server 14.04
Kernel: linux-image-generic-lts-vivid, 3.19.0-39-generic #44~14.04.1-Ubuntu SMP Wed Dec 2 10:00:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
OpenStack Kilo:
# dpkg -l | grep -e nova -e neutron
ii neutron-common 1:2015.1.2-0ubuntu2~cloud0 all Neutron is a virtual network service for Openstack - common
ii neutron-plugin-ml2 1:2015.1.2-0ubuntu2~cloud0 all Neutron is a virtual network service for Openstack - ML2 plugin
ii neutron-plugin-openvswitch-agent 1:2015.1.2-0ubuntu2~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii nova-common 1:2015.1.2-0ubuntu2~cloud0 all OpenStack Compute - common files
ii nova-compute 1:2015.1.2-0ubuntu2~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2015.1.2-0ubuntu2~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2015.1.2-0ubuntu2~cloud0 all OpenStack Compute - compute node libvirt support
ii python-neutron 1:2015.1.2-0ubuntu2~cloud0 all Neutron is a virtual network service for Openstack - Python library
ii python-neutron-fwaas 2015.1.2-0ubuntu2~cloud0 all Firewall-as-a-Service driver for OpenStack Neutron
ii python-neutronclient 1:2.3.11-0ubuntu1.2~cloud0 all client - Neutron is a virtual network service for Openstack
ii python-nova 1:2015.1.2-0ubuntu2~cloud0 all OpenStack Compute Python libraries
ii python-novaclient 1:2.22.0-0ubuntu2~cloud0 all client library for OpenStack Compute API
** Affects: nova
Importance: Undecided
Status: New
** Tags: arp nova ovs subnet tenant
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1526818
Title:
Incorrect and excess ARP responses in tenant subnets
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1526818/+subscriptions