yahoo-eng-team team mailing list archive

[Bug 1526818] [NEW] Incorrect and excess ARP responses in tenant subnets

 

Public bug reported:

We are seeing very strange ARP behaviour in tenant networks, which
causes Windows guests to incorrectly decline DHCP addresses. These VMs
apparently send an ARP request for the address they have been offered
and discard the offer if a different MAC already claims to own that IP.
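
The check the guests appear to perform can be sketched roughly as
follows. This is only an illustration of the decision described above,
not the actual Windows DHCP client logic; the function name
should_decline and the (ip, mac) tuple format are assumptions made for
this example.

```python
def should_decline(offered_ip, own_mac, arp_replies):
    """Return True if a foreign MAC claims the offered address.

    arp_replies is an iterable of (ip, mac) tuples collected after
    probing for offered_ip (hypothetical input format).
    """
    own = own_mac.lower()
    return any(
        ip == offered_ip and mac.lower() != own
        for ip, mac in arp_replies
    )


if __name__ == "__main__":
    # Illustrative values only: one foreign MAC answers for the offer.
    replies = [("192.168.1.14", "0e:bf:04:b7:ed:52")]
    print(should_decline("192.168.1.14", "fa:16:3e:00:00:01", replies))
```

With a single bogus reply the offer is declined, which matches the
symptom reported for the Windows guests.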

We are using openvswitch-agent with ml2 plugin.

We investigated this issue using Linux guests. Please look at the
following example. A VM with the fixed IP 192.168.1.15 reports the
following ARP cache:

   root@michael-test2:~# arp
   Address                  HWtype  HWaddress           Flags Mask            Iface
   host-192-168-1-2.openst  ether   fa:16:3e:de:ab:ea   C                     eth0
   192.168.1.13             ether   a6:b2:dc:d8:39:c1   C                     eth0
   192.168.1.119                    (incomplete)                              eth0
   host-192-168-1-20.opens  ether   fa:16:3e:76:43:ce   C                     eth0
   host-192-168-1-19.opens  ether   fa:16:3e:0d:a6:0b   C                     eth0
   host-192-168-1-1.openst  ether   fa:16:3e:2a:81:ff   C                     eth0
   192.168.1.14             ether   0e:bf:04:b7:ed:52   C                     eth0
   
Neither 192.168.1.13 nor 192.168.1.14 exists in this subnet, and their
MAC addresses a6:b2:dc:d8:39:c1 and 0e:bf:04:b7:ed:52 actually belong to
the qbr* and qvb* devices of other instances, living on their respective
hypervisor hosts!
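
One quick way to spot such entries is to flag every ARP cache line
whose MAC does not start with fa:16:3e, the prefix carried by all
Neutron ports in this report. The sketch below assumes that prefix is
in effect (it is Neutron's default base_mac, but deployments can change
it) and that the input has the layout of the arp output above; the
function name suspicious_arp_entries is made up for this example.

```python
NEUTRON_MAC_PREFIX = "fa:16:3e"  # assumption: default Neutron base_mac


def suspicious_arp_entries(arp_output, prefix=NEUTRON_MAC_PREFIX):
    """Return (address, mac) pairs whose MAC lacks the expected prefix."""
    suspects = []
    for line in arp_output.splitlines():
        fields = line.split()
        # Complete entries look like: Address HWtype HWaddress Flags Iface.
        # The header and "(incomplete)" rows have no "ether" in column 2.
        if len(fields) >= 3 and fields[1] == "ether":
            address, mac = fields[0], fields[2].lower()
            if not mac.startswith(prefix):
                suspects.append((address, mac))
    return suspects


if __name__ == "__main__":
    sample = """\
Address                  HWtype  HWaddress           Flags Mask            Iface
host-192-168-1-2.openst  ether   fa:16:3e:de:ab:ea   C                     eth0
192.168.1.13             ether   a6:b2:dc:d8:39:c1   C                     eth0
192.168.1.119                    (incomplete)                              eth0
192.168.1.14             ether   0e:bf:04:b7:ed:52   C                     eth0
"""
    for address, mac in suspicious_arp_entries(sample):
        print(address, mac)
```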

Looking at 0e:bf:04:b7:ed:52, for example, yields

   # ip link list | grep -C1 -e 0e:bf:04:b7:ed:52
   59: qbr9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   60: qvo9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
   --
   61: qvb9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UP mode DEFAULT group default qlen 1000
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   62: tap9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UNKNOWN mode DEFAULT group default qlen 500

on the compute node. Running tcpdump on qbr9ac24ac1-e1 on the host and
triggering a fresh ARP lookup from the guest results in

   # tcpdump -i qbr9ac24ac1-e1 -vv -l | grep ARP
   tcpdump: WARNING: qbr9ac24ac1-e1: no IPv4 address assigned
   tcpdump: listening on qbr9ac24ac1-e1, link-type EN10MB (Ethernet), capture size 65535 bytes
   14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
   14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
   14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
   14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
   14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
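
For longer captures, the replies can be grouped per IP address to find
addresses claimed by more than one MAC. The following is a minimal
sketch that assumes tcpdump's textual "Reply <ip> is-at <mac>" format
as shown above; the name conflicting_claims is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "Reply <ip> is-at <mac>" part of a tcpdump ARP line.
REPLY_RE = re.compile(
    r"Reply (\d+\.\d+\.\d+\.\d+) is-at ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def conflicting_claims(lines):
    """Map each IP to its sorted claimant MACs, if there is more than one."""
    claims = defaultdict(set)
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            claims[match.group(1)].add(match.group(2).lower())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    capture = [
        "14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28",
        "14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28",
        "14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28",
    ]
    for ip, macs in conflicting_claims(capture).items():
        print(ip, macs)
```

The same function could be fed live lines from a pipe such as
"tcpdump -l ... | python script.py" by iterating over sys.stdin
instead of a list.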

Four different devices claim to own the nonexistent IP address!
Looking them up in neutron shows that each one belongs to a different
existing port on the subnet:

   # neutron port-list | grep -e 47fbb8b5-55 -e 46647cca-32 -e e9e2d7c3-7e -e 9ac24ac1-e1
   | 46647cca-3293-42ea-8ec2-0834e19422fa |                                           | fa:16:3e:7d:9c:45 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.8"}   |
   | 47fbb8b5-5549-46e4-850e-bd382375e0f8 |                                           | fa:16:3e:fa:df:32 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.7"}   |
   | 9ac24ac1-e157-484e-b6a2-a1dded4731ac |                                           | fa:16:3e:2a:80:6b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.15"}  |
   | e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56 |                                           | fa:16:3e:0d:a6:0b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.19"}  |
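
The lookup above works because, in the listings shown here, the host
device names embed the first 11 characters of the port ID
(qbr9ac24ac1-e1 corresponds to port 9ac24ac1-e157-...). A small sketch
of that mapping, assuming this naming holds for the qbr, qvb, qvo and
tap devices; the helper name port_for_device is made up for
illustration.

```python
DEVICE_PREFIXES = ("qbr", "qvb", "qvo", "tap")


def port_for_device(device, port_ids):
    """Return the unique port ID matching a host device name, else None."""
    if not device.startswith(DEVICE_PREFIXES):
        return None
    fragment = device[3:]  # e.g. "9ac24ac1-e1" from "qbr9ac24ac1-e1"
    matches = [port_id for port_id in port_ids if port_id.startswith(fragment)]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    ports = [
        "46647cca-3293-42ea-8ec2-0834e19422fa",
        "47fbb8b5-5549-46e4-850e-bd382375e0f8",
        "9ac24ac1-e157-484e-b6a2-a1dded4731ac",
        "e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56",
    ]
    print(port_for_device("qbr9ac24ac1-e1", ports))
```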

Environment:

Host: Ubuntu server 14.04
Kernel: linux-image-generic-lts-vivid, 3.19.0-39-generic #44~14.04.1-Ubuntu SMP Wed Dec 2 10:00:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
OpenStack Kilo:
# dpkg -l | grep -e nova -e neutron
ii  neutron-common                      1:2015.1.2-0ubuntu2~cloud0            all          Neutron is a virtual network service for Openstack - common
ii  neutron-plugin-ml2                  1:2015.1.2-0ubuntu2~cloud0            all          Neutron is a virtual network service for Openstack - ML2 plugin
ii  neutron-plugin-openvswitch-agent    1:2015.1.2-0ubuntu2~cloud0            all          Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii  nova-common                         1:2015.1.2-0ubuntu2~cloud0            all          OpenStack Compute - common files
ii  nova-compute                        1:2015.1.2-0ubuntu2~cloud0            all          OpenStack Compute - compute node base
ii  nova-compute-kvm                    1:2015.1.2-0ubuntu2~cloud0            all          OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt                1:2015.1.2-0ubuntu2~cloud0            all          OpenStack Compute - compute node libvirt support
ii  python-neutron                      1:2015.1.2-0ubuntu2~cloud0            all          Neutron is a virtual network service for Openstack - Python library
ii  python-neutron-fwaas                2015.1.2-0ubuntu2~cloud0              all          Firewall-as-a-Service driver for OpenStack Neutron
ii  python-neutronclient                1:2.3.11-0ubuntu1.2~cloud0            all          client - Neutron is a virtual network service for Openstack
ii  python-nova                         1:2015.1.2-0ubuntu2~cloud0            all          OpenStack Compute Python libraries
ii  python-novaclient                   1:2.22.0-0ubuntu2~cloud0              all          client library for OpenStack Compute API

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: arp nova ovs subnet tenant

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1526818

Title:
  Incorrect and excess ARP responses in tenant subnets

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1526818/+subscriptions