yahoo-eng-team team mailing list archive

[Bug 1452886] [NEW] Port stuck in BUILD state results in limited instance connectivity

 

Public bug reported:

I am seeing intermittent cases where newly launched instances have
limited connectivity. The environment has about 650 instances and 45
networks.

Network Info:
- ML2/LinuxBridge/l2pop
- VXLAN networks

Symptoms:
- On the local compute node, the instance tap is in the bridge. Everything looks good.
- Instance is reachable from some, but not all, instances/devices in the same subnet across all compute and network nodes
- On some compute nodes and network nodes, the ARP and FDB entries for the instance do not exist. Instances/devices on these nodes cannot communicate with the new instance.
- No errors are logged
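
One way to confirm the missing FDB entries on a given node is to look for the instance's MAC address in the output of `bridge fdb show` for the VXLAN interface. The following is a minimal sketch, not a definitive tool: the function name, the sample text, the interface and bridge names, and the addresses are all hypothetical, and the sample only approximates the usual `bridge fdb show` layout.

```python
def fdb_destinations(fdb_output, mac):
    """Return the remote VTEP addresses listed for `mac` in FDB output.

    `fdb_output` is assumed to be text captured from `bridge fdb show`.
    An empty list means no forwarding entry with a `dst` was found.
    """
    mac = mac.lower()
    dests = []
    for line in fdb_output.splitlines():
        fields = line.split()
        # The MAC address is the first field of each entry.
        if not fields or fields[0].lower() != mac:
            continue
        # VXLAN entries carry the remote endpoint after the `dst` keyword.
        if "dst" in fields:
            idx = fields.index("dst")
            if idx + 1 < len(fields):
                dests.append(fields[idx + 1])
    return dests


# Hypothetical sample; real output will differ per environment.
SAMPLE = """\
00:00:00:00:00:00 dev vxlan-1001 dst 192.0.2.12 self permanent
fa:16:3e:11:22:33 dev vxlan-1001 dst 192.0.2.11 self permanent
fa:16:3e:44:55:66 dev vxlan-1001 master brq-example permanent
"""

if __name__ == "__main__":
    print(fdb_destinations(SAMPLE, "fa:16:3e:11:22:33"))
```

Running the same check on a node that can reach the instance and on one that cannot should show whether the entry was ever populated there. The ARP side can be checked in a similar way against `ip neigh show`.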

Here are some observations for the non-working instances:
- The corresponding Neutron port is stuck in a BUILD state
- The binding:host_id value of the port (e.g. compute-xxx) does not match the OS-EXT-SRV-ATTR:host value of the instance (e.g. compute-zzz). For working instances, these values match.
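
The two observations above can be turned into a quick audit. The sketch below assumes the port and server records have already been fetched (for example as JSON from the Neutron and Nova APIs) and are plain dictionaries; the function name and the sample records are hypothetical. It also assumes a compute port's `device_id` holds the instance ID.

```python
def find_suspect_ports(ports, servers):
    """Return ports that are in BUILD or bound to a different host
    than the one their instance is running on."""
    host_by_server = {
        s["id"]: s.get("OS-EXT-SRV-ATTR:host") for s in servers
    }
    suspects = []
    for port in ports:
        server_host = host_by_server.get(port.get("device_id"))
        if server_host is None:
            # Not attached to a known instance (e.g. DHCP or router port).
            continue
        stuck = port.get("status") == "BUILD"
        mismatch = port.get("binding:host_id") != server_host
        if stuck or mismatch:
            suspects.append({
                "port_id": port["id"],
                "status": port.get("status"),
                "port_host": port.get("binding:host_id"),
                "instance_host": server_host,
            })
    return suspects


# Hypothetical records for illustration only.
SERVERS = [
    {"id": "vm-1", "OS-EXT-SRV-ATTR:host": "compute-zzz"},
    {"id": "vm-2", "OS-EXT-SRV-ATTR:host": "compute-aaa"},
]
PORTS = [
    {"id": "port-1", "device_id": "vm-1", "status": "BUILD",
     "binding:host_id": "compute-xxx"},
    {"id": "port-2", "device_id": "vm-2", "status": "ACTIVE",
     "binding:host_id": "compute-aaa"},
]

if __name__ == "__main__":
    for suspect in find_suspect_ports(PORTS, SERVERS):
        print(suspect)
```

Listing every affected port this way may help show whether the mismatched hosts follow a pattern, such as a particular compute node or scheduling retry.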

I cannot reproduce this consistently at this time, and I am not sure
where to begin isolating the cause. Any help is appreciated.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1452886
