yahoo-eng-team team mailing list archive

[Bug 1921884] [NEW] OVS gets stuck: "Port binding failed" trying to (re-)start VMs

 

Public bug reported:

I am running a Bionic/Ussuri OpenStack cloud with Neutron ML2/OVS virtual
networking and hardware offload enabled at the NIC level [1]. The cloud
has 6 nodes in 2 host aggregates of 3 nodes each. It has 2 VLAN
physnets, each spanning the corresponding host aggregate, and 1 flat
physnet spanning all 6 nodes.

I'm deploying charmed Kubernetes to one of the host aggregates using a
VLAN provider network.

Although this worked initially, after some time it is no longer possible
to schedule a VM to this host aggregate: the attempt fails with a "Port
binding failed" error due to a timeout. The existing VMs typically
continue to work and remain reachable over the network as expected. The
other symptom is that OVS commands like 'ovs-appctl' hang. The
workaround is to restart OVS, but this helps only temporarily.
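
For anyone trying to reproduce this, the hang can be detected without
blocking a shell by running the command under a timeout. The following
is only a rough sketch: the helper name command_hangs and the 5-second
limit are my own assumptions, not part of any OVS or Neutron tooling.

```python
import subprocess


def command_hangs(cmd, timeout_s):
    """Return True if cmd does not finish within timeout_s seconds.

    The child's output is discarded; only whether it completed matters.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return True
    return False


# Hypothetical usage on an affected compute node (requires ovs-appctl
# to be installed; the 5-second limit is an arbitrary choice):
#   command_hangs(["ovs-appctl", "version"], 5)
```

A True result only says that the command did not return in time; it does
not by itself show where OVS is stuck.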

I tried with port security both turned on and off at the network level.

[1] https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-hardware-offload.html

** Affects: neutron (Ubuntu)
     Importance: Undecided
         Status: New

** Also affects: neutron
   Importance: Undecided
       Status: New

** No longer affects: neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1921884

Title:
  OVS gets stuck: "Port binding failed" trying to (re-)start VMs

Status in neutron package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1921884/+subscriptions