
yahoo-eng-team team mailing list archive

[Bug 2065577] Re: [ml2/ovs]Empty binding_levels=[] cause ovs-agent skipped to process port


Hi Bence,

Creating VMs with pre-created ports reproduces this:
create 10 (or more) ports, then create 5 (or more) VMs with --nic port-id, attaching 2 ports (NICs) to each VM.

This reproduces the issue frequently in our environment.

I have not tested this on master.

I have no fix locally, but after some code research, we found that [1] may be related to this.
It changed the ml2 ovo push RPC from the eventlet pool to Python native threads. Then [2] switched the ovo push to a Python Queue, but native threads are still in use.
In some cases, the Python thread scheduler may not run as expected, so the DB save happens a bit later than the ovo push RPC. Sometimes the ovo push RPC does not run at all.

[1] https://review.opendev.org/c/openstack/neutron/+/555608/35/neutron/plugins/ml2/ovo_rpc.py
[2] https://review.opendev.org/c/openstack/neutron/+/788510/12/neutron/plugins/ml2/ovo_rpc.py
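To make the suspected ordering concrete, here is a minimal sketch of a worker thread that snapshots a port before the binding is saved. It is a toy model, not neutron code: `push_worker`, `events`, `pushed` and the dict layout are hypothetical names, and the `threading.Event` only forces the unlucky ordering so the example is reproducible. In a real deployment the ordering would depend on the thread scheduler.

```python
import queue
import threading

# Hypothetical in-memory stand-in for the port row; not neutron's data model.
port = {"id": "port-1", "revision_number": 3, "binding_levels": []}

pushed = []                      # snapshots "sent" to the agent
events = queue.Queue()           # work queue feeding the native thread
push_done = threading.Event()    # used only to force the ordering in this demo


def push_worker():
    """Take one event and snapshot the port in whatever state it has now."""
    events.get()
    pushed.append({
        "revision_number": port["revision_number"],
        "binding_levels": list(port["binding_levels"]),
    })
    push_done.set()


worker = threading.Thread(target=push_worker)
worker.start()

events.put("port-1")             # the push is queued first
push_done.wait(timeout=5)        # assumed unlucky case: the push runs first
port["binding_levels"].append("level-0")  # the binding is saved afterwards
worker.join()

print(pushed[0])                 # snapshot taken before the binding existed
print(port)                      # state after the binding was saved
```

In this forced ordering the pushed snapshot carries an empty binding_levels list at revision 3, while the stored port ends up bound at the same revision.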

Still analyzing...

I am reporting this bug to see whether others hit the same issue in their deployments.

** Changed in: neutron
       Status: Incomplete => Opinion

** Changed in: neutron
   Importance: Undecided => Medium

You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.

  [ml2/ovs]Empty binding_levels=[] cause ovs-agent skipped to process

Status in neutron:

Bug description:
  In our production environment we noticed some VM boot failures like this:
  1. Create a port for nova (port revision_number 0->1)
  2. Nova boots a VM with --nic port-id
  3. Nova schedules this VM to a host and plugs the port
  4. Nova updates the port device_owner (port revision_number 1->2)
  5. Nova updates the port host (port revision_number 2->3)
     (Yes, nova calls update_port twice!)
  6. Before the real _bind_port_if_needed call, neutron-server pushes the port info cache with binding_levels=[] and revision_number=3
  7. Neutron-server tries to bind this port
  8. The neutron-ovs-agent rpc_loop tries to get the port details
  9. The neutron-ovs-agent info cache RPC gets empty binding_levels=[] and skips processing the port
  10. The neutron-server port binding is done and it sends the info cache again;
     the port revision_number is still 3, while binding_levels=[<entry>] is no longer empty.
  11. neutron-ovs-agent gets the new info cache notification, but the revision_number has not changed, so the cache is not updated.

  The port will not be processed anymore.
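
  The effect of steps 9 to 11 can be sketched with a toy cache that only accepts an update when the revision_number increases. This is an illustration under stated assumptions, not the agent's actual cache code: `PortCache` and `record_update` are hypothetical names, and the only behavior taken from the description above is that an update with an unchanged revision_number is ignored.

```python
class PortCache:
    """Hypothetical revision-gated cache; not the real agent implementation."""

    def __init__(self):
        self._ports = {}

    def record_update(self, port):
        """Store the port unless a copy with the same or newer revision exists."""
        cached = self._ports.get(port["id"])
        if cached is not None and port["revision_number"] <= cached["revision_number"]:
            return False  # treated as stale, so the new content is dropped
        self._ports[port["id"]] = port
        return True

    def get(self, port_id):
        return self._ports.get(port_id)


cache = PortCache()

# Step 6: the push made before binding carries empty binding_levels at revision 3.
first = cache.record_update(
    {"id": "port-1", "revision_number": 3, "binding_levels": []})

# Step 10: the push made after binding has content but the same revision.
second = cache.record_update(
    {"id": "port-1", "revision_number": 3, "binding_levels": ["level-0"]})

print(first, second)                        # True False
print(cache.get("port-1")["binding_levels"])  # []
```

  With this gating the second update is rejected, so the cached port keeps the empty binding_levels it received first.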
