
yahoo-eng-team team mailing list archive

[Bug 1988199] Re: [OVN][live-migration] Nova port binding request and "LogicalSwitchPortUpdateUpEvent" race condition

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/855257
Committed: https://opendev.org/openstack/neutron/commit/91f0864dc0ccf0f67be7162f011706dbc6383cb3
Submitter: "Zuul (22348)"
Branch:    master

commit 91f0864dc0ccf0f67be7162f011706dbc6383cb3
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date:   Tue Aug 30 18:09:34 2022 +0200

    Add an active wait during the port provisioning event
    
    In ML2/OVN, during a live-migration process, it could
    happen that the port provisioning event is received before
    the port binding has been updated. That means the port has
    been created in the destination host and the event received
    (this event will remove any pending provisioning block), but
    the Nova port binding request, which updates the port binding
    registers, has not arrived yet. Because the port is considered
    "not bound" (yet), the port provisioning doesn't set the port
    status to ACTIVE.
    
    This patch creates an active wait during the port provisioning
    event method. If the port binding is still "unbound", the method
    retries the port retrieval several times, giving some time to the
    port binding request from Nova to arrive.
    
    Closes-Bug: #1988199
    Change-Id: I50091c84e67c172c94ce9140f23235421599185c
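
The active wait described in the commit message can be pictured roughly as follows. This is only a minimal sketch, not the code of the patch: the helper name `wait_for_binding`, the `get_port` callable, the `host` key, and the retry count and interval are all assumptions made for illustration.

```python
import time


def wait_for_binding(get_port, port_id, retries=5, interval=0.2,
                     sleep=time.sleep):
    """Poll a port until it has a bound host or the retries run out.

    All names and default values here are hypothetical; they only
    illustrate the "retry the port retrieval several times" idea.
    """
    port = None
    for attempt in range(retries):
        port = get_port(port_id)
        # An empty host stands in for "the port is still not bound".
        if port is not None and port.get("host"):
            return port
        # Do not sleep after the last attempt.
        if attempt < retries - 1:
            sleep(interval)
    return port


# Usage: a fake lookup that reports the port as bound on the third call,
# as if the Nova port binding request arrived in the meantime.
calls = {"n": 0}


def fake_get_port(port_id):
    calls["n"] += 1
    host = "dest-host" if calls["n"] >= 3 else ""
    return {"id": port_id, "host": host}


port = wait_for_binding(fake_get_port, "port-1", sleep=lambda _s: None)
```

If the binding never arrives, the helper returns the last (still unbound) port, so the caller can keep its existing "not bound" handling.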


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1988199

Title:
  [OVN][live-migration] Nova port binding request and
  "LogicalSwitchPortUpdateUpEvent" race condition

Status in neutron:
  Fix Released

Bug description:
  Related Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2120409

  Summary: after a live-migration, the VM port status is DOWN.

  During a live-migration, the following events happen in the Neutron server:
  1) We receive a port update. Because the "migrating_to" field is in
  the port binding, the OVN mech driver forces a port update from DOWN
  to UP. This (1) sets the port status to UP and (2) sends the
  vif-plugged event to Nova. That will trigger the port creation
  (layer 1) in the destination node.

  2) Then we receive the "LogicalSwitchPortUpdateDownEvent", because the
  source port was deleted. That sets the port status to DOWN.

  3) At the same time we receive the "LogicalSwitchPortUpdateUpEvent",
  because the port in the destination host has been created. This last
  event won't manually set the port status to UP. Instead it will remove
  any port provisioning block [1].

  3.1) If the port provisioning is considered complete
  ("provisioning_complete" event), this is processed in
  "Ml2Plugin._port_provisioned". The problem we are hitting here is that
  the port has no host (the port is still not bound):

  2022-08-26 10:08:23.373 17 DEBUG neutron.plugins.ml2.plugin
  [req-2b13d263-5748-46e2-9fdf-33df50634607 - - - - -] Port
  943db0db-773f-45e9-8b68-0ebcc1840207 cannot update to ACTIVE because
  it is not bound. _port_provisioned
  /usr/lib/python3.9/site-packages/neutron/plugins/ml2/plugin.py:339

  4) Right after that, the Nova port binding request is received and
  the port is bound: https://paste.opendev.org/show/bIUoJkiStCIe8TBb0573/

  This is basically the issue we have here: there is a race condition
  between (1) the Nova port binding request and (2) the
  "LogicalSwitchPortUpdateUpEvent" that is received when the OVS port is
  created on a chassis.
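
  The ordering problem in steps 1) to 4) can be replayed with a small
  toy model. This is a simplified sketch and not Neutron code: the
  "Port" class and the four event functions are hypothetical stand-ins,
  and "host" only models the binding to the destination host.

```python
from dataclasses import dataclass


@dataclass
class Port:
    status: str = "DOWN"
    host: str = ""  # empty string stands in for "not bound"


def migrating_to_update(port):
    # Step 1: the port update with "migrating_to" forces DOWN -> UP.
    port.status = "UP"


def lsp_down_event(port):
    # Step 2: the source port was deleted, so the status goes DOWN.
    port.status = "DOWN"


def lsp_up_event(port):
    # Steps 3 and 3.1: provisioning completes, but the status is only
    # raised when the port is already bound to a host.
    if port.host:
        port.status = "UP"


def nova_port_binding(port):
    # Step 4: the Nova port binding request binds the port to a host.
    port.host = "dest-host"


def replay(events):
    port = Port()
    for event in events:
        event(port)
    return port.status


# Order seen in the bug: the "up" event arrives before the binding.
racy = replay([migrating_to_update, lsp_down_event,
               lsp_up_event, nova_port_binding])
# Order that works: the binding arrives before the "up" event.
good = replay([migrating_to_update, lsp_down_event,
               nova_port_binding, lsp_up_event])
```

  In this model the first ordering leaves the port DOWN and the second
  one leaves it UP, which matches the behaviour described above.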

  Just for testing, if I add a 1 second sleep at the very first line of
  "_port_provisioned", giving the Nova port binding request (that will
  bind the port to a host) time to arrive, the port provisioning
  succeeds and the port is set to UP. I'll find a way to fix that in
  the Ml2Plugin code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1988199/+subscriptions


