yahoo-eng-team team mailing list archive
Message #89677
[Bug 1988199] Re: [OVN][live-migration] Nova port binding request and "LogicalSwitchPortUpdateUpEvent" race condition
Reviewed: https://review.opendev.org/c/openstack/neutron/+/855257
Committed: https://opendev.org/openstack/neutron/commit/91f0864dc0ccf0f67be7162f011706dbc6383cb3
Submitter: "Zuul (22348)"
Branch: master
commit 91f0864dc0ccf0f67be7162f011706dbc6383cb3
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date: Tue Aug 30 18:09:34 2022 +0200
Add an active wait during the port provisioning event
In ML2/OVN, during a live-migration process, it could
happen that the port provisioning event is received before
the port binding has been updated. That means the port has
been created in the destination host and the event received
(this event will remove any pending provisioning block). But
the Nova port binding request, which updates the port binding
records, has not arrived yet. Because the port is considered
"not bound" (yet), the port provisioning doesn't set the port
status to ACTIVE.
This patch creates an active wait during the port provisioning
event method. If the port binding is still "unbound", the method
retries the port retrieval several times, giving some time to the
port binding request from Nova to arrive.
Closes-Bug: #1988199
Change-Id: I50091c84e67c172c94ce9140f23235421599185c
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1988199
Title:
[OVN][live-migration] Nova port binding request and
"LogicalSwitchPortUpdateUpEvent" race condition
Status in neutron:
Fix Released
Bug description:
Related Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2120409
Summary: after a live-migration, the VM port status is DOWN.
During a live-migration, the following events happen in the Neutron server:
1) We receive a port update. Because the "migrating_to" field is in the port binding, the OVN mech driver forces a port update from DOWN to UP. This (1) sets the port status to UP and (2) sends the vif-plugged event to Nova. That will trigger the port creation (layer 1) in the destination node.
2) Then we receive the "LogicalSwitchPortUpdateDownEvent", because the
source port was deleted. That sets the port status to DOWN.
3) At the same time we receive the "LogicalSwitchPortUpdateUpEvent",
because the port in the destination host has been created. This last
event won't manually set the port status to UP. Instead it will remove
any port provisioning block [1].
3.1) If the port provisioning is considered complete
("provisioning_complete" event), this is processed in
"Ml2Plugin._port_provisioned". The problem we are hitting here is that
the port has no host (the port is still not bound):
2022-08-26 10:08:23.373 17 DEBUG neutron.plugins.ml2.plugin [req-2b13d263-5748-46e2-9fdf-33df50634607 - - - - -] Port 943db0db-773f-45e9-8b68-0ebcc1840207 cannot update to ACTIVE because it is not bound. _port_provisioned /usr/lib/python3.9/site-packages/neutron/plugins/ml2/plugin.py:339
4) Right after that, the Nova port binding request is received and the
port is bound: https://paste.opendev.org/show/bIUoJkiStCIe8TBb0573/
This is basically the issue we have here: there is a race condition
between (1) the Nova port binding request and (2) the
"LogicalSwitchPortUpdateUpEvent" that is received when the OVS port is
created on a chassis.
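The ordering problem described above can be illustrated with a toy model. This is only a sketch: the "Port" class and the "bind_port", "port_provisioned" and "replay" helpers are hypothetical names invented for this example and are not Neutron code. It assumes, as the log line above suggests, that the provisioning handler only activates a port that already has a host.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    """Hypothetical, simplified stand-in for a Neutron port."""
    host: Optional[str] = None  # stays empty until the binding request arrives
    status: str = "DOWN"


def bind_port(port: Port, host: str) -> None:
    """Toy version of the Nova port binding request."""
    port.host = host


def port_provisioned(port: Port) -> bool:
    """Toy version of the provisioning-complete handler.

    It only activates a port that is already bound to a host.
    """
    if port.host is None:
        # Mirrors "cannot update to ACTIVE because it is not bound".
        return False
    port.status = "ACTIVE"
    return True


def replay(order):
    """Apply the two events in the given order and return the final status."""
    port = Port()
    for step in order:
        if step == "bind":
            bind_port(port, "dest-host")
        elif step == "provisioned":
            port_provisioned(port)
    return port.status
```

In this model, replay(["bind", "provisioned"]) ends with the port ACTIVE, while replay(["provisioned", "bind"]) leaves it DOWN, because nothing re-runs the provisioning step once the binding arrives.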
Just for testing, if I add a 1 second sleep at the very first line of
"_port_provisioned", giving the Nova port binding request (which binds
the port to a host) time to arrive, the port provisioning succeeds
and the port is set to UP. I'll find a way to fix that in the
Ml2Plugin code.
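A bounded retry is one possible alternative to a fixed sleep. The following is a minimal sketch of that idea, not the actual Neutron patch: "wait_for_binding", its "get_port" callable, and the "retries" and "interval" defaults are all assumptions made for illustration, and the port is modelled as a plain dict with a "host" key.

```python
import time


def wait_for_binding(get_port, retries=5, interval=0.2, sleep=time.sleep):
    """Re-read the port until it is bound or the retries run out.

    get_port: hypothetical callable returning a dict with a "host" key.
    Returns the bound port, or None if it is still unbound at the end.
    """
    for attempt in range(retries):
        port = get_port()
        if port.get("host"):
            return port
        # Do not sleep after the last attempt; there is nothing left to wait for.
        if attempt < retries - 1:
            sleep(interval)
    return None
```

Compared with an unconditional sleep, this returns as soon as the binding is visible and only pays the full delay when the binding request is late. Passing "sleep" as a parameter is a choice made here so the loop can be exercised without real delays.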
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1988199/+subscriptions