yahoo-eng-team team mailing list archive
Message #88648
[Bug 1967144] Re: [OVN] Live migration can fail due to wrong revision id during setting requested chassis in ovn
Reviewed: https://review.opendev.org/c/openstack/neutron/+/836618
Committed: https://opendev.org/openstack/neutron/commit/4f75c6a616d3cb549153fcc496926358dfc9178a
Submitter: "Zuul (22348)"
Branch: master
commit 4f75c6a616d3cb549153fcc496926358dfc9178a
Author: Slawek Kaplonski <skaplons@xxxxxxxxxx>
Date: Tue Apr 5 11:26:32 2022 +0200
Retry port_update in the OVN if revision mismatch during live-migration
This is a terrible hack, but there seems to be no other way to
fix or work around the race that can happen during live migration between:
- the port update event coming from the OVN DB (port DOWN on the src node),
- the API call from nova-compute to activate the port binding on the
destination node.
If those two events are processed in a specific order by different
workers, the port binding activation may fail to update
"requested_chassis" of the port in OVN northd because of a revision mismatch
(ovn_revision and neutron_revision will already have been bumped by the
worker that processes the "port update" OVN event).
If "requested_chassis" is not updated, OVN will not claim the port on
the dest node, so connectivity to the VM will be broken.
To work around that issue, the port_update_postcommit method of the OVN
mechanism driver catches the RevisionMismatch exception raised by
ovn_client and, if this was a port_update after live migration,
gets the port data from the Neutron DB and tries to update the port in OVN
northd once again.
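The retry described above can be sketched roughly as follows. This is only an
illustration, not the code from the commit: FakeOvnClient, push_port_with_retry,
fetch_port_from_db and the "host" key are hypothetical names, the revision check
is a simplified model, and re-raising outside of live migration is an assumption.

```python
class RevisionMismatch(Exception):
    """Stand-in for the exception named in the commit message."""


class FakeOvnClient:
    """Hypothetical client: rejects updates older than the revision it stores."""

    def __init__(self, stored_revision):
        self.stored_revision = stored_revision
        self.requested_chassis = None

    def update_port(self, port):
        # Simplified check: data older than the stored revision is refused.
        if self.stored_revision > port["revision_number"]:
            raise RevisionMismatch(port["id"])
        self.stored_revision = port["revision_number"]
        self.requested_chassis = port["host"]


def push_port_with_retry(ovn_client, port, fetch_port_from_db, is_live_migration):
    """Push the port to OVN; on a live-migration mismatch, retry once with DB data."""
    try:
        ovn_client.update_port(port)
    except RevisionMismatch:
        if not is_live_migration:
            raise  # assumption: other callers keep the original behaviour
        fresh_port = fetch_port_from_db(port["id"])
        ovn_client.update_port(fresh_port)


# The Neutron DB already holds revision 6 (bumped by the OVN-event worker),
# while the binding-activation worker still carries revision 5.
db = {"p1": {"id": "p1", "revision_number": 6, "host": "dest-node"}}
client = FakeOvnClient(stored_revision=6)
stale_port = {"id": "p1", "revision_number": 5, "host": "dest-node"}

push_port_with_retry(client, stale_port, db.__getitem__, is_live_migration=True)
print(client.requested_chassis)
```

In this model the first attempt fails on the stale revision and the second one,
using the port re-read from the DB, sets the requested chassis.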
Closes-bug: #1967144
Change-Id: If6e1c6e0fc772101bcd3427601800aaae84381dd
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1967144
Title:
[OVN] Live migration can fail due to wrong revision id during setting
requested chassis in ovn
Status in neutron:
Fix Released
Bug description:
During live migration of a VM, Nova calls the /binding/activate API to activate the port binding on the destination node. Neutron then calls the mechanism drivers' port_update_postcommit() method, and at that point the OVN mechanism driver should update the "requested chassis" field of the LSP.
Unfortunately, we recently saw a race condition in our downstream (d/s) CI in which one worker was processing such a port binding activate request while another worker was processing an OVN event related to the same port.
As a result, the revision numbers in the OVN DB and Neutron did not match, and the requested chassis was not updated for the LSP. Because of that, OVN did not claim the port on the destination node, so connectivity to the VM was broken.
More details can be found in our d/s Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2068065
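The ordering problem can be illustrated with a small, sequential model. It is
a simplification under stated assumptions, not Neutron code: the dictionaries,
the try_set_requested_chassis helper and the concrete revision numbers are all
hypothetical, and the two "workers" are just consecutive steps.

```python
def try_set_requested_chassis(lsp, port):
    """Apply the update only if the port data is not older than what OVN stores."""
    if lsp["revision"] > port["revision_number"]:
        return False  # revision mismatch: the update is skipped
    lsp["revision"] = port["revision_number"]
    lsp["requested_chassis"] = port["host"]
    return True


# State of the logical switch port (LSP) before the migration completes.
lsp = {"revision": 5, "requested_chassis": "src-node"}

# Worker A reads the port (revision 5) to handle the binding activation.
activate_view = {"revision_number": 5, "host": "dest-node"}

# Worker B handles the OVN "port DOWN" event first and bumps the revision.
lsp["revision"] = 6

# Worker A now writes with stale data, so the update is dropped.
applied = try_set_requested_chassis(lsp, activate_view)
print(applied, lsp["requested_chassis"])  # prints: False src-node
```

With this ordering the LSP keeps pointing at the source node, which matches
the symptom described in the bug: the port is never claimed on the destination.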
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1967144/+subscriptions