
yahoo-eng-team team mailing list archive

[Bug 1967144] Re: [OVN] Live migration can fail due to wrong revision id during setting requested chassis in ovn

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/836618
Committed: https://opendev.org/openstack/neutron/commit/4f75c6a616d3cb549153fcc496926358dfc9178a
Submitter: "Zuul (22348)"
Branch:    master

commit 4f75c6a616d3cb549153fcc496926358dfc9178a
Author: Slawek Kaplonski <skaplons@xxxxxxxxxx>
Date:   Tue Apr 5 11:26:32 2022 +0200

    Retry port_update in the OVN if revision mismatch during live-migration
    
    This is a terrible hack, but there seems to be no other way to
    fix or work around the race that may happen during live migration between:
    - the port update event coming from the OVN DB (port DOWN on the src node),
    - the API call from nova-compute to activate the port binding on the
      destination node.
    
    If those two events are processed in a specific order by different
    workers, the port binding activation may fail to update
    "requested_chassis" of the port in OVN northd due to a revision mismatch
    (ovn_revision and neutron_revision will have already been bumped by the
    worker that processes the "port update" OVN event).
    If "requested_chassis" is not updated, OVN will not claim the port on
    the destination node, so connectivity to the VM will be broken.
    
    To work around that issue, the port_update_postcommit method of the OVN
    mechanism driver catches the RevisionMismatch exception raised by the
    ovn_client and, if this was a port_update after live migration,
    gets the port data from the Neutron DB and tries to update the port in
    OVN northd once again.
    
    Closes-bug: #1967144
    Change-Id: If6e1c6e0fc772101bcd3427601800aaae84381dd
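
The retry described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the actual Neutron code: update_port_with_retry, update_in_ovn, reload_port and after_live_migration are hypothetical names, RevisionMismatch here is a local stand-in class, and re-raising outside of live migration is an assumption.

```python
class RevisionMismatch(Exception):
    """Local stand-in for the exception named in the commit message."""


def update_port_with_retry(port, update_in_ovn, reload_port, after_live_migration):
    """Push a port to OVN; on a mismatch after live migration, retry once."""
    try:
        update_in_ovn(port)
        return "updated"
    except RevisionMismatch:
        if not after_live_migration:
            # Assumption: outside live migration the mismatch is propagated.
            raise
        # Re-read the port so the retry carries the current revision number.
        fresh_port = reload_port(port["id"])
        update_in_ovn(fresh_port)
        return "retried"


# Toy stand-ins for the Neutron DB and the OVN side (illustrative only).
db = {"p1": {"id": "p1", "revision": 8, "requested_chassis": "dest-node"}}
ovn = {"revision": 8, "requested_chassis": "src-node"}


def fake_update_in_ovn(port):
    # Simplified check: reject a port whose revision is older than OVN's.
    if port["revision"] < ovn["revision"]:
        raise RevisionMismatch()
    ovn["revision"] = port["revision"]
    ovn["requested_chassis"] = port["requested_chassis"]


stale_port = {"id": "p1", "revision": 7, "requested_chassis": "dest-node"}
result = update_port_with_retry(stale_port, fake_update_in_ovn, db.__getitem__, True)
print(result, ovn["requested_chassis"])
```

The retry is attempted only once and only with data re-read from the database, so a stale port snapshot is never pushed twice.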


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1967144

Title:
  [OVN] Live migration can fail due to wrong revision id during setting
  requested chassis in ovn

Status in neutron:
  Fix Released

Bug description:
  During live migration of a VM, when Nova calls the /binding/activate API to activate the port binding on the destination node, Neutron calls the mechanism drivers' port_update_postcommit() method. At that point the OVN mechanism driver should update the "requested chassis" field of the LSP (logical switch port).
  Unfortunately, we recently saw a race condition in our d/s (downstream) CI: one worker was processing such a port binding activate request while another worker was processing an OVN event related to the same port.
  As a result, the revision numbers in the OVN DB and Neutron did not match, and the requested chassis was not updated for the LSP. Because of that, OVN did not claim the port on the destination node, so connectivity to the VM was broken.

  More details can be found in our d/s Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=2068065
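
  The interleaving described above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: LogicalSwitchPort and apply_update are hypothetical names, the revision numbers are made up, and the check "reject a snapshot older than the stored revision" is a simplified model of the revision comparison, not the real Neutron implementation.

```python
from dataclasses import dataclass


@dataclass
class LogicalSwitchPort:
    """Toy model of an LSP row; field names are illustrative."""
    revision: int
    requested_chassis: str
    up: bool


def apply_update(lsp, port_snapshot, **changes):
    """Apply changes only if the snapshot is not older than the LSP."""
    if port_snapshot["revision"] < lsp.revision:
        return False  # revision mismatch: the update is skipped
    for name, value in changes.items():
        setattr(lsp, name, value)
    lsp.revision = port_snapshot["revision"]
    return True


lsp = LogicalSwitchPort(revision=7, requested_chassis="src-node", up=True)

# Worker A reads the port in order to activate the binding on the destination.
snapshot_a = {"revision": 7}

# Worker B handles the OVN "port DOWN" event first and bumps the revision.
snapshot_b = {"revision": 8}
apply_update(lsp, snapshot_b, up=False)

# Worker A now writes with its stale snapshot and is rejected.
applied = apply_update(lsp, snapshot_a, requested_chassis="dest-node")
print(applied, lsp.requested_chassis)  # False src-node
```

  In this model the requested chassis stays on the source node, which corresponds to the port never being claimed on the destination.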

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1967144/+subscriptions


