
yahoo-eng-team team mailing list archive

[Bug 1944619] Re: Instances with hardware offloaded ovs ports lose access after failed live migrations

 

** Also affects: nova/yoga
   Importance: Undecided
       Status: New

** Also affects: nova/victoria
   Importance: Undecided
       Status: New

** Also affects: nova/xena
   Importance: Undecided
       Status: New

** Also affects: nova/ussuri
   Importance: Undecided
       Status: New

** Also affects: nova/wallaby
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944619

Title:
  Instances with hardware offloaded ovs ports lose access after failed
  live migrations

Status in neutron:
  Incomplete
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  New
Status in OpenStack Compute (nova) victoria series:
  New
Status in OpenStack Compute (nova) wallaby series:
  New
Status in OpenStack Compute (nova) xena series:
  New
Status in OpenStack Compute (nova) yoga series:
  New

Bug description:
  If, for any reason, a live migration fails during the
  '_pre_live_migration' hook for an instance with an SRIOV port, the
  instance loses access to the network and duplicated port bindings are
  left behind in the database.

  The instance regains connectivity on the source host after a reboot
  (I don't know if there's another way to restore connectivity). As a
  side effect of this behavior, the pre-live migration cleanup hook also
  fails with:

  PCI device 0000:3b:10.0 is in use by driver QEMU

  [How to reproduce]

  - Create an environment with SRIOV (our case uses switchdev [1])
  - Create 1 VM
  - Provoke a failure in the _pre_live_migration process (for example, by creating a directory /var/lib/nova/instances/<instance id>)
  - Check the VM's connectivity
  - Check the logs for: libvirt.libvirtError: Requested operation is not valid: PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001
  Full stack trace: [2]
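
  The log check in the last step can be scripted. The following is only a
  minimal sketch: the function name find_pci_in_use is hypothetical, and
  the pattern is derived solely from the two error messages quoted in this
  report, so other libvirt versions may word the error differently.

```python
import re

# Matches the libvirt error quoted in this report. The ", domain <name>"
# suffix is optional because the shorter form of the message omits it.
PCI_IN_USE = re.compile(
    r"PCI device (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver QEMU(?:, domain (?P<domain>\S+))?"
)


def find_pci_in_use(lines):
    """Return (pci_address, domain_or_None) for each matching log line."""
    hits = []
    for line in lines:
        match = PCI_IN_USE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("domain")))
    return hits
```

  The function takes any iterable of strings, so an open nova-compute log
  file can be passed to it directly; the location of that file depends on
  the deployment.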

  [Expected]

  VM connectivity is restored, even if there is a brief disconnection.
  As in non-SRIOV scenarios, no leftovers remain after a failure (port bindings and instance path files).

  [Observed]
  The VM loses connectivity, which is only restored after the VM status is set to ERROR and the VM is power cycled.
  Port bindings are not removed.
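
  One way to spot the leftover bindings is to group binding records by
  port and flag any port bound to more than one host. This is a sketch
  under assumptions: the record shape (dicts with "port_id" and "host"
  keys) and the function name are hypothetical, and how the records are
  fetched from the database or the API is not shown.

```python
from collections import defaultdict


def ports_with_extra_bindings(bindings):
    """Map each port bound to more than one host to its sorted host list.

    `bindings` is an iterable of dicts with "port_id" and "host" keys
    (a hypothetical shape chosen for illustration).
    """
    hosts_by_port = defaultdict(set)
    for binding in bindings:
        hosts_by_port[binding["port_id"]].add(binding["host"])
    return {
        port: sorted(hosts)
        for port, hosts in hosts_by_port.items()
        if len(hosts) > 1
    }
```

  After a failed migration that cleaned up properly, the result should be
  empty for the instance's ports.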

  [Environment]
  Focal Ussuri with Mellanox ConnectX-5 cards

  [1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
  [2] https://paste.ubuntu.com/p/ThQmDYtdSS/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944619/+subscriptions


