yahoo-eng-team team mailing list archive
Message #88888
[Bug 1944619] Re: Instances with hardware offloaded ovs ports lose access after failed live migrations
** Also affects: nova/yoga
Importance: Undecided
Status: New
** Also affects: nova/victoria
Importance: Undecided
Status: New
** Also affects: nova/xena
Importance: Undecided
Status: New
** Also affects: nova/ussuri
Importance: Undecided
Status: New
** Also affects: nova/wallaby
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944619
Title:
Instances with hardware offloaded ovs ports lose access after failed
live migrations
Status in neutron:
Incomplete
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ussuri series:
New
Status in OpenStack Compute (nova) victoria series:
New
Status in OpenStack Compute (nova) wallaby series:
New
Status in OpenStack Compute (nova) xena series:
New
Status in OpenStack Compute (nova) yoga series:
New
Bug description:
If a live migration fails during the '_pre_live_migration' hook for an
instance with an SRIOV port, the instance loses access to the network
and duplicated port bindings are left behind in the database.
The instance regains connectivity on the source host after a reboot
(we don't know of another way to restore connectivity). As a side
effect of this behavior, the pre-live migration cleanup hook also
fails with:
PCI device 0000:3b:10.0 is in use by driver QEMU
[How to reproduce]
- Create an environment with SRIOV (our case uses switchdev[1])
- Create 1 VM
- Provoke a failure in the _pre_live_migration process (for example, by creating a directory /var/lib/nova/instances/<instance id>)
- Check the VM's connectivity
- Check the logs for: libvirt.libvirtError: Requested operation is not valid: PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001
Full stack trace[2]
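The log check in the last step can be scripted. The following is a minimal sketch that assumes only the error text quoted above; the function name `find_pci_in_use` and the sample lines are illustrative, and the real layout of the nova-compute log around the message may differ.

```python
import re

# Pattern built from the error text quoted in this report. The surrounding
# log layout (timestamps, prefixes) is an assumption, so search() is used
# instead of match().
PCI_IN_USE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver QEMU(?:, domain (?P<domain>\S+))?"
)


def find_pci_in_use(lines):
    """Return (pci_address, domain_or_None) for every matching log line."""
    hits = []
    for line in lines:
        match = PCI_IN_USE.search(line)
        if match:
            hits.append((match.group("address"), match.group("domain")))
    return hits


# Sample input: the two messages quoted in the report plus an unrelated line.
sample = [
    "libvirt.libvirtError: Requested operation is not valid: "
    "PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001",
    "PCI device 0000:3b:10.0 is in use by driver QEMU",
    "unrelated line",
]

for address, domain in find_pci_in_use(sample):
    print(address, domain)
```

To use it against a real log, pass an open file object in place of `sample`; the log path depends on the deployment and is not given in this report.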
[Expected]
VM connectivity is restored, even if there is a brief disconnection
As in non-SRIOV scenarios, no leftovers remain after a failure (port bindings and instance path files)
[Observed]
VM loses connectivity, which is restored only after the VM status is set to ERROR and the VM is power cycled
Port bindings are not removed
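Leftover bindings can be spotted by grouping binding records per port. The following is a rough sketch, assuming the records have already been exported as dictionaries with `port_id` and `host` keys; that record shape, the function name `duplicated_bindings`, and the sample values are hypothetical and not taken from this report. It does not query Neutron itself.

```python
from collections import defaultdict


def duplicated_bindings(bindings):
    """Return {port_id: sorted hosts} for ports bound on more than one host."""
    hosts_by_port = defaultdict(set)
    for binding in bindings:
        hosts_by_port[binding["port_id"]].add(binding["host"])
    return {
        port_id: sorted(hosts)
        for port_id, hosts in hosts_by_port.items()
        if len(hosts) > 1
    }


# Hypothetical records: port-a kept a binding on the destination host
# after the failed migration, port-b did not.
records = [
    {"port_id": "port-a", "host": "compute-1"},
    {"port_id": "port-a", "host": "compute-2"},
    {"port_id": "port-b", "host": "compute-1"},
]

print(duplicated_bindings(records))
```

A port that appears in the result after a failed migration is a candidate for the leftover described above; whether it is safe to delete the extra binding is a separate question that this sketch does not answer.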
[Environment]
Focal Ussuri with Mellanox ConnectX-5 cards
[1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
[2] https://paste.ubuntu.com/p/ThQmDYtdSS/
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944619/+subscriptions