yahoo-eng-team team mailing list archive
Message #87225
[Bug 1944619] [NEW] Instances with SRIOV ports lose access after failed live migrations
Public bug reported:
If a live migration fails for an instance with an SRIOV port during the
'_pre_live_migration' hook, the instance loses access to the network and
leaves behind duplicated port bindings in the database.
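One way to spot the leftover bindings is to group binding records by port and flag any port that has more than one. The sketch below is only an illustration: it assumes "duplicated" means more than one binding record per port, and the record layout (`port_id`, `host`, `status` keys), the function name, and the sample values are hypothetical, not taken from this report.

```python
from collections import defaultdict


def ports_with_multiple_bindings(bindings):
    """Return {port_id: [(host, status), ...]} for ports with more than one binding.

    `bindings` is an iterable of dicts with the assumed keys
    'port_id', 'host' and 'status' (a hypothetical layout).
    """
    by_port = defaultdict(list)
    for binding in bindings:
        by_port[binding["port_id"]].append((binding["host"], binding["status"]))
    # Keep only the ports that ended up with more than one binding record.
    return {port: hosts for port, hosts in by_port.items() if len(hosts) > 1}


# Hypothetical sample data: port-a kept a binding on both hosts.
sample_bindings = [
    {"port_id": "port-a", "host": "source-host", "status": "ACTIVE"},
    {"port_id": "port-a", "host": "dest-host", "status": "INACTIVE"},
    {"port_id": "port-b", "host": "source-host", "status": "ACTIVE"},
]

leftovers = ports_with_multiple_bindings(sample_bindings)
print(leftovers)
```

With the sample data, only `port-a` is reported, because it still has a binding for both hosts.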
The instance regains connectivity on the source host after a reboot
(I don't know whether there is another way to restore connectivity). As
a side effect of this behavior, the pre-live-migration cleanup hook also
fails with:
PCI device 0000:3b:10.0 is in use by driver QEMU
[How to reproduce]
- Create an environment with SRIOV (our case uses switchdev[1])
- Create 1 VM
- Provoke a failure in the _pre_live_migration process (for example, by creating a directory /var/lib/nova/instances/<instance id>)
- Check the VM's connectivity
- Check the logs for: libvirt.libvirtError: Requested operation is not valid: PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001
Full-stack trace[2]
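The log check in the last step can be scripted. The following is a minimal sketch, assuming the log lines contain the error text exactly as quoted in this report; the function name and the regular expression are illustrative and not part of nova or libvirt.

```python
import re

# Matches the libvirt error text quoted in this report. The ", domain ..."
# suffix is optional because one of the quoted messages omits it.
PCI_IN_USE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver QEMU(?:, domain (?P<domain>\S+))?"
)


def find_pci_in_use(lines):
    """Return (pci_address, domain) pairs; domain is None when not logged."""
    hits = []
    for line in lines:
        match = PCI_IN_USE.search(line)
        if match:
            hits.append((match.group("address"), match.group("domain")))
    return hits


# The two messages quoted in this report, used as sample input.
sample_lines = [
    "libvirt.libvirtError: Requested operation is not valid: "
    "PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001",
    "PCI device 0000:3b:10.0 is in use by driver QEMU",
    "an unrelated log line",
]

hits = find_pci_in_use(sample_lines)
print(hits)
```

To scan a real log, pass an open file object instead of `sample_lines`; the log path depends on the deployment and is not given in this report.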
[Expected]
VM connectivity is restored, even if there is a brief disconnection
[Observed]
VM loses connectivity, which is only restored after the VM status is set to ERROR and the VM is power cycled
[1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
[2] https://paste.ubuntu.com/p/ThQmDYtdSS/
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944619
Title:
Instances with SRIOV ports lose access after failed live migrations
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944619/+subscriptions