yahoo-eng-team team mailing list archive

[Bug 1944619] [NEW] Instances with SRIOV ports lose access after failed live migrations

 

Public bug reported:

If a live migration of an instance with an SRIOV port fails during the
'_pre_live_migration' hook, the instance loses access to the network and
duplicated port bindings are left behind in the database.
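
One way to spot the leftover bindings is to group binding records by port
and flag any port bound to more than one host. The sketch below is only an
illustration: the record shape (dicts with "port_id", "host" and "status"
keys), the function name and the sample values are assumptions made for
this example, not the actual Neutron schema or API.

```python
from collections import defaultdict


def find_ports_with_multiple_bindings(bindings):
    """Return {port_id: [(host, status), ...]} for ports with more than one binding.

    `bindings` is an iterable of dicts with hypothetical keys
    "port_id", "host" and "status" (an assumed shape, for illustration only).
    """
    hosts_by_port = defaultdict(list)
    for binding in bindings:
        hosts_by_port[binding["port_id"]].append(
            (binding["host"], binding["status"])
        )
    # Keep only the ports that are bound on more than one host.
    return {
        port_id: hosts
        for port_id, hosts in hosts_by_port.items()
        if len(hosts) > 1
    }


# Made-up sample data: port-a still has a binding on the destination host.
sample = [
    {"port_id": "port-a", "host": "compute-1", "status": "ACTIVE"},
    {"port_id": "port-a", "host": "compute-2", "status": "INACTIVE"},
    {"port_id": "port-b", "host": "compute-1", "status": "ACTIVE"},
]
duplicates = find_ports_with_multiple_bindings(sample)
```

Note that a second binding is expected while a live migration is in
progress, so a check like this is only meaningful once the migration has
finished or failed.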

The instance regains connectivity on the source host after a reboot
(I don't know whether there is another way to restore connectivity). As a
side effect of this behavior, the pre-live migration cleanup hook also
fails with:

PCI device 0000:3b:10.0 is in use by driver QEMU

[How to reproduce]

- Create an environment with SRIOV (our case uses switchdev [1])
- Create 1 VM
- Provoke a failure in the _pre_live_migration process (for example, by creating a directory /var/lib/nova/instances/<instance id>)
- Check the VM's connectivity
- Check the logs for: libvirt.libvirtError: Requested operation is not valid: PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001
Full stack trace: [2]
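
The log check in the last step can be scripted. The following is a minimal
sketch that assumes the error text has the same shape as the two messages
quoted in this report; the regular expression, the function name and the
sample lines are illustrative, and no particular log file path or log
format is assumed.

```python
import re

# Matches the libvirt message quoted in this report; the ", domain ..." suffix
# is optional because the shorter form of the message omits it.
PCI_IN_USE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver (?P<driver>\w+)"
    r"(?:, domain (?P<domain>[\w-]+))?"
)


def find_pci_in_use_errors(lines):
    """Return one dict (address, driver, domain) per line that matches."""
    hits = []
    for line in lines:
        match = PCI_IN_USE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# Sample lines copied from the messages in this report, plus one non-match.
sample = [
    "libvirt.libvirtError: Requested operation is not valid: "
    "PCI device 0000:03:04.1 is in use by driver QEMU, domain instance-00000001",
    "PCI device 0000:3b:10.0 is in use by driver QEMU",
    "unrelated log line",
]
errors = find_pci_in_use_errors(sample)
```

To scan a real log, pass an open file object as `lines`; the function only
iterates over it line by line.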

[Expected]

VM connectivity is restored, even if there is a brief disconnection

[Observed]

VM loses connectivity, which is only restored after the VM status is set to ERROR and the VM is power cycled

[1] https://paste.ubuntu.com/p/PzBM7y6Dbr/
[2] https://paste.ubuntu.com/p/ThQmDYtdSS/

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944619

Title:
  Instances with SRIOV ports lose access after failed live migrations

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944619/+subscriptions


