
yahoo-eng-team team mailing list archive

[Bug 1946298] [NEW] live-migration fails when option rom size is different in guest's memory

 

Public bug reported:

Description
===========
This problem was found when live-migrating across nova versions, in
particular when the ipxe version that libvirt depends on differs between
hosts. If an instance has an interface attached, an option ROM is loaded into
the guest's memory. During live migration, qemu checks the ROM size and tries
to resize any resizable memory region. However, the option ROM is not
resizable. If the destination node finds that the option ROM size differs once
it is loaded into memory, an error is raised and the migration stops.
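
The check described above can be illustrated with a toy model. This is only a
sketch of the behaviour as described in this report, not qemu's code: the
`RamBlock` class, the `load_ramblock` function and the `MigrationLoadError`
exception are hypothetical names, and the error text simply mimics the message
seen in the logs.

```python
from dataclasses import dataclass


class MigrationLoadError(Exception):
    """Hypothetical error standing in for qemu aborting the incoming migration."""


@dataclass
class RamBlock:
    name: str
    used_length: int
    resizable: bool


def load_ramblock(dest: RamBlock, incoming_length: int) -> int:
    """Apply the incoming block length to the destination's block.

    Equal lengths are accepted, a resizable block is resized, and a
    non-resizable block (such as an option ROM) with a different length
    makes the load fail.
    """
    if incoming_length == dest.used_length:
        return dest.used_length
    if dest.resizable:
        dest.used_length = incoming_length
        return dest.used_length
    raise MigrationLoadError(
        f"Length mismatch: {dest.name}: "
        f"{incoming_length:#x} in != {dest.used_length:#x}"
    )
```

With a non-resizable block named `0000:00:03.0/virtio-net-pci.rom`, a local
length of 0x80000 and an incoming length of 0x1000, this model raises an error
with the same shape as the one reported by qemu.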

Steps to reproduce
==================
A simple way to reproduce:
* Prepare two nova-compute nodes, which can be the same version
* Create an instance on Node A, and attach an interface to it
* Check which ipxe ROM is loaded into memory, based on the interface's model
  type. For example, if an interface is defined with `<model type='virtio'/>`,
  then `/usr/lib/ipxe/qemu/efi-virtio.rom` is loaded on an Ubuntu x86 system.
* Change the ROM's virtual size on the destination Node B, for example with

  `echo "hello" > /usr/lib/ipxe/qemu/efi-virtio.rom`

  The virtual size is the maximum length of the ROM when it is loaded into the
  guest's memory, and it is a power of 2. The following command shows the
  ROM's virtual size:

  `virsh qemu-monitor-command <domain> --hmp 'info ramblock'`
* Do the live migration:

  `nova live-migration --block-migrate cirros1 cmp02`
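
To compare ROM files on two nodes before migrating, the virtual size can be
estimated from the file size. The sketch below assumes that the file size is
rounded up to the next power of 2 and that the resulting block is never
smaller than one 4 KiB page. Both are assumptions: they are consistent with
the 0x1000 and 0x80000 values in the logs, but they are not taken from qemu's
documentation. The function names are hypothetical.

```python
import os

PAGE_SIZE = 4096  # assumed host page size


def pow2ceil(n: int) -> int:
    """Return the smallest power of 2 that is >= n (1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def rom_virtual_size(file_size: int, page_size: int = PAGE_SIZE) -> int:
    """Estimate the in-memory size of a ROM from its on-disk size."""
    return max(pow2ceil(file_size), page_size)


def rom_file_virtual_size(path: str) -> int:
    """Estimate the virtual size of the ROM file at `path`."""
    return rom_virtual_size(os.path.getsize(path))
```

Running `rom_file_virtual_size("/usr/lib/ipxe/qemu/efi-virtio.rom")` on both
nodes and comparing the results gives a quick hint of whether the migration is
likely to hit the length check.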

Expected result
===============
Normally, if the rom's virtual size is not changed, migration will succeed.

Actual result
=============
After the steps above, the live migration fails with a "Length mismatch"
error from qemu.

Environment
===========
Nova version:
$ dpkg -l | grep nova
ii  nova-common                            2:21.2.1-0ubuntu1     all
ii  nova-compute                           2:21.2.1-0ubuntu1     all
ii  nova-compute-kvm                       2:21.2.1-0ubuntu1     all
ii  nova-compute-libvirt                   2:21.2.1-0ubuntu1     all
ii  python3-nova                           2:21.2.1-0ubuntu1     all
ii  python3-novaclient                     2:17.0.0-0ubuntu1     all

Hypervisor type: libvirt
$ dpkg -l | grep libvirt
ii  libvirt-clients                        6.0.0-0ubuntu8.13     amd64
ii  libvirt-daemon                         6.0.0-0ubuntu8.13     amd64
ii  libvirt-daemon-driver-qemu             6.0.0-0ubuntu8.13     amd64
ii  libvirt-daemon-driver-storage-rbd      6.0.0-0ubuntu8.13     amd64
ii  libvirt-daemon-system                  6.0.0-0ubuntu8.13     amd64
ii  libvirt-daemon-system-systemd          6.0.0-0ubuntu8.13     amd64
ii  libvirt0:amd64                         6.0.0-0ubuntu8.13     amd64
ii  nova-compute-libvirt                   2:21.2.1-0ubuntu1     all
ii  python3-libvirt                        6.1.0-1               amd64

Networking type: Neutron with OpenVSwitch
$ dpkg -l | grep neutron
ii  neutron-common                         2:16.4.0-0ubuntu3     all
ii  neutron-openvswitch-agent              2:16.4.0-0ubuntu3     all
ii  python3-neutron                        2:16.4.0-0ubuntu3     all
ii  python3-neutron-lib                    2.3.0-0ubuntu1        all
ii  python3-neutronclient                  1:7.1.1-0ubuntu1      all

Logs & Configs
==============
```text
2021-09-22 10:10:31.451 35235 ERROR nova.virt.libvirt.driver [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Migration operation has aborted
2021-09-22 10:10:31.644 35235 INFO nova.compute.manager [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Swapping old allocation on dict_keys(['61b9a486-f53e-4b70-b54c-0db29f8ff978']) held by migration f5308871-0e91-48b0-8a68-a7d66239b3bd for instance
2021-09-22 10:10:31.671 35235 ERROR nova.virt.libvirt.driver [-] [instance: 6d91c241-75b8-4067-8874-c64970b87f6a] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2021-09-22T02:10:31.450377Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x1000 in != 0x80000: Invalid argument
2021-09-22T02:10:31.450414Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
2021-09-22T02:10:31.452282Z qemu-system-x86_64: load of migration failed: Invalid argument: libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-09-22T02:10:31.450377Z qemu-system-x86_64: Length mismatch: 0000:00:03.0/virtio-net-pci.rom: 0x1000 in != 0x80000: Invalid argument
```
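
To pick the relevant values out of such a log, the "Length mismatch" line can
be parsed with a regular expression. This is a minimal sketch based only on
the message format shown above; the function name and the returned field names
are hypothetical, and no claim is made here about which host each size belongs
to beyond the "in" tag printed by qemu.

```python
import re
from typing import Optional

# Matches e.g. "Length mismatch: <block>: 0x1000 in != 0x80000"
LENGTH_MISMATCH_RE = re.compile(
    r"Length mismatch: (?P<block>\S+): "
    r"(?P<size_in>0x[0-9a-fA-F]+) in != (?P<size_other>0x[0-9a-fA-F]+)"
)


def parse_length_mismatch(line: str) -> Optional[dict]:
    """Return the block name and both sizes, or None if the line does not match."""
    match = LENGTH_MISMATCH_RE.search(line)
    if match is None:
        return None
    return {
        "block": match.group("block"),
        "size_in": int(match.group("size_in"), 16),
        "size_other": int(match.group("size_other"), 16),
    }
```

Applied to the error line above, this yields the block name
`0000:00:03.0/virtio-net-pci.rom` with sizes 0x1000 and 0x80000.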

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1946298

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1946298/+subscriptions


