yahoo-eng-team team mailing list archive

[Bug 2039803] [NEW] compareHypervisorCPU() incompatibility during live migration

 

Public bug reported:

Description
===========
Live migration fails with the following error:

Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 9636, in check_can_live_migrate_destination
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server     self._compare_cpu(None, source_cpu_info, instance)
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 10013, in _compare_cpu
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server     raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.

[...]

2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 9640, in check_can_live_migrate_destination
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server     raise exception.MigrationPreCheckError(reason=e)
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server nova.exception.MigrationPreCheckError: Migration pre-check error: Unacceptable CPU info: CPU doesn't have compatibility.


If skip_cpu_compare_on_dest is set to True, the live migration succeeds. So the issue seems to be only in the check nova performs; the hypervisors are actually compatible.


Steps to reproduce
==================
* boot a simple cirros VM
* openstack server migrate --live --block-migration <vm>


Environment
===========
OpenStack: 2023.1.
libvirt version: 9.5.0
QEMU: 8.1.0
Hypervisors: two CentOS Stream 9 VMs with nested KVM enabled
nova compute is configured with cpu_mode=host-model

Triage
======


During the pre_live_migration check running on the destination node, nova sees that the guest has no vcpu_model set in the DB and therefore falls back to a comparison based on the host CPU model [1]. The host cpu_info used there is collected with getCapabilities() from libvirt [2], which on this system returns SandyBridge. On the other hand, the guest VM is running as Broadwell (note that nova is configured with cpu_mode=host-model), and virsh domcapabilities also returns Broadwell as the host model.
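
The mismatch described above can be made visible by reading the CPU model out of both XML documents. The following is only a minimal sketch: the XML samples are heavily trimmed illustrations (not output captured from the affected hosts), and the function names are hypothetical, not nova code.

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed samples. On a real host these documents would come
# from libvirt (capabilities XML and domain capabilities XML respectively).
CAPABILITIES_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
    </cpu>
  </host>
</capabilities>
"""

DOMCAPABILITIES_XML = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell</model>
    </mode>
  </cpu>
</domainCapabilities>
"""


def model_from_capabilities(xml_text):
    """Return the host CPU model reported in capabilities XML."""
    root = ET.fromstring(xml_text)
    return root.findtext("./host/cpu/model")


def model_from_domcapabilities(xml_text):
    """Return the host-model CPU reported in domain capabilities XML."""
    root = ET.fromstring(xml_text)
    return root.findtext("./cpu/mode[@name='host-model']/model")


def models_disagree(caps_xml, domcaps_xml):
    """True when the two libvirt views name different CPU models."""
    return model_from_capabilities(caps_xml) != model_from_domcapabilities(domcaps_xml)


if __name__ == "__main__":
    print("capabilities model:   ", model_from_capabilities(CAPABILITIES_XML))
    print("domcapabilities model:", model_from_domcapabilities(DOMCAPABILITIES_XML))
    print("disagree:", models_disagree(CAPABILITIES_XML, DOMCAPABILITIES_XML))
```

With libvirt-python, the two input strings would presumably be obtained from getCapabilities() and getDomainCapabilities() on an open connection; that wiring is left out here so the sketch runs without a hypervisor.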

There are two reasons for the failure:
1) nova uses getCapabilities() to determine the host CPU model but uses the model from domCapabilities for the guest VM when host-model is configured. According to the libvirt maintainers, nova should no longer use getCapabilities() for anything.

2) nova falls back to a host CPU based comparison if the guest
vcpu_model is not filled in the nova DB. But for live migration the
guest CPU model should be available, as the guest exists and is running on
the source node.
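
The fallback in point 2 can be sketched as a small decision function. This is a hypothetical illustration of the behaviour described above, not the actual nova implementation; the function name, arguments and return shape are all assumptions made for the example.

```python
from typing import Optional, Tuple


def pick_cpu_for_comparison(guest_vcpu_model: Optional[str],
                            source_host_model: str) -> Tuple[str, str]:
    """Choose which CPU description the destination compares against.

    Mirrors the behaviour described in the triage: when no guest
    vcpu_model is recorded, fall back to the source host CPU model.
    """
    if guest_vcpu_model:
        # Guest CPU model is known: compare the guest CPU.
        return ("guest", guest_vcpu_model)
    # No vcpu_model recorded: host CPU based comparison (the fallback).
    return ("host", source_host_model)


if __name__ == "__main__":
    # Case from this report: no vcpu_model in the DB, so the host model
    # obtained via getCapabilities() (SandyBridge) is what gets compared.
    print(pick_cpu_for_comparison(None, "SandyBridge"))
    # If the running guest's model were available it would be used instead.
    print(pick_cpu_for_comparison("Broadwell", "SandyBridge"))
```

In the reported setup the first case applies, which is why a model that the guest is not actually running with ends up in the compatibility check.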


[1] https://github.com/openstack/nova/blob/a869ab17c095cbff2c942ab94247b0c30723b230/nova/virt/libvirt/driver.py#L9960-L9975
[2] https://github.com/openstack/nova/blob/a869ab17c095cbff2c942ab94247b0c30723b230/nova/virt/libvirt/host.py#L793-L796

** Affects: nova
     Importance: Medium
         Status: Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2039803

Title:
  compareHypervisorCPU() incompatibility during live migration

Status in OpenStack Compute (nova):
  Triaged
