yahoo-eng-team team mailing list archive
Message #92969
[Bug 2039803] [NEW] compareHypervisorCPU() incompatibility during live migration
Public bug reported:
Description
===========
Live migration fails with:
Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 9636, in check_can_live_migrate_destination
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server self._compare_cpu(None, source_cpu_info, instance)
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 10013, in _compare_cpu
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
[...]
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 9640, in check_can_live_migrate_destination
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server raise exception.MigrationPreCheckError(reason=e)
2023-10-17 08:28:15.301 2 ERROR oslo_messaging.rpc.server nova.exception.MigrationPreCheckError: Migration pre-check error: Unacceptable CPU info: CPU doesn't have compatibility.
If skip_cpu_compare_on_dest is set to True, the live migration succeeds. So the issue appears to be only in the check nova performs; the hypervisors are actually compatible.
Steps to reproduce
==================
* Boot a simple CirrOS VM
* openstack server migrate --live --block-migration <vm>
Environment
===========
OpenStack: 2023.1
libvirt version: 9.5.0
QEMU: 8.1.0
Hypervisors: two CentOS Stream 9 VMs with nested KVM enabled
nova-compute is configured with cpu_mode=host-model
Triage
======
During the live migration pre-check (check_can_live_migrate_destination) running on the destination node, nova sees that the guest has no vcpu_model set in the DB and therefore falls back to a comparison based on the host CPU model [1]. The host cpu_info used there is collected with getCapabilities() from libvirt [2], which on this system returns SandyBridge. On the other hand, the guest VM is running as Broadwell (note that nova is configured with cpu_mode=host-model), and virsh domcapabilities also returns Broadwell as the host model.
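The model mismatch can be illustrated offline. The Python sketch below parses two abridged, hypothetical XML snippets shaped like the output of "virsh capabilities" (what getCapabilities() returns) and "virsh domcapabilities" (the domain capabilities); only the SandyBridge and Broadwell model names come from this report, everything else is illustrative. On a real host the full XML from those two virsh commands could be fed to the same functions, assuming it has the element layout used here.

```python
import xml.etree.ElementTree as ET

# Abridged, hypothetical XML modeled on libvirt's output layout.
# Only the model names are taken from this bug report.
CAPABILITIES_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
    </cpu>
  </host>
</capabilities>
"""

DOMCAPABILITIES_XML = """
<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Broadwell</model>
      <vendor>Intel</vendor>
    </mode>
  </cpu>
</domainCapabilities>
"""


def host_model_from_capabilities(xml_text):
    """Host CPU model as reported by getCapabilities(): <host><cpu><model>."""
    root = ET.fromstring(xml_text.strip())
    model = root.find("./host/cpu/model")
    return model.text if model is not None else None


def host_model_from_domcapabilities(xml_text):
    """CPU model libvirt offers for a host-model guest:
    <cpu><mode name='host-model'><model>."""
    root = ET.fromstring(xml_text.strip())
    model = root.find("./cpu/mode[@name='host-model']/model")
    return model.text if model is not None else None


caps_model = host_model_from_capabilities(CAPABILITIES_XML)
domcaps_model = host_model_from_domcapabilities(DOMCAPABILITIES_XML)
print(f"capabilities host CPU:       {caps_model}")
print(f"domcapabilities host-model:  {domcaps_model}")
if caps_model != domcaps_model:
    print("mismatch: a check based on getCapabilities() sees a different CPU")
```

A check that reads the first value while the guest was started from the second compares two different CPU models, which is the inconsistency described above.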
There are two reasons for the failure:
1) nova uses getCapabilities() to determine the host CPU model but uses the model from the domain capabilities (domCapabilities) for a guest VM using host-model. According to the libvirt maintainers, nova should no longer use getCapabilities() for anything.
2) nova falls back to a host-CPU-based comparison if the guest's vcpu_model is not set in the nova DB. But for live migration, the guest CPU model should be available, since the guest exists and is running on the source node.
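Reason 2 concerns which CPU definition reaches the comparison. The following is a deliberately simplified model of that decision, not nova's code: the CpuCheckInput dataclass and both functions are hypothetical names, and the "suggested" ordering is only one reading of reason 2 (prefer the running guest's CPU from the source node before falling back to host CPU info).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuCheckInput:
    # All field names are hypothetical, chosen for illustration only.
    guest_vcpu_model: Optional[str]     # instance vcpu_model in the nova DB (may be unset)
    host_cpu_info: str                  # host CPU model collected via getCapabilities()
    guest_cpu_on_source: Optional[str]  # CPU the running guest actually uses on the source


def cpu_compared_today(inp: CpuCheckInput) -> str:
    """Behaviour described in the triage: with no vcpu_model in the DB,
    fall back to the getCapabilities()-based host CPU info."""
    if inp.guest_vcpu_model is not None:
        return inp.guest_vcpu_model
    return inp.host_cpu_info


def cpu_compared_as_suggested(inp: CpuCheckInput) -> str:
    """One reading of reason 2: the guest is already running on the source,
    so use its real CPU before falling back to host CPU info."""
    for candidate in (inp.guest_vcpu_model, inp.guest_cpu_on_source):
        if candidate is not None:
            return candidate
    return inp.host_cpu_info


# Values from this report: no vcpu_model in the DB, host reports SandyBridge,
# guest runs as Broadwell.
report = CpuCheckInput(guest_vcpu_model=None,
                       host_cpu_info="SandyBridge",
                       guest_cpu_on_source="Broadwell")
print(cpu_compared_today(report))         # SandyBridge
print(cpu_compared_as_suggested(report))  # Broadwell
```

With the values from this report, the fallback path compares SandyBridge even though the guest actually runs as Broadwell, matching the mismatch described in the triage.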
[1] https://github.com/openstack/nova/blob/a869ab17c095cbff2c942ab94247b0c30723b230/nova/virt/libvirt/driver.py#L9960-L9975
[2] https://github.com/openstack/nova/blob/a869ab17c095cbff2c942ab94247b0c30723b230/nova/virt/libvirt/host.py#L793-L796
** Affects: nova
Importance: Medium
Status: Triaged
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2039803
Title:
compareHypervisorCPU() incompatibility during live migration
Status in OpenStack Compute (nova):
Triaged