yahoo-eng-team team mailing list archive
Message #96398
[Bug 2122022] Re: live migrate failed
Thank you for submitting this bug report.
The reported behavior is known and follows from how Nova is designed to
check CPU compatibility for live migration. As you've observed, Nova
relies on libvirt's CPU comparison functions for this check, so it has
limited direct control over the comparison logic itself.
The comparison can be skipped by setting the `skip_cpu_compare_on_dest`
option in the `[workarounds]` section of nova.conf on the destination
compute host.
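For illustration only, a minimal sketch of turning the workaround on in a nova.conf file's contents. It assumes the option lives in the `[workarounds]` section, as in recent Nova releases; `enable_skip_cpu_compare` is a hypothetical helper, and since most Nova options are only read at service start, expect to restart nova-compute on the destination afterwards.

```python
import configparser
import io


def enable_skip_cpu_compare(conf_text: str) -> str:
    """Return nova.conf text with skip_cpu_compare_on_dest turned on.

    Hypothetical helper: it works on a string rather than a real
    /etc/nova/nova.conf. Note that configparser drops comments on write.
    """
    # interpolation=None keeps values containing '%' (e.g. URLs) untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumption: the option lives in [workarounds], as in recent Nova releases.
    if not parser.has_section("workarounds"):
        parser.add_section("workarounds")
    parser.set("workarounds", "skip_cpu_compare_on_dest", "True")
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


sample = "[DEFAULT]\ndebug = False\n\n[libvirt]\nvirt_type = kvm\n"
print(enable_skip_cpu_compare(sample))
```

Because configparser rewrites the whole file and discards comments, treat this as a demonstration of where the option goes rather than a configuration-management tool.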
Regarding your comment, "I think it is unreasonable to compare the
features of the host (host A) CPU with the hypervisor of the target
node (host B) using the compare_hypervisor_cpu function," this behavior
is intentional. It ensures that a live-migrated virtual machine
continues to have access to the same CPU capabilities it was started
with. Many operators rely on this strict comparison to guarantee
consistent application behavior across different compute hosts,
especially when specific CPU functionalities are critical for their
workloads.
However, this is a topic that can still be discussed at the next Project
Teams Gathering (PTG) (https://openinfra.org/ptg/), and you are welcome
to bring it up if you wish.
Please note that the OpenStack version you are running, Antelope
(2023.1), is no longer actively supported. You can find the list of
currently supported releases here: https://releases.openstack.org/
For these reasons, and given the use of an unsupported OpenStack
version, we are marking this bug as **'Invalid'**. If you still believe
this is a Nova bug and you can reproduce it on a supported OpenStack
version, please feel free to update this report with the necessary
details (referencing our bug reporting template:
https://wiki.openstack.org/wiki/Nova/BugsTeam/BugReportTemplate) and set
its status back to 'New'.
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2122022
Title:
live migrate failed
Status in OpenStack Compute (nova):
Invalid
Bug description:
Description:
In OpenStack Antelope (2023.1), the [libvirt] cpu_mode and cpu_models
options are not explicitly set in nova.conf, so for KVM the default
mode, host-model, applies. A virtual machine is created and runs on
Host A. The CPU features of Host A include xsaves:
root@controller-2:~# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
However, the virtual machine's CPU model is Cascadelake-Server, which
does not itself include the xsaves feature, so xsaves is listed as an
explicitly required feature:
root@controller-2:~# virsh dumpxml instance-00000515 | grep "cpu mode" -A40
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <topology sockets='8' dies='1' cores='1' threads='1'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='pdcm'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='umip'/>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='md-clear'/>
  <feature policy='require' name='stibp'/>
  <feature policy='require' name='arch-capabilities'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='ibpb'/>
  <feature policy='require' name='ibrs'/>
  <feature policy='require' name='amd-stibp'/>
  <feature policy='require' name='amd-ssbd'/>
  <feature policy='require' name='rdctl-no'/>
  <feature policy='require' name='ibrs-all'/>
  <feature policy='require' name='skip-l1dfl-vmentry'/>
  <feature policy='require' name='mds-no'/>
  <feature policy='require' name='pschange-mc-no'/>
  <feature policy='require' name='tsx-ctrl'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>
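To see at a glance which features such a guest definition pins, the `<cpu>` element can be parsed with the Python standard library. This is a minimal sketch: `guest_cpu_features` is a hypothetical helper, and `GUEST_CPU_XML` is an abbreviated copy of the dump above that keeps only a few of the feature lines.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the <cpu> element from the `virsh dumpxml` output above.
GUEST_CPU_XML = """
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Cascadelake-Server</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='pku'/>
  <feature policy='require' name='xsaves'/>
  <feature policy='require' name='tsx-ctrl'/>
  <feature policy='disable' name='hle'/>
  <feature policy='disable' name='rtm'/>
  <feature policy='disable' name='mpx'/>
</cpu>
"""


def guest_cpu_features(cpu_xml: str) -> tuple[str | None, dict[str, set[str]]]:
    """Return the CPU model name and the feature names grouped by policy."""
    cpu = ET.fromstring(cpu_xml.strip())
    by_policy: dict[str, set[str]] = defaultdict(set)
    for feature in cpu.findall("feature"):
        # libvirt treats a missing policy attribute as 'require'.
        by_policy[feature.get("policy", "require")].add(feature.get("name"))
    return cpu.findtext("model"), dict(by_policy)


model, features = guest_cpu_features(GUEST_CPU_XML)
print(model, sorted(features["require"]))
# → Cascadelake-Server ['pku', 'tsx-ctrl', 'xsaves']
```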
When the virtual machine is live-migrated to the target Host B (which
has the same CPU as Host A), the CPU compatibility check fails and the
log reports the following error:
nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.
Debugging reveals that the issue is due to the missing xsaves feature:
[root@controller-1 home]# virsh hypervisor-cpu-compare test.xml --error
error: Failed to compare hypervisor CPU with test.xml
error: the CPU is incompatible with host CPU: Host CPU does not provide required features: xsaves
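At its core, the failing check behaves like a set difference: every feature the guest CPU requires must be one the destination hypervisor can provide. The sketch below is a deliberate simplification of what `virsh hypervisor-cpu-compare` reports, not libvirt's actual algorithm (libvirt also expands the named model into its own feature list and asks the hypervisor what CPU it can offer on the host). `IncompatibleCPUError`, both helpers, and the two feature sets are hypothetical, with xsaves left out of the destination set to reproduce the error above.

```python
from __future__ import annotations


class IncompatibleCPUError(Exception):
    """Hypothetical error type for this sketch."""


def missing_required_features(guest_required: set[str], host_usable: set[str]) -> set[str]:
    """Features the guest requires that the destination hypervisor cannot provide."""
    return guest_required - host_usable


def check_compatible(guest_required: set[str], host_usable: set[str]) -> None:
    """Raise if any required guest feature is unavailable on the destination."""
    missing = missing_required_features(guest_required, host_usable)
    if missing:
        raise IncompatibleCPUError(
            "Host CPU does not provide required features: " + ", ".join(sorted(missing))
        )


# Illustrative sets only: a few required features from the guest XML above,
# and a destination that (for this example) does not offer xsaves to guests.
GUEST_REQUIRED = {"pku", "md-clear", "tsx-ctrl", "xsaves"}
DEST_USABLE = {"pku", "md-clear", "tsx-ctrl", "avx512f"}

try:
    check_compatible(GUEST_REQUIRED, DEST_USABLE)
except IncompatibleCPUError as err:
    print(err)  # Host CPU does not provide required features: xsaves
```

The strict "every required feature must be present" rule is what guarantees the guest keeps the same CPU capabilities after migration.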
Preliminary code analysis shows that when a virtual machine is created
with the host-model CPU mode, the instance.vcpu_model.model property of
the instance object is empty. The code therefore falls back to the
physical CPU features of the source host (Host A) and compares them,
via the compare_hypervisor_cpu function, with what the hypervisor on
the target host (Host B) can provide, which produces the error:
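if not instance.vcpu_model or not instance.vcpu_model.model:
    # No recorded guest CPU model (e.g. host-model guests): fall back to
    # the source host's CPU info for the comparison.
    source_cpu_info = src_compute_info['cpu_info']
    self._compare_cpu(None, source_cpu_info, instance)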
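A self-contained sketch of that branch, with a plain dataclass standing in for Nova's objects, may make the fallback easier to follow. `VCPUModel`, `select_cpu_for_compare` and the returned tags are hypothetical; the second branch (comparing the guest's recorded model when one exists) is an assumption about the surrounding Nova code, which the quoted snippet does not show.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VCPUModel:
    """Hypothetical stand-in for the instance's vcpu_model object."""
    model: Optional[str] = None


def select_cpu_for_compare(vcpu_model: Optional[VCPUModel], src_cpu_info: Any) -> tuple[str, Any]:
    """Return which CPU description would be checked against the destination."""
    if not vcpu_model or not vcpu_model.model:
        # Same condition as the quoted snippet: with no recorded guest model
        # (the host-model case), the source host's cpu_info is compared.
        return "source_host_cpu_info", src_cpu_info
    # Assumption: otherwise the guest's recorded vCPU model is compared.
    return "guest_vcpu_model", vcpu_model


print(select_cpu_for_compare(VCPUModel(), "cpu_info of Host A")[0])
# → source_host_cpu_info
```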
I think it is unreasonable to use the compare_hypervisor_cpu function
to compare the physical CPU features of the source host (Host A) with
the hypervisor of the target node (Host B).