yahoo-eng-team team mailing list archive
Message #90765
[Bug 1999814] [NEW] Allow for specifying common baseline CPU model with disabled feature
Public bug reported:
Hello,
This is very similar to pad.lv/1852437 (and the related blueprint at
https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags),
but with an important difference.
A customer I'm working with has two classes of blades that they're
trying to use. Their existing ones are Cascade Lake-based; they are
presently using the Cascadelake-Server-noTSX CPU model via
libvirt.cpu_model in nova.conf. Their new blades are Ice Lake-based.
Ice Lake is a newer generation that would normally be able to run
guests using the Cascade Lake feature set, except that these
particular Ice Lake processors lack the MPX feature that the
Cascadelake-Server-noTSX model includes.
The problem shows up when I try to start Nova on the new Ice Lake
blades with the following in my nova.conf:
[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx
That is not enough to allow Nova to start; it fails in the libvirt
driver in the _check_cpu_compatibility function:
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 771, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._compare_cpu(cpu, self._get_cpu_info(), None)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8817, in _compare_cpu
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service During handling of the above exception, another exception occurred:
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Traceback (most recent call last):
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/oslo_service/service.py", line 810, in run_service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service service.start()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/service.py", line 173, in start
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.manager.init_host()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1404, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self.driver.init_host(host=self.host)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 743, in init_host
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service self._check_cpu_compatibility()
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 777, in _check_cpu_compatibility
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service raise exception.InvalidCPUInfo(msg)
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service nova.exception.InvalidCPUInfo: Configured CPU model: Cascadelake-Server-noTSX is not compatible with host CPU. Please correct your config and try again. Unacceptable CPU info: CPU doesn't have compatibility.
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service 0
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2022-12-15 17:20:59.562 1836708 ERROR oslo_service.service
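The bare "0" between the blank lines in the log is the raw result of libvirt's CPU comparison. The message points at libvirt's virCPUCompareResult enum to interpret it. Below is a minimal sketch of how those codes read. The enum values are copied by hand and are assumed to match the libvirt headers (libvirt-python exposes the same names as constants on the "libvirt" module). The acceptance rule (only IDENTICAL or SUPERSET passes) is an assumption based on the error shown above. describe_compare_result and host_can_run_model are hypothetical helpers, not part of Nova or libvirt-python:

```python
# virCPUCompareResult values, copied by hand for illustration
# (assumed to match the libvirt headers).
VIR_CPU_COMPARE_ERROR = -1
VIR_CPU_COMPARE_INCOMPATIBLE = 0
VIR_CPU_COMPARE_IDENTICAL = 1
VIR_CPU_COMPARE_SUPERSET = 2

_NAMES = {
    VIR_CPU_COMPARE_ERROR: "error",
    VIR_CPU_COMPARE_INCOMPATIBLE: "incompatible",
    VIR_CPU_COMPARE_IDENTICAL: "identical",
    VIR_CPU_COMPARE_SUPERSET: "superset",
}


def describe_compare_result(ret: int) -> str:
    """Translate a raw CPU comparison result into a readable label."""
    return _NAMES.get(ret, f"unknown ({ret})")


def host_can_run_model(ret: int) -> bool:
    """Assumed rule: the host must be identical to, or a superset of,
    the requested CPU model for the check to pass."""
    return ret in (VIR_CPU_COMPARE_IDENTICAL, VIR_CPU_COMPARE_SUPERSET)


if __name__ == "__main__":
    # The traceback above reports a raw result of 0.
    print(describe_compare_result(0), host_can_run_model(0))
```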
If I make a custom libvirt CPU map file that omits the
"<feature name='mpx'/>" element and specify that model as the
cpu_model instead, Nova starts, so that single feature does appear to
be the only thing blocking me. However, editing the libvirt CPU map
files is probably not the right way to fix this, which is why I'm
filing this bug: to discuss how cases like this should be supported.
The only "proper" workaround I'm currently aware of is to fall back to
a Broadwell-based configuration, which lacks the "mpx" feature, as the
common baseline. However, Broadwell is much older than Cascade Lake,
so this would mean losing all the other features that Cascade Lake and
Ice Lake have in common. I would prefer a way to use the Cascade Lake
model but simply remove the "mpx" feature from it.
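At the libvirt level, the request amounts to a guest CPU definition that names the Cascade Lake model and disables one of its features. libvirt's domain XML expresses this with a feature element carrying policy='disable' inside <cpu>. Below is a minimal sketch that builds such an element with Python's standard xml.etree.ElementTree. It assumes, without confirmation from this report, that Nova's "-flag" syntax maps onto policy='disable'. build_guest_cpu_xml is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def build_guest_cpu_xml(model: str, extra_flags: Sequence[str]) -> str:
    """Build a libvirt-style guest <cpu> element for a named model.

    Hypothetical helper: flags prefixed with '-' become policy='disable';
    anything else (optionally prefixed with '+') becomes policy='require'.
    """
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    for flag in extra_flags:
        if flag.startswith("-"):
            policy, name = "disable", flag[1:]
        else:
            policy, name = "require", flag.lstrip("+")
        ET.SubElement(cpu, "feature", policy=policy, name=name)
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Cascade Lake model with MPX switched off, as in the nova.conf above.
    print(build_guest_cpu_xml("Cascadelake-Server-noTSX", ["-mpx"]))
```

One possible direction for a fix would be for the startup compatibility check to compare a definition like this (model plus disabled features) against the host, rather than the bare model name. This is a suggestion for discussion only.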
----
Steps to reproduce
==================
On an Ice Lake system lacking the MPX feature (e.g. one whose
/proc/cpuinfo reports the model as "Intel(R) Xeon(R) Gold 5318Y"), set
the following in the [libvirt] section of nova.conf:
[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server-noTSX
cpu_model_extra_flags = -mpx
Then try to start nova.
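To confirm that a host is affected, it is enough to check whether "mpx" appears in the "flags" line of /proc/cpuinfo. Below is a minimal sketch. cpu_summary is a hypothetical helper, and it only inspects the first processor entry, assuming all cores report the same flags:

```python
import os
from typing import Set, Tuple


def cpu_summary(cpuinfo_text: str) -> Tuple[str, bool]:
    """Return (model name, has_mpx) from the first processor entry."""
    model = ""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key = key.strip()
        if key == "model name" and not model:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
        if model and flags:
            break  # first processor entry is enough
    return model, "mpx" in flags


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            name, has_mpx = cpu_summary(f.read())
        print(f"{name}: mpx {'present' if has_mpx else 'absent'}")
```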
Expected result
===============
Nova should start, since Cascadelake-Server-noTSX is a subset of
Icelake-Server-noTSX; this would allow Cascadelake-Server-noTSX to be
used as a common baseline model for both Cascade Lake and Ice Lake
servers.
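The expectation rests on a simple set argument: once the disabled features are removed, every feature the baseline model still requires must be present on the host. Below is a toy sketch of that reasoning. The feature sets are tiny illustrative subsets chosen for this example, not the actual contents of libvirt's CPU maps. libvirt's real comparison is also more involved than a plain set difference:

```python
from typing import Iterable, Set


def missing_features(model_features: Iterable[str],
                     disabled: Iterable[str],
                     host_features: Iterable[str]) -> Set[str]:
    """Features the model still requires (after disabling) that the host lacks.

    An empty result means the reduced model is a subset of the host's
    features and could serve as a common baseline.
    """
    required = set(model_features) - set(disabled)
    return required - set(host_features)


# Tiny illustrative subsets only; NOT the real libvirt CPU map contents.
CASCADELAKE_SUBSET = {"avx512f", "avx512vnni", "mpx"}
ICELAKE_HOST_SUBSET = {"avx512f", "avx512vnni", "vaes"}

if __name__ == "__main__":
    print(missing_features(CASCADELAKE_SUBSET, [], ICELAKE_HOST_SUBSET))
    print(missing_features(CASCADELAKE_SUBSET, ["mpx"], ICELAKE_HOST_SUBSET))
```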
Actual result
=============
Nova refuses to start, claiming the specified CPU model is incompatible.
The "cpu_model_extra_flags = -mpx" config option does not help.
Environment
===========
Nova/OpenStack version: OpenStack Ussuri running on Ubuntu Focal.
Specifically, nova packages are at version 2:21.2.4-0ubuntu2.
Hypervisor: libvirt + KVM
Other relevant notes
====================
There are other related open bugs. The removal of the MPX feature from
some Ice Lake processors has caused other problems as well. Those bugs
mainly concern how the missing MPX feature breaks Ice Lake CPU
detection, so the nuance is somewhat different; however, they may be
worth reviewing as well.
* https://gitlab.com/libvirt/libvirt/-/issues/304: bug regarding
libvirt's Icelake CPU maps failing to match certain Ice Lake CPUs
because they lack the MPX feature; "virsh capabilities" reports them
as Broadwell-noTSX-IBRS instead. (I've personally tested that removing
the mpx feature from the associated CPU map files makes libvirt detect
them as Ice Lake, but that's not the correct way to fix this.)
There is also an interesting comment on this bug at
https://gitlab.com/libvirt/libvirt/-/issues/304#note_1065798706. It
suggests using "virsh domcapabilities" instead of
"virsh capabilities", as the former seems to identify the CPU model
more correctly even when flags like MPX are disabled.
* https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1978064:
Launchpad-side bug regarding the above issue as encountered in Ubuntu.
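Below is a minimal sketch for anyone comparing the two commands. It pulls the host-model CPU name and any features marked policy='disable' out of "virsh domcapabilities" XML. It assumes the usual layout, where <cpu><mode name='host-model'> contains <model> and <feature> elements. The embedded XML is a hand-written, abbreviated sample, not real output from the affected hosts. host_model_from_domcaps is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical, abbreviated sample; real domcapabilities output is longer.
SAMPLE_DOMCAPS = """<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Icelake-Server</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='disable' name='mpx'/>
    </mode>
  </cpu>
</domainCapabilities>"""


def host_model_from_domcaps(xml_text: str) -> Tuple[Optional[str], List[str]]:
    """Return (host-model CPU name, features with policy='disable')."""
    root = ET.fromstring(xml_text)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None:
        return None, []
    model = mode.findtext("model")
    disabled = [f.get("name") for f in mode.findall("feature")
                if f.get("policy") == "disable"]
    return model, disabled


if __name__ == "__main__":
    print(host_model_from_domcaps(SAMPLE_DOMCAPS))
```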
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1999814
Title:
Allow for specifying common baseline CPU model with disabled feature
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1999814/+subscriptions