yahoo-eng-team team mailing list archive
Message #94754
[Bug 2062425] Re: Nova/Placement creating x86 trait for ARM Compute node
Reviewed: https://review.opendev.org/c/openstack/nova/+/926521
Committed: https://opendev.org/openstack/nova/commit/ab18f3763c096d1f4c0da6ad825d670dd5a06b94
Submitter: "Zuul (22348)"
Branch: master
commit ab18f3763c096d1f4c0da6ad825d670dd5a06b94
Author: Amit Uniyal <auniyal@xxxxxxxxxx>
Date: Mon Aug 19 07:42:43 2024 +0000
Libvirt: updates resource provider trait list
This change updates resource provider trait list for hw architecture and
hw emulation architecture
Closes-Bug: #2062425
Change-Id: Ia571c5e5e881162d331b638ae2d4a332807d17f5
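The core of the fix is mapping the host architecture reported by libvirt to a placement trait. A minimal illustrative sketch, not the actual Nova implementation, assuming the os-traits HW_ARCH_* naming convention:

```python
# Illustrative sketch only (not Nova's code): derive the placement
# HW_ARCH_* trait from the architecture string libvirt reports for the host.
def arch_to_trait(arch: str) -> str:
    """Map a libvirt arch string (e.g. 'aarch64') to an os-traits name."""
    return "HW_ARCH_" + arch.upper()

# An aarch64 host should be tagged HW_ARCH_AARCH64, never an x86 trait.
print(arch_to_trait("aarch64"))  # HW_ARCH_AARCH64
print(arch_to_trait("x86_64"))   # HW_ARCH_X86_64
```

With such a trait in place, the image_metadata_prefilter can match images carrying hw_architecture=aarch64 to the right hosts.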
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2062425
Title:
Nova/Placement creating x86 trait for ARM Compute node
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
I have a 2023.2 based deployment with both x86 and aarch64 based compute nodes. For the ARM node, placement shows an x86 HW trait, so scheduling ARM-architecture images onto it fails; it also causes the scheduler to try to place x86 images on that node, which will fail.
Steps to reproduce
==================
1. I deployed a new 2023.2 deployment with Kolla-ansible.
2. Add hw_architecture=aarch64 to a valid glance image
3. Ensure that image_metadata_prefilter = True in nova.conf on all nova services
4. Try to deploy an instance with that image; it will fail with no valid host found
5. Observe the following in the placement-api logs:
placement-api.log:41054:2024-04-18 20:39:04.271 21 DEBUG
placement.requestlog [req-0114c318-5dfd-4588-807b-e591a82ce098 req-
bd588ea0-5700-4b8e-a43f-0eb15a7275e8 - - - - - -] Starting request:
10.27.10.33 "GET
/allocation_candidates?limit=1000&member_of=in%3Aceceb7fb-e0ed-4304-a69f-b327da7ca63f&resources=DISK_GB%3A60%2CMEMORY_MB%3A8192%2CVCPU%3A4&root_required=HW_ARCH_AARCH64%2C%21COMPUTE_STATUS_DISABLED"
__call__ /var/lib/kolla/venv/lib/python3.10/site-
packages/placement/requestlog.py:55
placement-api.log:41055:2024-04-18 20:39:04.317 21 DEBUG
placement.objects.research_context
[req-0114c318-5dfd-4588-807b-e591a82ce098 req-
bd588ea0-5700-4b8e-a43f-0eb15a7275e8 8ce24731fb34492c9354f05050216395
c48da85ca48f4296b59bacb7b3c2fdfd - - default default] found no
providers satisfying required traits: {'HW_ARCH_AARCH64'} and
forbidden traits: {'COMPUTE_STATUS_DISABLED'} _process_anchor_traits
/var/lib/kolla/venv/lib/python3.10/site-
packages/placement/objects/research_context.py:243
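Decoding the query string from that request (a quick standalone Python sketch) makes the requirement explicit: the image-metadata prefilter asked placement for providers with the HW_ARCH_AARCH64 trait, which the ARM node never received:

```python
from urllib.parse import parse_qs

# Query string from the placement-api GET request logged above
# (values are URL-encoded in the log).
query = ("limit=1000"
         "&member_of=in%3Aceceb7fb-e0ed-4304-a69f-b327da7ca63f"
         "&resources=DISK_GB%3A60%2CMEMORY_MB%3A8192%2CVCPU%3A4"
         "&root_required=HW_ARCH_AARCH64%2C%21COMPUTE_STATUS_DISABLED")

params = {k: v[0] for k, v in parse_qs(query).items()}
print(params["root_required"])  # HW_ARCH_AARCH64,!COMPUTE_STATUS_DISABLED
print(params["resources"])      # DISK_GB:60,MEMORY_MB:8192,VCPU:4
```

So the scheduler required HW_ARCH_AARCH64 and forbade COMPUTE_STATUS_DISABLED, and no provider satisfied both.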
Resource providers:
openstack resource provider list
+--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+
| uuid | name | generation | root_provider_uuid | parent_provider_uuid |
+--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+
| a6aa43fb-c819-4dae-b172-b5ed76901591 | infra-prod-compute-04 | 7 | a6aa43fb-c819-4dae-b172-b5ed76901591 | None |
| 2a019b35-25ac-4085-a13d-07802bda6828 | infra-prod-compute-03 | 10 | 2a019b35-25ac-4085-a13d-07802bda6828 | None |
| a008c58b-d16c-4b80-8f58-ca96d1fce2a3 | infra-prod-compute-05 | 7 | a008c58b-d16c-4b80-8f58-ca96d1fce2a3 | None |
| e97340aa-5848-4939-a409-701e5ad52396 | infra-prod-compute-02 | 31 | e97340aa-5848-4939-a409-701e5ad52396 | None |
| 9345e4d0-fc49-4e51-9f38-faeabec1b053 | infra-prod-compute-01 | 18 | 9345e4d0-fc49-4e51-9f38-faeabec1b053 | None |
| 41611dae-3006-4449-9c8b-3369d9b0feb8 | infra-prod-compile-01 | 5 | 41611dae-3006-4449-9c8b-3369d9b0feb8 | None |
| 7fecff4c-9e2d-4d89-a345-91ab4d8c1857 | infra-prod-compile-02 | 5 | 7fecff4c-9e2d-4d89-a345-91ab4d8c1857 | None |
| fbd4030a-1cc9-455a-bca2-2b606fcb3c4d | infra-prod-compile-03 | 5 | fbd4030a-1cc9-455a-bca2-2b606fcb3c4d | None |
| 4d3b29fd-0048-4768-93fa-b7a98f81c125 | infra-prod-compute-06 | 9 | 4d3b29fd-0048-4768-93fa-b7a98f81c125 | None |
| f888bda6-8fb7-4f84-8b87-c9af3b36a6ae | infra-prod-compute-07 | 7 | f888bda6-8fb7-4f84-8b87-c9af3b36a6ae | None |
| 4f53c8d0-bf1d-44d3-89d5-b8f5436ee66a | infra-prod-compile-04 | 5 | 4f53c8d0-bf1d-44d3-89d5-b8f5436ee66a | None |
| 7b6a42c8-b9b4-44a6-9111-2f732c7074e1 | infra-prod-compile-05 | 5 | 7b6a42c8-b9b4-44a6-9111-2f732c7074e1 | None |
| 8312a824-8d88-4646-9eb5-c4937329dab9 | infra-prod-compute-08 | 4 | 8312a824-8d88-4646-9eb5-c4937329dab9 | None |
| 9e60caa5-28ed-4719-aaf5-690b111f17fd | infra-prod-compute-09 | 4 | 9e60caa5-28ed-4719-aaf5-690b111f17fd | None |
| cbfef7fd-b910-4d77-b448-70cdb9638967 | infra-prod-compute-10 | 4 | cbfef7fd-b910-4d77-b448-70cdb9638967 | None |
| d7efda90-b91c-419f-b0be-0f339f37653a | infra-prod-compute-11 | 4 | d7efda90-b91c-419f-b0be-0f339f37653a | None |
| 067f20f4-f513-465e-9e32-e505a97ab165 | infra-prod-compute-12 | 4 | 067f20f4-f513-465e-9e32-e505a97ab165 | None |
| 57a098bf-31d4-4e4f-9a28-72a925d2384c | infra-prod-arm-compute-01 | 12 | 57a098bf-31d4-4e4f-9a28-72a925d2384c | None |
| 632c23d6-63df-4143-9d4c-deb2bdc94c80 | infra-prod-compute-13 | 4 | 632c23d6-63df-4143-9d4c-deb2bdc94c80 | None |
| 0fe3d535-8aec-4307-943e-2c46b01bc019 | infra-prod-compute-14 | 4 | 0fe3d535-8aec-4307-943e-2c46b01bc019 | None |
| 8f60a0e9-2510-48ce-b305-6937314bac4a | infra-prod-compute-15 | 4 | 8f60a0e9-2510-48ce-b305-6937314bac4a | None |
+--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+
Traits reported for the arm node (notice there is no HW_ARCH_AARCH64, but there is an HW_CPU_X86_AESNI trait):
openstack resource provider trait list 57a098bf-31d4-4e4f-9a28-72a925d2384c
+---------------------------------------+
| name |
+---------------------------------------+
| COMPUTE_IMAGE_TYPE_QCOW2 |
| COMPUTE_ADDRESS_SPACE_EMULATED |
| COMPUTE_NET_VIF_MODEL_VMXNET3 |
| COMPUTE_GRAPHICS_MODEL_NONE |
| COMPUTE_IMAGE_TYPE_ISO |
| COMPUTE_DEVICE_TAGGING |
| COMPUTE_NET_VIF_MODEL_NE2K_PCI |
| COMPUTE_GRAPHICS_MODEL_VIRTIO |
| COMPUTE_RESCUE_BFV |
| COMPUTE_STORAGE_BUS_VIRTIO |
| COMPUTE_STORAGE_BUS_SCSI |
| COMPUTE_GRAPHICS_MODEL_VGA |
| COMPUTE_IMAGE_TYPE_AMI |
| COMPUTE_NET_VIF_MODEL_E1000 |
| COMPUTE_STORAGE_BUS_SATA |
| COMPUTE_NET_VIF_MODEL_PCNET |
| COMPUTE_NET_ATTACH_INTERFACE |
| HW_CPU_X86_AESNI |
| COMPUTE_STORAGE_BUS_USB |
| COMPUTE_ADDRESS_SPACE_PASSTHROUGH |
| COMPUTE_NET_VIF_MODEL_RTL8139 |
| COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG |
| COMPUTE_VOLUME_ATTACH_WITH_TAG |
| COMPUTE_TRUSTED_CERTS |
| COMPUTE_IMAGE_TYPE_AKI |
| COMPUTE_VIOMMU_MODEL_SMMUV3 |
| COMPUTE_STORAGE_BUS_FDC |
| COMPUTE_VIOMMU_MODEL_AUTO |
| COMPUTE_VOLUME_EXTEND |
| COMPUTE_SOCKET_PCI_NUMA_AFFINITY |
| COMPUTE_NET_VIF_MODEL_E1000E |
| COMPUTE_NODE |
| COMPUTE_ACCELERATORS |
| COMPUTE_IMAGE_TYPE_RAW |
| COMPUTE_VOLUME_MULTI_ATTACH |
| COMPUTE_IMAGE_TYPE_ARI |
| COMPUTE_GRAPHICS_MODEL_BOCHS |
| COMPUTE_NET_VIF_MODEL_SPAPR_VLAN |
| COMPUTE_GRAPHICS_MODEL_CIRRUS |
| COMPUTE_GRAPHICS_MODEL_VMVGA |
| COMPUTE_NET_VIF_MODEL_VIRTIO |
| COMPUTE_VIOMMU_MODEL_VIRTIO |
+---------------------------------------+
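Filtering that list programmatically shows the two problems at once; a small sketch using trait names copied from the table above:

```python
# A few trait names copied from the provider trait list above
# (abbreviated for illustration).
traits = [
    "COMPUTE_IMAGE_TYPE_QCOW2",
    "HW_CPU_X86_AESNI",
    "COMPUTE_VIOMMU_MODEL_SMMUV3",
    "COMPUTE_NET_VIF_MODEL_VIRTIO",
]

# On an aarch64 host no HW_CPU_X86_* trait should appear, and the
# HW_ARCH_AARCH64 trait should be present.
wrong = [t for t in traits if t.startswith("HW_CPU_X86")]
missing = "HW_ARCH_AARCH64" not in traits
print(wrong, missing)  # ['HW_CPU_X86_AESNI'] True
```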
Confirmation that it is an ARM-based system:
root@infra-prod-arm-compute-01:/etc/kolla/nova-libvirt# uname -a
Linux infra-prod-arm-compute-01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:49:56 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
On startup of the nova-compute service on this node, the libvirt output confirms as much:
2024-04-18 21:47:43.978 7 INFO nova.service [-] Starting compute node (version 28.0.2)
2024-04-18 21:47:44.000 7 INFO nova.virt.node [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Determined node identity 57a098bf-31d4-4e4f-9a28-72a925d2384c from /var/lib/nova/compute_id
2024-04-18 21:47:44.021 7 INFO nova.virt.libvirt.driver [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Connection event '1' reason 'None'
2024-04-18 21:47:44.460 7 INFO nova.virt.libvirt.host [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Libvirt host capabilities <capabilities>
<host>
<uuid>38393550-3736-4753-4833-3334564b5842</uuid>
<cpu>
<arch>aarch64</arch>
<model>Neoverse-N1</model>
<vendor>ARM</vendor>
<topology sockets='1' dies='1' cores='128' threads='1'/>
nova.conf for the nova-compute service on that node:
[DEFAULT]
debug = False
log_dir = /var/log/kolla/nova
state_path = /var/lib/nova
allow_resize_to_same_host = true
compute_driver = libvirt.LibvirtDriver
my_ip = <ip>
transport_url = rabbit://<url>
default_schedule_zone = nova
[conductor]
workers = 5
[vnc]
novncproxy_host = <ip>
novncproxy_port = 6080
server_listen = <ip>
server_proxyclient_address = <ip>
novncproxy_base_url = https://example.com:6080/vnc_lite.html
[serial_console]
enabled = true
base_url = wss://example.com:6083/
serialproxy_host = <ip>
serialproxy_port = 6083
proxyclient_address = <ip>
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
[glance]
debug = False
api_servers = http://<ip>:9292
cafile =
num_retries = 3
[cinder]
catalog_info = volumev3:cinderv3:internalURL
os_region_name = RegionOne
auth_url = http://<ip>:5000
auth_type = password
project_domain_name = Default
user_domain_id = default
project_name = service
username = cinder
password = <pw>
cafile =
[neutron]
metadata_proxy_shared_secret = <secret>
service_metadata_proxy = true
auth_url = http://<ip>:5000
auth_type = password
cafile =
project_domain_name = Default
user_domain_id = default
project_name = service
username = neutron
password = <pw>
region_name = Westford
valid_interfaces = RegionOne
[libvirt]
connection_uri = qemu+tcp://<ip>/system
live_migration_inbound_addr = <ip>
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
disk_cachemodes = network=writeback
hw_disk_discard = unmap
rbd_secret_uuid = 48d56060-bcf0-4f94-bee8-83ab18eaabbd
virt_type = kvm
cpu_mode = host-passthrough
num_pcie_ports = 16
[workarounds]
skip_cpu_compare_on_dest = True
[upgrade_levels]
compute = auto
[oslo_messaging_notifications]
transport_url = rabbit://<url>
driver = messagingv2
topics = notifications_designate
[oslo_messaging_rabbit]
heartbeat_in_pthread = false
amqp_durable_queues = true
[privsep_entrypoint]
helper_command = sudo nova-rootwrap /etc/nova/rootwrap.conf privsep-helper --config-file /etc/nova/nova.conf
[guestfs]
debug = False
[placement]
auth_type = password
auth_url = http://<ip>:5000
username = placement
password = <pw>
user_domain_name = Default
project_name = service
project_domain_name = Default
region_name = RegionOne
cafile =
valid_interfaces = internal
[notifications]
notify_on_state_change = vm_and_task_state
[barbican]
auth_endpoint = http://<ip>:5000
barbican_endpoint_type = internal
verify_ssl_path =
[service_user]
send_service_user_token = true
auth_url = http://<ip>:5000
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = nova
password = <pw>
cafile =
region_name = RegionOne
valid_interfaces = internal
[scheduler]
image_metadata_prefilter = True
I have tried running openstack resource provider trait delete 57a098bf-31d4-4e4f-9a28-72a925d2384c to delete all traits and then restarting nova_compute on this compute node; however, the same traits come back.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2062425/+subscriptions