
yahoo-eng-team team mailing list archive

[Bug 2062425] Re: Nova/Placement creating x86 trait for ARM Compute node

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/926521
Committed: https://opendev.org/openstack/nova/commit/ab18f3763c096d1f4c0da6ad825d670dd5a06b94
Submitter: "Zuul (22348)"
Branch:    master

commit ab18f3763c096d1f4c0da6ad825d670dd5a06b94
Author: Amit Uniyal <auniyal@xxxxxxxxxx>
Date:   Mon Aug 19 07:42:43 2024 +0000

    Libvirt: updates resource provider trait list
    
    This change updates resource provider trait list for hw architecture and
    hw emulation architecture
    
    Closes-Bug: #2062425
    Change-Id: Ia571c5e5e881162d331b638ae2d4a332807d17f5
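The commit message above says the libvirt driver now reports resource provider traits for the host (and emulated) architecture. As a conceptual sketch only (a hypothetical helper, not the actual nova code), the mapping from a libvirt-reported arch string to an os-traits-style trait name looks like:

```python
# Hypothetical sketch of mapping a libvirt-reported architecture to a
# placement trait name (os-traits naming style); NOT the actual nova code.

def arch_to_trait(arch: str) -> str:
    """Return the HW_ARCH_* trait for a libvirt arch string such as 'aarch64'."""
    return "HW_ARCH_" + arch.upper()

# An aarch64 host should be reported with HW_ARCH_AARCH64 rather than
# x86-specific traits such as HW_CPU_X86_AESNI.
print(arch_to_trait("aarch64"))  # HW_ARCH_AARCH64
```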


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2062425

Title:
  Nova/Placement creating x86 trait for ARM Compute node

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  I have a 2023.2-based deployment with both x86 and aarch64 compute nodes. For the arm node, placement reports an x86 HW trait, which causes scheduling of aarch64 images onto it to fail; it also lets the scheduler try to place x86 images on it, which would fail as well.

  Steps to reproduce
  ==================
  1. Deploy a new 2023.2 deployment with Kolla-Ansible.
  2. Add hw_architecture=aarch64 to a valid Glance image.
  3. Ensure that image_metadata_prefilter = True is set in nova.conf for all nova services.
  4. Try to deploy an instance with that image; it fails with "no valid host found".
  5. Observe the following in the placement-api logs:

  placement-api.log:41054:2024-04-18 20:39:04.271 21 DEBUG
  placement.requestlog [req-0114c318-5dfd-4588-807b-e591a82ce098 req-
  bd588ea0-5700-4b8e-a43f-0eb15a7275e8 - - - - - -] Starting request:
  10.27.10.33 "GET
  /allocation_candidates?limit=1000&member_of=in%3Aceceb7fb-e0ed-4304-a69f-b327da7ca63f&resources=DISK_GB%3A60%2CMEMORY_MB%3A8192%2CVCPU%3A4&root_required=HW_ARCH_AARCH64%2C%21COMPUTE_STATUS_DISABLED"
  __call__ /var/lib/kolla/venv/lib/python3.10/site-
  packages/placement/requestlog.py:55

  placement-api.log:41055:2024-04-18 20:39:04.317 21 DEBUG
  placement.objects.research_context
  [req-0114c318-5dfd-4588-807b-e591a82ce098 req-
  bd588ea0-5700-4b8e-a43f-0eb15a7275e8 8ce24731fb34492c9354f05050216395
  c48da85ca48f4296b59bacb7b3c2fdfd - - default default] found no
  providers satisfying required traits: {'HW_ARCH_AARCH64'} and
  forbidden traits: {'COMPUTE_STATUS_DISABLED'} _process_anchor_traits
  /var/lib/kolla/venv/lib/python3.10/site-
  packages/placement/objects/research_context.py:243
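Decoded, the allocation_candidates request in the log above asked placement for the following (a small sketch using only the Python standard library; the query string is copied verbatim from the log):

```python
# Decode the allocation_candidates query string seen in the placement log.
from urllib.parse import parse_qs

query = ("limit=1000"
         "&member_of=in%3Aceceb7fb-e0ed-4304-a69f-b327da7ca63f"
         "&resources=DISK_GB%3A60%2CMEMORY_MB%3A8192%2CVCPU%3A4"
         "&root_required=HW_ARCH_AARCH64%2C%21COMPUTE_STATUS_DISABLED")
params = {k: v[0] for k, v in parse_qs(query).items()}

# The scheduler required the HW_ARCH_AARCH64 trait, which the arm node's
# provider never received -- hence "found no providers satisfying required
# traits" in the second log entry.
print(params["root_required"])  # HW_ARCH_AARCH64,!COMPUTE_STATUS_DISABLED
```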


  Resource providers:
  openstack resource provider list
  +--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+
  | uuid                                 | name                      | generation | root_provider_uuid                   | parent_provider_uuid |
  +--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+
  | a6aa43fb-c819-4dae-b172-b5ed76901591 | infra-prod-compute-04     |          7 | a6aa43fb-c819-4dae-b172-b5ed76901591 | None                 |
  | 2a019b35-25ac-4085-a13d-07802bda6828 | infra-prod-compute-03     |         10 | 2a019b35-25ac-4085-a13d-07802bda6828 | None                 |
  | a008c58b-d16c-4b80-8f58-ca96d1fce2a3 | infra-prod-compute-05     |          7 | a008c58b-d16c-4b80-8f58-ca96d1fce2a3 | None                 |
  | e97340aa-5848-4939-a409-701e5ad52396 | infra-prod-compute-02     |         31 | e97340aa-5848-4939-a409-701e5ad52396 | None                 |
  | 9345e4d0-fc49-4e51-9f38-faeabec1b053 | infra-prod-compute-01     |         18 | 9345e4d0-fc49-4e51-9f38-faeabec1b053 | None                 |
  | 41611dae-3006-4449-9c8b-3369d9b0feb8 | infra-prod-compile-01     |          5 | 41611dae-3006-4449-9c8b-3369d9b0feb8 | None                 |
  | 7fecff4c-9e2d-4d89-a345-91ab4d8c1857 | infra-prod-compile-02     |          5 | 7fecff4c-9e2d-4d89-a345-91ab4d8c1857 | None                 |
  | fbd4030a-1cc9-455a-bca2-2b606fcb3c4d | infra-prod-compile-03     |          5 | fbd4030a-1cc9-455a-bca2-2b606fcb3c4d | None                 |
  | 4d3b29fd-0048-4768-93fa-b7a98f81c125 | infra-prod-compute-06     |          9 | 4d3b29fd-0048-4768-93fa-b7a98f81c125 | None                 |
  | f888bda6-8fb7-4f84-8b87-c9af3b36a6ae | infra-prod-compute-07     |          7 | f888bda6-8fb7-4f84-8b87-c9af3b36a6ae | None                 |
  | 4f53c8d0-bf1d-44d3-89d5-b8f5436ee66a | infra-prod-compile-04     |          5 | 4f53c8d0-bf1d-44d3-89d5-b8f5436ee66a | None                 |
  | 7b6a42c8-b9b4-44a6-9111-2f732c7074e1 | infra-prod-compile-05     |          5 | 7b6a42c8-b9b4-44a6-9111-2f732c7074e1 | None                 |
  | 8312a824-8d88-4646-9eb5-c4937329dab9 | infra-prod-compute-08     |          4 | 8312a824-8d88-4646-9eb5-c4937329dab9 | None                 |
  | 9e60caa5-28ed-4719-aaf5-690b111f17fd | infra-prod-compute-09     |          4 | 9e60caa5-28ed-4719-aaf5-690b111f17fd | None                 |
  | cbfef7fd-b910-4d77-b448-70cdb9638967 | infra-prod-compute-10     |          4 | cbfef7fd-b910-4d77-b448-70cdb9638967 | None                 |
  | d7efda90-b91c-419f-b0be-0f339f37653a | infra-prod-compute-11     |          4 | d7efda90-b91c-419f-b0be-0f339f37653a | None                 |
  | 067f20f4-f513-465e-9e32-e505a97ab165 | infra-prod-compute-12     |          4 | 067f20f4-f513-465e-9e32-e505a97ab165 | None                 |
  | 57a098bf-31d4-4e4f-9a28-72a925d2384c | infra-prod-arm-compute-01 |         12 | 57a098bf-31d4-4e4f-9a28-72a925d2384c | None                 |
  | 632c23d6-63df-4143-9d4c-deb2bdc94c80 | infra-prod-compute-13     |          4 | 632c23d6-63df-4143-9d4c-deb2bdc94c80 | None                 |
  | 0fe3d535-8aec-4307-943e-2c46b01bc019 | infra-prod-compute-14     |          4 | 0fe3d535-8aec-4307-943e-2c46b01bc019 | None                 |
  | 8f60a0e9-2510-48ce-b305-6937314bac4a | infra-prod-compute-15     |          4 | 8f60a0e9-2510-48ce-b305-6937314bac4a | None                 |
  +--------------------------------------+---------------------------+------------+--------------------------------------+----------------------+

  Traits reported for the arm node (note the x86 trait HW_CPU_X86_AESNI and the absence of HW_ARCH_AARCH64):
  openstack resource provider trait list 57a098bf-31d4-4e4f-9a28-72a925d2384c
  +---------------------------------------+
  | name                                  |
  +---------------------------------------+
  | COMPUTE_IMAGE_TYPE_QCOW2              |
  | COMPUTE_ADDRESS_SPACE_EMULATED        |
  | COMPUTE_NET_VIF_MODEL_VMXNET3         |
  | COMPUTE_GRAPHICS_MODEL_NONE           |
  | COMPUTE_IMAGE_TYPE_ISO                |
  | COMPUTE_DEVICE_TAGGING                |
  | COMPUTE_NET_VIF_MODEL_NE2K_PCI        |
  | COMPUTE_GRAPHICS_MODEL_VIRTIO         |
  | COMPUTE_RESCUE_BFV                    |
  | COMPUTE_STORAGE_BUS_VIRTIO            |
  | COMPUTE_STORAGE_BUS_SCSI              |
  | COMPUTE_GRAPHICS_MODEL_VGA            |
  | COMPUTE_IMAGE_TYPE_AMI                |
  | COMPUTE_NET_VIF_MODEL_E1000           |
  | COMPUTE_STORAGE_BUS_SATA              |
  | COMPUTE_NET_VIF_MODEL_PCNET           |
  | COMPUTE_NET_ATTACH_INTERFACE          |
  | HW_CPU_X86_AESNI                      |
  | COMPUTE_STORAGE_BUS_USB               |
  | COMPUTE_ADDRESS_SPACE_PASSTHROUGH     |
  | COMPUTE_NET_VIF_MODEL_RTL8139         |
  | COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG |
  | COMPUTE_VOLUME_ATTACH_WITH_TAG        |
  | COMPUTE_TRUSTED_CERTS                 |
  | COMPUTE_IMAGE_TYPE_AKI                |
  | COMPUTE_VIOMMU_MODEL_SMMUV3           |
  | COMPUTE_STORAGE_BUS_FDC               |
  | COMPUTE_VIOMMU_MODEL_AUTO             |
  | COMPUTE_VOLUME_EXTEND                 |
  | COMPUTE_SOCKET_PCI_NUMA_AFFINITY      |
  | COMPUTE_NET_VIF_MODEL_E1000E          |
  | COMPUTE_NODE                          |
  | COMPUTE_ACCELERATORS                  |
  | COMPUTE_IMAGE_TYPE_RAW                |
  | COMPUTE_VOLUME_MULTI_ATTACH           |
  | COMPUTE_IMAGE_TYPE_ARI                |
  | COMPUTE_GRAPHICS_MODEL_BOCHS          |
  | COMPUTE_NET_VIF_MODEL_SPAPR_VLAN      |
  | COMPUTE_GRAPHICS_MODEL_CIRRUS         |
  | COMPUTE_GRAPHICS_MODEL_VMVGA          |
  | COMPUTE_NET_VIF_MODEL_VIRTIO          |
  | COMPUTE_VIOMMU_MODEL_VIRTIO           |
  +---------------------------------------+
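The symptom in the listing above can be spotted mechanically: an x86 CPU-feature trait is present while the architecture trait is missing. A throwaway check (a hypothetical helper, not part of nova or osc-placement) might look like:

```python
# Hypothetical helper: flag CPU-feature traits that contradict the expected
# architecture and report whether the HW_ARCH_* trait is missing.
def misplaced_arch_traits(traits, expected_arch="AARCH64"):
    wrong = [t for t in traits
             if t.startswith("HW_CPU_") and expected_arch not in t]
    missing = ("HW_ARCH_" + expected_arch) not in traits
    return wrong, missing

# A subset of the arm node's traits from the listing above.
traits = ["COMPUTE_NODE", "HW_CPU_X86_AESNI", "COMPUTE_IMAGE_TYPE_QCOW2"]
wrong, missing = misplaced_arch_traits(traits)
print(wrong)    # ['HW_CPU_X86_AESNI']
print(missing)  # True
```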

  
  Confirmation that it is an ARM-based system:
  root@infra-prod-arm-compute-01:/etc/kolla/nova-libvirt# uname -a
  Linux infra-prod-arm-compute-01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:49:56 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

  On startup of the nova-compute service on this node, the libvirt host capabilities output confirms this:
  2024-04-18 21:47:43.978 7 INFO nova.service [-] Starting compute node (version 28.0.2)
  2024-04-18 21:47:44.000 7 INFO nova.virt.node [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Determined node identity 57a098bf-31d4-4e4f-9a28-72a925d2384c from /var/lib/nova/compute_id
  2024-04-18 21:47:44.021 7 INFO nova.virt.libvirt.driver [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Connection event '1' reason 'None'
  2024-04-18 21:47:44.460 7 INFO nova.virt.libvirt.host [None req-58e563b8-cf35-4973-be78-d43cab808258 - - - - - -] Libvirt host capabilities <capabilities>

    <host>
      <uuid>38393550-3736-4753-4833-3334564b5842</uuid>
      <cpu>
        <arch>aarch64</arch>
        <model>Neoverse-N1</model>
        <vendor>ARM</vendor>
        <topology sockets='1' dies='1' cores='128' threads='1'/>


  nova.conf for the nova-compute service on that node:
  [DEFAULT]
  debug = False
  log_dir = /var/log/kolla/nova
  state_path = /var/lib/nova
  allow_resize_to_same_host = true
  compute_driver = libvirt.LibvirtDriver
  my_ip = <ip>
  transport_url = rabbit://<url>
  default_schedule_zone = nova

  [conductor]
  workers = 5

  [vnc]
  novncproxy_host = <ip>
  novncproxy_port = 6080
  server_listen = <ip>
  server_proxyclient_address = <ip>
  novncproxy_base_url = https://example.com:6080/vnc_lite.html

  [serial_console]
  enabled = true
  base_url = wss://example.com:6083/
  serialproxy_host = <ip>
  serialproxy_port = 6083
  proxyclient_address = <ip>

  [oslo_concurrency]
  lock_path = /var/lib/nova/tmp

  [glance]
  debug = False
  api_servers = http://<ip>:9292
  cafile =
  num_retries = 3

  [cinder]
  catalog_info = volumev3:cinderv3:internalURL
  os_region_name = RegionOne
  auth_url = http://<ip>:5000
  auth_type = password
  project_domain_name = Default
  user_domain_id = default
  project_name = service
  username = cinder
  password = <pw>
  cafile =

  [neutron]
  metadata_proxy_shared_secret = <secret>
  service_metadata_proxy = true
  auth_url = http://<ip>:5000
  auth_type = password
  cafile =
  project_domain_name = Default
  user_domain_id = default
  project_name = service
  username = neutron
  password = <pw>
  region_name = Westford
  valid_interfaces = RegionOne

  [libvirt]
  connection_uri = qemu+tcp://<ip>/system
  live_migration_inbound_addr = <ip>
  images_type = rbd
  images_rbd_pool = vms
  images_rbd_ceph_conf = /etc/ceph/ceph.conf
  rbd_user = cinder
  disk_cachemodes = network=writeback
  hw_disk_discard = unmap
  rbd_secret_uuid = 48d56060-bcf0-4f94-bee8-83ab18eaabbd
  virt_type = kvm
  cpu_mode = host-passthrough
  num_pcie_ports = 16

  [workarounds]
  skip_cpu_compare_on_dest = True

  [upgrade_levels]
  compute = auto

  [oslo_messaging_notifications]
  transport_url = rabbit://<url>
  driver = messagingv2
  topics = notifications_designate

  [oslo_messaging_rabbit]
  heartbeat_in_pthread = false
  amqp_durable_queues = true

  [privsep_entrypoint]
  helper_command = sudo nova-rootwrap /etc/nova/rootwrap.conf privsep-helper --config-file /etc/nova/nova.conf

  [guestfs]
  debug = False

  [placement]
  auth_type = password
  auth_url = http://<ip>:5000
  username = placement
  password = <pw>
  user_domain_name = Default
  project_name = service
  project_domain_name = Default
  region_name = RegionOne
  cafile =
  valid_interfaces = internal

  [notifications]
  notify_on_state_change = vm_and_task_state

  [barbican]
  auth_endpoint = http://<ip>:5000
  barbican_endpoint_type = internal
  verify_ssl_path =

  [service_user]
  send_service_user_token = true
  auth_url = http://<ip>:5000
  auth_type = password
  project_domain_id = default
  user_domain_id = default
  project_name = service
  username = nova
  password = <pw>
  cafile =
  region_name = RegionOne
  valid_interfaces = internal

  [scheduler]
  image_metadata_prefilter = True
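With image_metadata_prefilter enabled, the scheduler translates the image's hw_architecture property into a required placement trait; that is what produced the root_required=HW_ARCH_AARCH64 query seen in the log. A conceptual sketch of that translation (not nova's actual prefilter code):

```python
# Conceptual sketch of the image_metadata_prefilter behaviour: the image's
# hw_architecture property becomes a required HW_ARCH_* trait. NOT nova code.
def prefilter_required_traits(image_props):
    traits = []
    arch = image_props.get("hw_architecture")
    if arch:
        traits.append("HW_ARCH_" + arch.upper())
    return traits

print(prefilter_required_traits({"hw_architecture": "aarch64"}))
# ['HW_ARCH_AARCH64']
```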

  
  I have tried running "openstack resource provider trait delete 57a098bf-31d4-4e4f-9a28-72a925d2384c" to delete all traits and then restarting nova_compute on this node, but the same traits come back.
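The deleted traits reappearing is expected behaviour: the compute service periodically re-reports the traits its virt driver computes, so manual edits in placement are overwritten and the fix has to land in the driver itself. A conceptual sketch of that reconciliation (not nova internals):

```python
# Conceptual sketch: placement ends up with whatever the compute's virt
# driver reports on each periodic sync, overwriting manual trait edits.
def sync_traits(reported_by_driver, stored_in_placement):
    return set(reported_by_driver)  # the driver's view wins every period

stored = set()                       # operator deleted all traits
reported = {"HW_CPU_X86_AESNI"}      # buggy driver still reports an x86 trait
print(sync_traits(reported, stored))  # {'HW_CPU_X86_AESNI'}
```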

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2062425/+subscriptions


