yahoo-eng-team team mailing list archive
Message #91877
[Bug 1994526] Re: Asymmetric multi NUMA guest with cpu_policy: mixed fails to schedule
Reviewed: https://review.opendev.org/c/openstack/nova/+/862687
Committed: https://opendev.org/openstack/nova/commit/cffe3971ce585a1ddc374a3ed067347857338831
Submitter: "Zuul (22348)"
Branch: master
commit cffe3971ce585a1ddc374a3ed067347857338831
Author: Balazs Gibizer <gibi@xxxxxxxxxx>
Date: Wed Oct 26 13:28:47 2022 +0200
Handle zero pinned CPU in a cell with mixed policy
When cpu_policy is mixed, the scheduler tries to find a valid CPU pinning
for each instance NUMA cell. However, if an instance NUMA cell does not
request any pinned CPUs, that logic calculates empty pinning information
for the cell. The scheduler then wrongly assumes that an empty pinning
result means there was no valid pinning. There is, however, a difference
between a None result, which means no valid pinning was found, and an
empty result [], which means there was nothing to pin.
This patch makes sure that pinning == None is differentiated from
pinning == [].
Closes-Bug: #1994526
Change-Id: I5a35a45abfcfbbb858a94927853777f112e73e5b
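The distinction the commit message draws can be illustrated with a minimal sketch. This is not the nova code: compute_pinning, fits_buggy and fits_fixed are hypothetical names made up for this illustration, and the pinning calculation is reduced to a simple one-to-one assignment. It only shows why a truthiness check treats [] the same as None, while an explicit "is not None" check does not.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical type for this sketch: a list of (vCPU, pCPU) pairs,
# or None when no valid pinning exists.
Pinning = Optional[List[Tuple[int, int]]]


def compute_pinning(vcpus_to_pin: Set[int], free_pcpus: Set[int]) -> Pinning:
    """Simplified stand-in for a pinning calculation (assumption, not nova)."""
    if len(vcpus_to_pin) > len(free_pcpus):
        return None  # no valid pinning could be found
    # With nothing to pin this returns [], which is a valid (empty) result.
    return list(zip(sorted(vcpus_to_pin), sorted(free_pcpus)))


def fits_buggy(pinning: Pinning) -> bool:
    # Truthiness check: both None and [] evaluate to False.
    return bool(pinning)


def fits_fixed(pinning: Pinning) -> bool:
    # Only None means "no valid pinning"; [] means "nothing to pin".
    return pinning is not None


# A cell with only shared vCPUs requests no pinned CPUs at all.
empty_request = compute_pinning(set(), {24, 26, 28})
print(empty_request)              # []
print(fits_buggy(empty_request))  # False (the cell is wrongly rejected)
print(fits_fixed(empty_request))  # True

# A request that genuinely cannot be satisfied still fails in both variants.
impossible = compute_pinning({1, 2}, {24})
print(impossible)                 # None
print(fits_fixed(impossible))     # False
```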
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1994526
Title:
Asymmetric multi NUMA guest with cpu_policy: mixed fails to schedule
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Based on https://bugzilla.redhat.com/show_bug.cgi?id=2135439
Compute Host NUMA Topology:
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Hosts Dedicated/Shared Configurations:
[tripleo-admin@computesriov-1 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_dedicated_set
24-39
[tripleo-admin@computesriov-1 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_shared_set
20-23
[tripleo-admin@computesriov-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_dedicated_set
4-19
[tripleo-admin@computesriov-0 ~]$ sudo crudini --get /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf compute cpu_shared_set
0,1,2,3
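As a cross-check of the configuration above, the following sketch intersects the cpu_dedicated_set and cpu_shared_set values of computesriov-1 with host NUMA node0. parse_cpu_set is a hypothetical, simplified parser written for this illustration (it handles only comma-separated ids and ranges, not the full syntax nova accepts). The result agrees with the cpuset and pcpuset that the scheduler log further down reports for host cell 0.

```python
from typing import Set


def parse_cpu_set(spec: str) -> Set[int]:
    """Parse a spec such as "24-39" or "0,1,2,3" (simplified, hypothetical)."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Host NUMA topology from the report: even CPUs on node0, odd CPUs on node1.
node0 = set(range(0, 40, 2))
node1 = set(range(1, 40, 2))

# computesriov-1 settings from the crudini output.
dedicated = parse_cpu_set("24-39")
shared = parse_cpu_set("20-23")

print(sorted(shared & node0))     # [20, 22]
print(sorted(dedicated & node0))  # [24, 26, 28, 30, 32, 34, 36, 38]
```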
######################################################################
# Failed deployment (3 vCPU, asymmetric, and mixed dedicated policy) #
######################################################################
(overcloud) [stack@undercloud-0 ~]$ openstack flavor show tempest-MixedCPUPolicyTestMultiNuma-flavor-1583852539
/usr/lib/python3.9/site-packages/openstack/config/cloud_region.py:452: UserWarning: You have a configured API_VERSION with 'latest' in it. In the context of openstacksdk this doesn't make any sense.
warnings.warn(
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| description | None |
| disk | 1 |
| id | 1495385529 |
| name | tempest-MixedCPUPolicyTestMultiNuma-flavor-1583852539 |
| os-flavor-access:is_public | True |
| properties | hw:cpu_dedicated_mask='^0', hw:cpu_policy='mixed', hw:numa_cpus.0='0', hw:numa_cpus.1='1,2', hw:numa_mem.0='256', hw:numa_mem.1='768', hw:numa_nodes='2' |
| ram | 1024 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 3 |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
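To make the asymmetry in the flavor properties explicit, the sketch below splits the three vCPUs into pinned and shared sets per guest NUMA cell. dedicated_vcpus is a hypothetical helper for this illustration and assumes a simplified reading of hw:cpu_dedicated_mask: a mask made of "^N" entries marks every vCPU except the listed ones as dedicated, and ranges are not handled. Under that assumption, guest cell 0 ends up with no pinned vCPUs at all, which matches the pcpuset=set([]) shown in the scheduler log.

```python
from typing import Dict, Set


def dedicated_vcpus(vcpus: int, mask: str) -> Set[int]:
    """Return the dedicated vCPU ids for a mask (simplified, hypothetical)."""
    tokens = [tok.strip() for tok in mask.split(",")]
    if mask.startswith("^"):
        # Exclusion form: all vCPUs are dedicated except the listed ones.
        excluded = {int(tok.lstrip("^")) for tok in tokens}
        return set(range(vcpus)) - excluded
    return {int(tok) for tok in tokens}


# Values taken from the flavor properties above.
vcpus = 3
dedicated = dedicated_vcpus(vcpus, "^0")
numa_cpus: Dict[int, Set[int]] = {0: {0}, 1: {1, 2}}

for cell_id, cell_vcpus in numa_cpus.items():
    pinned = cell_vcpus & dedicated
    shared = cell_vcpus - dedicated
    print(cell_id, "shared:", sorted(shared), "pinned:", sorted(pinned))
# 0 shared: [0] pinned: []
# 1 shared: [] pinned: [1, 2]
```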
nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='mixed',cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([])) on host_cell NUMACell(cpu_usage=0,cpuset=set([20,22]),id=0,memory=128210,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([32,34,36,38,24,26,28,30]),pinned_cpus=set([]),siblings=[set([36]),set([24]),set([20]),set([30]),set([38]),set([22]),set([26]),set([28]),set([32]),set([34])],socket=0) _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:929
nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] No specific pagesize requested for instance, selected pagesize: 4 _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:956
nova-scheduler.log:2022-10-17 16:25:17.163 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Instance has requested pinned CPUs _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:1021
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Packing an instance onto a set of siblings: host_cell_free_siblings: [{36}, {24}, set(), {30}, {38}, set(), {26}, {28}, {32}, {34}] instance_cell: InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='mixed',cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([0]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([])) host_cell_id: 0 threads_per_core: 1 num_cpu_reserved: 0 _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:658
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Built sibling_sets: defaultdict(<class 'list'>, {1: [{36}, {24}, {30}, {38}, {26}, {28}, {32}, {34}]}) _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:679
nova-scheduler.log:2022-10-17 16:25:17.164 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] User did not specify a thread policy. Using default for 1 cores _pack_instance_onto_cores /usr/lib/python3.9/site-packages/nova/virt/hardware.py:794
nova-scheduler.log:2022-10-17 16:25:17.164 12 INFO nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Computed NUMA topology CPU pinning: usable pCPUs: [[36], [24], [30], [38], [26], [28], [32], [34]], vCPUs mapping: []
nova-scheduler.log:2022-10-17 16:25:17.165 12 INFO nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Computed NUMA topology CPU pinning: usable pCPUs: [[36], [24], [30], [38], [26], [28], [32], [34]], vCPUs mapping: []
nova-scheduler.log:2022-10-17 16:25:17.165 12 DEBUG nova.virt.hardware [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] Failed to map instance cell CPUs to host cell CPUs _numa_fit_instance_cell /usr/lib/python3.9/site-packages/nova/virt/hardware.py:1049
nova-scheduler.log:2022-10-17 16:25:17.165 12 DEBUG nova.scheduler.filters.numa_topology_filter [req-6307d41d-02a6-40f7-88d8-c7fcc2714202 0bfbbc8dfa1548819b0af96786b1ad43 12a4c6a7df9947a39f54dedf04b22fea - default default] [instance: fa4f0399-3507-42e8-a02d-5fd1269b3762] computesriov-1.localdomain, computesriov-1.localdomain fails NUMA topology requirements. The instance does not fit on this host. host_passes /usr/lib/python3.9/site-packages/nova/scheduler/filters/numa_topology_filter.py:106
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1994526/+subscriptions