yahoo-eng-team team mailing list archive
Message #92498
[Bug 2023693] [NEW] Tempest failure due to possible affinity group race and cpu pinning
Public bug reported:
Description
===========
The Tempest test:
test_create_server_with_scheduler_hint_group_affinity
fails in OpenStack Yoga but passes in OpenStack Victoria.
The test is run on the same hardware with the same configuration.
-----
Relevant info:
1: CPU pinning is enabled via vcpu_pin_set in nova.conf
2: the property hw:cpu_policy=dedicated is set on the flavor
This configuration has been working for years.
There seems to be a race condition where both claims are made
before the free CPU list is updated.
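The suspected ordering can be illustrated with a toy model. This is only a sketch and not Nova's actual claim or pinning code: plan_pinning, commit_pinning, and PinningError are hypothetical names, the rule of picking the first free CPU is an assumption made for illustration, and the usable list is shortened to four CPUs. It shows how two plans computed from the same free set collide on commit, while the same two requests succeed when each plan is computed after the previous commit.

```python
class PinningError(Exception):
    """Raised when a planned CPU is no longer free (hypothetical, not a Nova class)."""


def plan_pinning(usable, free):
    """Plan a one-vCPU pinning: return the first usable CPU that is still free."""
    for cpu in usable:
        if cpu in free:
            return {cpu}
    raise PinningError("no free CPU left")


def commit_pinning(free, to_pin):
    """Remove the planned CPUs from the free set, failing if any is already taken."""
    if not to_pin <= free:
        raise PinningError(
            f"CPU set to pin {sorted(to_pin)} must be a subset of free CPU set "
            f"{sorted(free)}"
        )
    free.difference_update(to_pin)


def racy_order(usable):
    """Both plans are computed before either commit updates the free set."""
    free = set(usable)
    plan_a = plan_pinning(usable, free)
    plan_b = plan_pinning(usable, free)  # still sees the CPU chosen by plan_a
    commit_pinning(free, plan_a)
    try:
        commit_pinning(free, plan_b)
    except PinningError as exc:
        return plan_a, plan_b, str(exc)
    return plan_a, plan_b, None


def serialized_order(usable):
    """Each plan is computed only after the previous commit has been applied."""
    free = set(usable)
    plan_a = plan_pinning(usable, free)
    commit_pinning(free, plan_a)
    plan_b = plan_pinning(usable, free)  # CPU from plan_a is no longer offered
    commit_pinning(free, plan_b)
    return plan_a, plan_b


if __name__ == "__main__":
    usable = [64, 20, 8, 52]  # shortened, illustrative list of pCPUs
    print("racy:", racy_order(usable))
    print("serialized:", serialized_order(usable))
```

In the racy ordering both plans select CPU 64 and the second commit fails with a subset error; in the serialized ordering the second plan moves on to the next free CPU.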
-----
Relevant logs:
CPU 64 is in the list of usable CPUs:
2023-06-09 21:26:01.223 858862 INFO nova.virt.hardware [-] Computed NUMA
topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82,
38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80,
36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26,
70]], vCPUs mapping: [(0, 64)]
The first claim is made:
2023-06-09 21:26:01.223 858862 INFO nova.compute.claims [-] [instance:
ecc5bf99-9583-4acd-b075-19535e380c67] Claim successful on node
foo.example.com
CPU 64 is still available:
2023-06-09 21:26:01.261 858862 INFO nova.virt.hardware [-] Computed NUMA
topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82,
38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80,
36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26,
70]], vCPUs mapping: [(0, 64)]
The second claim is made:
2023-06-09 21:26:01.262 858862 INFO nova.compute.claims [-] [instance:
f65fe4dd-5733-4a9d-be71-32f79e514906] Claim successful on node
foo.example.com
The error is now seen:
2023-06-09 21:26:01.351 858862 ERROR nova.compute.manager [-] [instance:
f65fe4dd-5733-4a9d-be71-32f79e514906] Failed to build and run instance:
nova.exception.CPUPinningInvalid: CPU set to pin [64] must be a subset
of free CPU set [8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80,
82, 84, 86]
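The numbers in these log lines can be cross-checked. The short script below uses only data copied from the log excerpts above (the variable names are illustrative). It flattens the usable pCPU pairs and compares them with the free CPU set reported in the exception. The free set turns out to be exactly the usable set minus CPU 64, which is consistent with CPU 64 having been consumed by an earlier pinning, although the logs alone do not prove which instance holds it.

```python
# Usable pCPU sibling pairs, copied from the "Computed NUMA topology" log line.
usable_pairs = [
    [64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62],
    [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86],
    [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70],
]

# Free CPU set, copied from the CPUPinningInvalid error message.
free_in_error = [
    8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
    52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86,
]

usable = {cpu for pair in usable_pairs for cpu in pair}
free = set(free_in_error)

missing_from_free = usable - free   # usable CPUs that are no longer free
unexpected_in_free = free - usable  # free CPUs that were never listed as usable

print("usable CPUs:", len(usable))
print("free CPUs at error time:", len(free))
print("usable but not free:", sorted(missing_from_free))
print("free but not usable:", sorted(unexpected_in_free))
```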
Additional error:
ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number
of retries. Exhausted all hosts available for retrying build failures
for instance...
Steps to reproduce
==================
Enable CPU pinning with OpenStack Nova and run the Tempest test:
test_create_server_with_scheduler_hint_group_affinity
It fails every time for me.
Expected result
===============
Test passes
Actual result
=============
Test fails
Environment
===========
Nova version: 25.0.2
** Affects: nova
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2023693