yahoo-eng-team team mailing list archive

[Bug 2023693] [NEW] Tempest failure due to possible affinity group race and cpu pinning

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Frank Ritchie <2023693@xxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Jun 2023 20:39:54 -0000
Reply-to: Bug 2023693 <2023693@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx
Public bug reported:

Description
===========

The tempest test:

test_create_server_with_scheduler_hint_group_affinity

fails in OpenStack Yoga but passes in OpenStack Victoria.

The test is run on the same hardware with the same configuration.

-----

Relevant info:

1: CPU pinning is enabled via vcpu_pin_set in nova.conf
2: The property hw:cpu_policy=dedicated is set in the flavor

This configuration has been working for years.

There seems to be a race condition in which both claims are made
before the free CPU list is updated.
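
To make the suspected sequence concrete, here is a toy model of a
check-then-act race. It is only a sketch of the hypothesis above, not Nova
code: ToyHost, PinningError, pick, pin and claim_and_pin are all
hypothetical names, and the "pick the highest free CPU" rule is an
assumption chosen to keep the demo deterministic.

```python
import threading


class PinningError(Exception):
    """Hypothetical error: the requested pCPU is no longer free."""


class ToyHost:
    """Hypothetical stand-in for a host's free pCPU bookkeeping."""

    def __init__(self, free_cpus):
        self.free = set(free_cpus)
        self.lock = threading.Lock()

    def pick(self):
        # Step 1 (claim): choose a pCPU from the current free list.
        return max(self.free)  # deterministic choice for the demo

    def pin(self, cpu):
        # Step 2 (build): consume the pCPU; fails if it is already taken.
        if cpu not in self.free:
            raise PinningError(
                f"CPU {cpu} is not in free set {sorted(self.free)}")
        self.free.remove(cpu)

    def claim_and_pin(self):
        # Serialised variant: pick and pin happen under one lock.
        with self.lock:
            cpu = self.pick()
            self.pin(cpu)
            return cpu


def racy_sequence(host):
    # Both claims pick before either pin updates the free list.
    first = host.pick()
    second = host.pick()
    host.pin(first)
    host.pin(second)  # raises PinningError: the same pCPU was picked twice
```

Calling racy_sequence(ToyHost([8, 20, 64])) raises PinningError for CPU 64,
while two successive claim_and_pin() calls on a fresh ToyHost return 64
and then 20. Whether Nova's claim path actually interleaves this way is
what this bug report is asking.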

-----

Relevant logs:

CPU 64 is in the list of usable CPUs:

2023-06-09 21:26:01.223 858862 INFO nova.virt.hardware [-] Computed NUMA
topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82,
38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80,
36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26,
70]], vCPUs mapping: [(0, 64)]

The first claim is made:

2023-06-09 21:26:01.223 858862 INFO nova.compute.claims [-] [instance:
ecc5bf99-9583-4acd-b075-19535e380c67] Claim successful on node
foo.example.com

CPU 64 is still available:

2023-06-09 21:26:01.261 858862 INFO nova.virt.hardware [-] Computed NUMA
topology CPU pinning: usable pCPUs: [[64, 20], [8, 52], [40, 84], [82,
38], [32, 76], [18, 62], [74, 30], [56, 12], [10, 54], [24, 68], [80,
36], [42, 86], [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26,
70]], vCPUs mapping: [(0, 64)]

The second claim is made:

2023-06-09 21:26:01.262 858862 INFO nova.compute.claims [-] [instance:
f65fe4dd-5733-4a9d-be71-32f79e514906] Claim successful on node
foo.example.com

The error is now seen:

2023-06-09 21:26:01.351 858862 ERROR nova.compute.manager [-] [instance:
f65fe4dd-5733-4a9d-be71-32f79e514906] Failed to build and run instance:
nova.exception.CPUPinningInvalid: CPU set to pin [64] must be a subset
of free CPU set [8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80,
82, 84, 86]
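
The numbers in these log lines can be cross-checked with a few lines of
Python. This is a rough sketch that only uses values copied from the logs
above. It assumes the "usable pCPUs" list covers every pinnable CPU on the
host, which the logs do not state explicitly.

```python
# Sibling pairs copied from the "usable pCPUs" log lines.
usable_pairs = [
    [64, 20], [8, 52], [40, 84], [82, 38], [32, 76], [18, 62],
    [74, 30], [56, 12], [10, 54], [24, 68], [80, 36], [42, 86],
    [66, 22], [72, 28], [34, 78], [58, 14], [16, 60], [26, 70],
]

# Free CPU set copied from the CPUPinningInvalid message.
free_at_error = {
    8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
    52, 54, 56, 58, 60, 62, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86,
}

# CPU set the second instance tried to pin.
requested = {64}

# Flatten the pairs into one set of usable pCPUs.
usable = {cpu for pair in usable_pairs for cpu in pair}

# CPUs that were usable at claim time but are no longer free.
already_pinned = usable - free_at_error

print(sorted(already_pinned))      # [64]
print(requested <= free_at_error)  # False
```

The only CPU missing from the free set is 64, which is consistent with the
first instance having pinned it between the second claim and the second
build.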

Additional error:

ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number
of retries. Exhausted all hosts available for retrying build failures
for instance...

Steps to reproduce
==================

Enable CPU pinning in OpenStack Nova and run the tempest test:

test_create_server_with_scheduler_hint_group_affinity

It fails every time for me.

Expected result
===============

Test passes

Actual result
=============

Test fails

Environment
===========

Nova version: 25.0.2

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2023693
