
yahoo-eng-team team mailing list archive

[Bug 2081853] [NEW] Booting two VMs with anti-affinity in parallel to the same host results in both failing

 

Public bug reported:

The compute manager's late anti-affinity policy check rejects both of two
parallel VM boot requests, even though one of them could be accepted on
the host.
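The failure mode can be illustrated with a simplified, hypothetical model (not nova code). It assumes that both requests are already recorded against the host before either late check runs, so each check sees the other instance as a group member on the host and rejects itself. `Host`, `late_check_racy`, `late_check_accepted_only` and `boot_in_parallel` are illustrative names only; in this model a rejected request is simply marked ERROR, whereas the report expects the losing VM to end in NoValidHost.

```python
from dataclasses import dataclass, field


class GroupAffinityViolation(Exception):
    """Hypothetical stand-in for nova.exception.GroupAffinityViolation."""


@dataclass
class Host:
    name: str
    # Instances already assigned to this host before the late check runs.
    assigned: set = field(default_factory=set)
    # Instances that passed the late check and keep building on this host.
    accepted: set = field(default_factory=set)


def late_check_racy(host, group_members, instance_uuid):
    """Counts every *assigned* group member except itself (models the bug)."""
    others = (host.assigned & group_members) - {instance_uuid}
    if others:
        raise GroupAffinityViolation("Anti-affinity instance group policy was violated")
    host.accepted.add(instance_uuid)


def late_check_accepted_only(host, group_members, instance_uuid):
    """Counts only group members that already passed the check (illustrative)."""
    others = (host.accepted & group_members) - {instance_uuid}
    if others:
        raise GroupAffinityViolation("Anti-affinity instance group policy was violated")
    host.accepted.add(instance_uuid)


def boot_in_parallel(check, uuids):
    """Assigns all requests to one host first, then runs the late checks."""
    host = Host("compute-1")
    group_members = set(uuids)
    # Both requests land on the only enabled host before either check runs.
    host.assigned.update(uuids)
    results = {}
    for uuid in uuids:
        try:
            check(host, group_members, uuid)
            results[uuid] = "ACTIVE"
        except GroupAffinityViolation:
            results[uuid] = "ERROR"
    return results


print(boot_in_parallel(late_check_racy, ["vm-a", "vm-b"]))
# → {'vm-a': 'ERROR', 'vm-b': 'ERROR'}
print(boot_in_parallel(late_check_accepted_only, ["vm-a", "vm-b"]))
# → {'vm-a': 'ACTIVE', 'vm-b': 'ERROR'}
```

In the racy variant the order in which the checks run does not matter: each instance counts the other as already on the host, so both fail. Counting only members that have already passed the check lets exactly one request through; it is shown purely for contrast and is not necessarily how nova fixes the bug.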

To reproduce:
* create a server group with the anti-affinity policy
* keep a single compute host enabled and disable all the others
* boot two VMs into the server group in parallel
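The steps above can be scripted roughly as follows. This is a sketch that drives the standard `openstack` CLI through `subprocess`; the flavor, image, network, group name and host names are placeholders (hypothetical), and the script assumes a CLI already configured with admin credentials.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholders (hypothetical): adjust to the deployment under test.
FLAVOR, IMAGE, NETWORK = "m1.tiny", "cirros", "private"
HOSTS_TO_DISABLE = ["compute-0", "compute-2"]  # every compute except compute-1


def group_create_cmd(name):
    # "-f value -c id" makes the CLI print only the new group's ID.
    return ["openstack", "server", "group", "create",
            "--policy", "anti-affinity", "-f", "value", "-c", "id", name]


def disable_cmd(host):
    return ["openstack", "compute", "service", "set", "--disable", host, "nova-compute"]


def server_create_cmd(name, group_id):
    # The "group" scheduler hint places the server into the anti-affinity group.
    return ["openstack", "server", "create",
            "--flavor", FLAVOR, "--image", IMAGE, "--network", NETWORK,
            "--hint", f"group={group_id}", "--wait", name]


def run(cmd, check=False):
    return subprocess.run(cmd, capture_output=True, text=True, check=check)


def reproduce():
    """Runs the three reproduction steps; requires a configured openstack CLI."""
    group_id = run(group_create_cmd("anti-affinity-repro"), check=True).stdout.strip()
    for host in HOSTS_TO_DISABLE:
        run(disable_cmd(host), check=True)
    # Issue both boot requests at the same time from two worker threads.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(run, [server_create_cmd(f"vm-{i}", group_id) for i in (1, 2)]))
```

Calling `reproduce()` returns one `subprocess.CompletedProcess` per boot; with `--wait`, a server that ends in ERROR should yield a non-zero return code. Whether both boots fail depends on timing, as described below.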

Expected:
One of the two VMs boots successfully; the other fails with NoValidHost.

Actual:
If you are (un)lucky, both VMs fail with nova.exception.GroupAffinityViolation:

```
❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a  -u edpm_nova_compute | grep 9d115f6b-bb02-4390-a161-15fb8f83c0cc | grep nova.exception.GroupAffinityViolation:
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [None req-a5316266-aca0-4d11-90f9-631e26d058ab 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated

❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a  -u edpm_nova_compute | grep ea192e6a-4685-45ae-839b-315dfd36697d | grep nova.exception.GroupAffinityViolation
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [None req-b37d5098-75bf-4a3c-a85d-6f2ccdf0104f 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: ea192e6a-4685-45ae-839b-315dfd36697d] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [instance: ea192e6a-4685-45ae-839b-315dfd36697d] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
```

A functional reproducer has been pushed to
https://review.opendev.org/c/openstack/nova/+/930326

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute scheduler

** Tags added: compute scheduler

** Description changed:

- If you are (un)lucky then both VM will fail with nova.exception.GroupAffinityViolation
+ If you are (un)lucky then both VMs will fail with nova.exception.GroupAffinityViolation
  
  ```
  ❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a  -u edpm_nova_compute | grep 9d115f6b-bb02-4390-a161-15fb8f83c0cc | grep nova.exception.GroupAffinityViolation:
  Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [None req-a5316266-aca0-4d11-90f9-631e26d058ab 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
  Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.406 2 ERROR nova.compute.manager [instance: 9d115f6b-bb02-4390-a161-15fb8f83c0cc] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
  
  ❯ journalctl -D sosreport-compute-1-2024-09-17-tzgxrpu/var/log/journal/730eba01f47f493698df59515d1c213a  -u edpm_nova_compute | grep ea192e6a-4685-45ae-839b-315dfd36697d | grep nova.exception.GroupAffinityViolation
  Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [None req-b37d5098-75bf-4a3c-a85d-6f2ccdf0104f 188fff18565b4e46b0c04391ec532b3e b698d1d3bfeb4a75bf32b7a80d19dd46 - - default default] [instance: ea192e6a-4685-45ae-839b-315dfd36697d] Failed to build and run instance: nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
  Sep 17 02:05:36 compute-1 nova_compute[84038]: 2024-09-17 00:05:36.132 2 ERROR nova.compute.manager [instance: ea192e6a-4685-45ae-839b-315dfd36697d] nova.exception.GroupAffinityViolation: Anti-affinity instance group policy was violated
  ```
  
  There is a functional reproduce pushed in
  https://review.opendev.org/c/openstack/nova/+/930326

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2081853

Title:
  Booting two VMs with anti-affinity in parallel to the same host
  results in both failing

Status in OpenStack Compute (nova):
  New
