yahoo-eng-team team mailing list archive

[Bug 1996732] [NEW] Late affinity check failure counted as failed build

 

Public bug reported:

The late anti-affinity check runs in the compute manager to prevent
parallel scheduling requests from violating the anti-affinity server
group policy. When the check fails, the instance is re-scheduled. However,
this failure is counted as a real instance boot failure of the compute
host[1][2][3] and can lead to de-prioritization of the compute host in
the scheduler via BuildFailureWeigher[4]. As a late anti-affinity check
failure does not indicate any fault of the compute host itself, it
should not be counted towards the build failure counter.

[1] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L2496
[2] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L1808
[3] https://github.com/openstack/nova/blob/2eb358cdcec36fcfe5388ce6982d2961ca949d0a/nova/compute/manager.py#L2265
[4] https://docs.openstack.org/nova/latest/configuration/config.html#compute.consecutive_build_service_disable_threshold
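
To illustrate the requested behavior, here is a minimal sketch of a
per-host build failure counter that skips policy-triggered reschedules.
This is not nova's actual code: the class names RescheduledByPolicy,
BuildFailed and BuildFailureStats are hypothetical, and the reset of the
counter on a successful build is an assumption made for the example.

```python
# Hypothetical, simplified model of the accounting described above.
# None of these names are taken from nova.


class RescheduledByPolicy(Exception):
    """The late server-group policy check rejected this host."""


class BuildFailed(Exception):
    """The host genuinely failed to boot the instance."""


class BuildFailureStats:
    """Tracks consecutive build failures for a single compute host."""

    def __init__(self):
        self.failed_builds = 0

    def record(self, build):
        """Run build() and update the counter; return the outcome."""
        try:
            build()
        except RescheduledByPolicy:
            # Caused by a scheduling race, not by a host fault:
            # reschedule without touching the failure counter.
            return "rescheduled"
        except BuildFailed:
            # A real boot failure counts against the host.
            self.failed_builds += 1
            return "failed"
        # Assumption: a successful build resets the counter.
        self.failed_builds = 0
        return "active"
```

With a counter like this, a host that only ever loses anti-affinity
races keeps failed_builds at 0, so a weigher that reads the counter has
no reason to de-prioritize it.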

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute scheduler

** Tags added: compute scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1996732

Title:
  Late affinity check failure counted as failed build

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1996732/+subscriptions
