
yahoo-eng-team team mailing list archive

[Bug 1454451] [NEW] simultaneous boot of multiple instances leads to cpu pinning overlap

 

Public bug reported:

I'm running into an issue with kilo-3 that I think is present in current
trunk.

I think there is a race between the claimed CPUs of an instance being
persisted to the DB, and the resource audit scanning the DB for
instances and subtracting pinned CPUs from the list of available CPUs.

The problem only shows up when the following sequence happens:
1) instance A (with dedicated cpus) boots on a compute node
2) resource audit runs on that compute node
3) instance B (with dedicated cpus) boots on the same compute node

So to hit this you need to be booting many instances, limiting the
valid compute nodes (host aggregates or server groups), or running a
small cluster.
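
The three-step sequence above can be reproduced in miniature. This is a
toy sketch with hypothetical names (claim, audit, a plain list standing
in for the DB), not actual nova code:

```python
# Toy reproduction of the race (hypothetical names, not actual nova code).
# The "DB" is a plain list; the audit rebuilds the set of available pCPUs
# from whatever instance topologies have been persisted so far.

ALL_PCPUS = {0, 1, 2, 3}

db_instances = []            # instances whose NUMA topology was persisted
available = set(ALL_PCPUS)   # the resource tracker's in-memory view

def claim(n):
    """Pin n pCPUs out of the tracker's current view (instance boot)."""
    global available
    picked = set(sorted(available)[:n])
    available -= picked
    return picked

def audit():
    """Resource audit: recompute availability from persisted instances."""
    global available
    pinned = set()
    for cpus in db_instances:
        pinned |= cpus
    available = ALL_PCPUS - pinned

# 1) instance A boots and claims two pCPUs, but is not yet persisted
a = claim(2)
# 2) the audit runs: the DB is empty, so A's pCPUs look free again
audit()
# 3) instance B boots and claims the same pCPUs
b = claim(2)
print(a & b)   # non-empty intersection: overlapping pinned pCPUs
```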


The nitty-gritty view looks like this:

When booting up an instance we hold the COMPUTE_RESOURCE_SEMAPHORE in
compute.resource_tracker.ResourceTracker.instance_claim() and that
covers updating the resource usage on the compute node. But we don't
persist the instance numa topology to the database until after
instance_claim() returns, in
compute.manager.ComputeManager._build_instance().  Note that this is
done *after* we've given up the semaphore, so there is no longer any
sort of ordering guarantee.
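
In pseudo-code, the ordering looks roughly like this (a simplified
sketch with hypothetical structures, not the real ResourceTracker or
ComputeManager APIs; the list append stands in for the late save of
instance.numa_topology):

```python
import threading

# Simplified sketch of the claim/persist ordering (hypothetical
# structures, not actual nova code).
COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

def instance_claim(tracker, instance):
    """Update the in-memory resource usage under the semaphore."""
    with COMPUTE_RESOURCE_SEMAPHORE:
        tracker['pinned'] |= instance['pinned_cpus']

def _build_instance(db, tracker, instance):
    """Claim first; persist the NUMA topology only afterwards."""
    instance_claim(tracker, instance)
    # The semaphore has been released by this point: an audit running in
    # this window queries the DB and misses this instance entirely.
    db.append(instance)   # the late "save" that opens the race window

tracker = {'pinned': set()}
db = []
_build_instance(db, tracker, {'id': 'A', 'pinned_cpus': {0, 1}})
print(tracker['pinned'], len(db))
```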

compute.resource_tracker.ResourceTracker.update_available_resource()
then acquires COMPUTE_RESOURCE_SEMAPHORE, queries the database for a list
of instances and uses that to calculate a new view of what resources are
available. If the numa topology of the most recent instance hasn't been
persisted yet, then the new view of resources won't include any pCPUs
pinned by that instance.
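
The audit's calculation can be sketched like this (hypothetical names;
the real method rebuilds a full resource view, not just a CPU set, but
the key point is that it only sees *persisted* instances):

```python
import threading

# Sketch of the audit's recomputation (hypothetical names, not the
# real update_available_resource() implementation).
COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()
ALL_PCPUS = {0, 1, 2, 3}

def update_available_resource(db_instances):
    """Recompute available pCPUs from persisted instances only."""
    with COMPUTE_RESOURCE_SEMAPHORE:
        pinned = set()
        for inst in db_instances:
            pinned |= inst['pinned_cpus']
        return ALL_PCPUS - pinned

# An instance whose topology has not been persisted yet contributes
# nothing, so its pCPUs wrongly reappear as available:
print(update_available_resource([]))
print(update_available_resource([{'pinned_cpus': {0, 1}}]))
```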

compute.manager.ComputeManager._build_instance() runs for the next
instance and based on the new view of available resources it allocates
the same pCPU(s) used by the earlier instance. Boom, overlapping pinned
pCPUs.

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1454451

Title:
  simultaneous boot of multiple instances leads to cpu pinning overlap

Status in OpenStack Compute (Nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1454451/+subscriptions

