
yahoo-eng-team team mailing list archive

[Bug 1454451] Re: simultaneous boot of multiple instances leads to cpu pinning overlap


** Changed in: nova
       Status: Fix Committed => Fix Released

** Changed in: nova
    Milestone: None => liberty-1

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1454451

Title:
  simultaneous boot of multiple instances leads to cpu pinning overlap

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  I'm running into an issue with kilo-3 that I think is present in
  current trunk.  Basically it results in multiple instances (with
  dedicated cpus) being pinned to the same physical cpus.

  I think there is a race between the claimed CPUs of an instance being
  persisted to the DB, and the resource audit scanning the DB for
  instances and subtracting pinned CPUs from the list of available CPUs.

  The problem only shows up when the following sequence happens:
  1) instance A (with dedicated cpus) boots on a compute node
  2) resource audit runs on that compute node
  3) instance B (with dedicated cpus) boots on the same compute node

  So to hit this you need to be booting many instances, limiting the
  valid compute nodes (host aggregates or server groups), or running a
  small cluster.

  The nitty-gritty view looks like this:

  When booting up an instance we hold the COMPUTE_RESOURCE_SEMAPHORE in
  compute.resource_tracker.ResourceTracker.instance_claim() and that
  covers updating the resource usage on the compute node. But we don't
  persist the instance numa topology to the database until after
  instance_claim() returns, in
  compute.manager.ComputeManager._build_instance().  Note that this is
  done *after* we've given up the semaphore, so there is no longer any
  sort of ordering guarantee.
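
  To make the ordering concrete, here is a minimal sketch of the
  pattern (plain Python, not actual Nova code; free_pcpus, db_rows and
  build_instance() are invented stand-ins):

    import threading

    COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

    def instance_claim(free_pcpus, ncpus):
        # The usage accounting is covered by the semaphore, as in
        # ResourceTracker.instance_claim()...
        with COMPUTE_RESOURCE_SEMAPHORE:
            return {free_pcpus.pop() for _ in range(ncpus)}

    def build_instance(db_rows, free_pcpus, ncpus):
        pinned = instance_claim(free_pcpus, ncpus)
        # ...but the pinned pCPUs reach the database only here, after
        # the semaphore has been released, so the audit can slip in
        # between the two steps.
        db_rows.append(pinned)
        return pinned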

  compute.resource_tracker.ResourceTracker.update_available_resource()
  then acquires COMPUTE_RESOURCE_SEMAPHORE, queries the database for a
  list of instances and uses that to calculate a new view of what
  resources are available. If the numa topology of the most recent
  instance hasn't been persisted yet, then the new view of resources
  won't include any pCPUs pinned by that instance.
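
  Sketched the same way (again with invented names, not Nova code), the
  audit can only subtract pins that have already made it to the
  database:

    import threading

    COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

    def update_available_resource(all_pcpus, persisted_pinnings):
        with COMPUTE_RESOURCE_SEMAPHORE:
            available = set(all_pcpus)
            for pinned in persisted_pinnings:
                available -= pinned
            return available

    # Instance A has claimed pCPUs {0, 1} but not yet persisted them:
    print(update_available_resource({0, 1, 2, 3}, []))  # {0, 1, 2, 3}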

  compute.manager.ComputeManager._build_instance() runs for the next
  instance and based on the new view of available resources it allocates
  the same pCPU(s) used by the earlier instance. Boom, overlapping
  pinned pCPUs.
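
  The whole race can be reproduced with two threads and a sleep to
  widen the window (a self-contained toy, not Nova code; the sleep
  stands in for the gap between instance_claim() returning and the DB
  save):

    import threading
    import time

    LOCK = threading.Lock()     # stand-in for COMPUTE_RESOURCE_SEMAPHORE
    ALL_PCPUS = {0, 1, 2, 3}
    db_rows = []                # "persisted" pinnings
    available = set(ALL_PCPUS)  # the tracker's in-memory view

    def instance_claim(name, ncpus):
        global available
        with LOCK:
            pinned = set(sorted(available)[:ncpus])
            available -= pinned
        time.sleep(0.1)                  # persistence happens only after
        db_rows.append((name, pinned))   # the lock is dropped

    def update_available_resource():
        global available
        with LOCK:
            fresh = set(ALL_PCPUS)
            for _name, pinned in db_rows:
                fresh -= pinned
            available = fresh            # A's unpersisted pins vanish

    a = threading.Thread(target=instance_claim, args=("A", 2))
    a.start()
    time.sleep(0.05)             # A has claimed but not yet persisted
    update_available_resource()  # audit resets the view from db_rows
    a.join()
    instance_claim("B", 2)       # B is handed the same pCPUs as A
    print(db_rows)               # [('A', {0, 1}), ('B', {0, 1})]

  Both claims ran under the semaphore, yet both instances end up pinned
  to pCPUs 0 and 1, because the audit in between threw away instance
  A's not-yet-persisted claim.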

  
  Lastly, the same bug applies to the
  compute.manager.ComputeManager.rebuild_instance() case.  It uses the
  same pattern of doing the claim and then updating the instance numa
  topology after releasing the semaphore.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1454451/+subscriptions

