yahoo-eng-team team mailing list archive

[Bug 2117697] [NEW] Race condition between resource tracker and server create

 

Public bug reported:

Description:

The resource tracker's update_available_resource is removing an instance
from the provider_tree if a call to placement aggregates ends in a 409.

https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1221

This differs from the error handling in the other calls, such as those for traits and inventories, where we do not perform any cache removals. For example:
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L997-L1008
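
To make the contrast concrete, here is a minimal sketch of the two handling
styles. It is not Nova code: ToyReportClient, Conflict, set_aggregates,
set_traits and provider_survives are hypothetical names used only for
illustration, and the "placement call" is simulated.

```python
class Conflict(Exception):
    """Stands in for a placement 409 response (hypothetical)."""


class ToyReportClient:
    """Hypothetical client that keeps a local cache of providers."""

    def __init__(self, providers):
        # Local cache: provider uuid -> cached state for that provider.
        self.cache = {uuid: {"aggregates": set(), "traits": set()}
                      for uuid in providers}

    def _put(self, conflict):
        # Pretend to call placement; raise on a simulated 409.
        if conflict:
            raise Conflict()

    def set_aggregates(self, uuid, aggregates, conflict=False):
        try:
            self._put(conflict)
        except Conflict:
            # Aggregates-style handling: evict the provider, then re-raise.
            self.cache.pop(uuid, None)
            raise
        self.cache[uuid]["aggregates"] = set(aggregates)

    def set_traits(self, uuid, traits, conflict=False):
        # Traits-style handling: the conflict propagates, cache untouched.
        self._put(conflict)
        self.cache[uuid]["traits"] = set(traits)


def provider_survives(method_name):
    """Return True if the provider is still cached after a simulated 409."""
    client = ToyReportClient(["rp-1"])
    try:
        getattr(client, method_name)("rp-1", ["x"], conflict=True)
    except Conflict:
        pass
    return "rp-1" in client.cache
```

In this toy model the provider stays cached after a conflict on the traits
path but disappears after a conflict on the aggregates path, which is the
asymmetry this report is about.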

The race happens when an instance is created on the same provider via
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2631
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L240

When a 409 occurs and the instance is removed from the cache, we see the
unintended behaviour described below.

Nova attempts to delete the resource provider as it is no longer in the cache
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1485
and we have also seen API logs of Nova attempting, and sometimes succeeding, to set aggregates to []. However, this is harder to pin down.

https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1504
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1183

This bug is filed in conjunction with <> which provides a similar
improvement to prevent the resource tracker trying to update the
provider if it doesn't need to.

Steps to reproduce:

These are quite difficult to pin down given it's a race condition. However, the
steps involve creating a new server at the same time as the resource tracker
triggers update_available_resource and hits the server in its loop.
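
As a rough illustration of how two concurrent writers can produce a 409, the
sketch below assumes the conflict comes from placement's generation check on
the resource provider. That is an assumption, since this report does not
state the cause of the 409. ToyPlacement, GenerationConflict and interleave
are hypothetical names, and the interleaving is forced by hand rather than
raced.

```python
class GenerationConflict(Exception):
    """Stands in for a 409 caused by a stale provider generation."""


class ToyPlacement:
    """Hypothetical in-memory service using optimistic concurrency."""

    def __init__(self):
        self.generation = 0
        self.aggregates = []

    def get_generation(self):
        return self.generation

    def put_aggregates(self, aggregates, generation):
        # Reject writers whose view of the provider is out of date.
        if generation != self.generation:
            raise GenerationConflict()
        self.aggregates = list(aggregates)
        self.generation += 1


def interleave():
    """Replay one unlucky ordering of two writers deterministically."""
    placement = ToyPlacement()
    # Both writers read the same generation before either one writes.
    writer_a_gen = placement.get_generation()
    writer_b_gen = placement.get_generation()
    # Writer A writes first and bumps the generation.
    placement.put_aggregates(["agg-1"], writer_a_gen)
    # Writer B then writes with its stale generation and is rejected.
    try:
        placement.put_aggregates(["agg-1"], writer_b_gen)
    except GenerationConflict:
        return "conflict"
    return "ok"
```

Under these assumptions, the second writer always hits the conflict once the
first has written, which is why the window only opens when both paths touch
the same provider at nearly the same time.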

Fix:

Remove the cache removal code here:
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1217-L1224

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2117697

Title:
  Race condition between resource tracker and server create

Status in OpenStack Compute (nova):
  New
