yahoo-eng-team team mailing list archive
Message #96203
[Bug 2117697] [NEW] Race condition between resource tracker and server create
Public bug reported:
Description:
The resource tracker's update_available_resource is removing an instance
from the provider_tree if a call to placement aggregates ends in a 409.
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1221
This differs from the error handling in the other calls, for example those to traits and inventories, where we do not perform any cache removals:
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L997-L1008
The race happens when an instance is created on the same provider via
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2631
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L240
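One plausible way the 409 arises can be sketched with a toy model. Placement
guards provider writes with a generation number, and a write carrying a stale
generation is rejected. Everything below (FakePlacement, its methods, and the
assumption that an allocation write bumps the generation) is hypothetical and
only illustrates the interleaving; it is not nova or placement code.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response from placement."""


class FakePlacement:
    """Hypothetical in-memory stand-in for a single resource provider."""

    def __init__(self):
        self.generation = 0
        self.aggregates = set()
        self.allocations = {}

    def put_allocations(self, consumer, resources):
        # Assumption: writing allocations bumps the provider generation.
        self.allocations[consumer] = resources
        self.generation += 1

    def put_aggregates(self, aggregates, generation):
        # A write carrying a stale generation is rejected, like a 409.
        if generation != self.generation:
            raise Conflict(
                f"expected generation {self.generation}, got {generation}"
            )
        self.aggregates = set(aggregates)
        self.generation += 1


def simulate_race():
    """Replay the interleaving in a fixed order and report the outcome."""
    placement = FakePlacement()
    # 1. The resource tracker reads the provider and remembers its generation.
    seen = placement.generation
    # 2. A server create lands on the same provider in the meantime.
    placement.put_allocations("server-1", {"VCPU": 1})
    # 3. The tracker writes aggregates using its now-stale generation.
    try:
        placement.put_aggregates({"agg-1"}, seen)
    except Conflict:
        return "409"
    return "ok"
```

Calling simulate_race() walks through the three steps in order; the stale
write in step 3 is the one that gets rejected.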
When a 409 occurs and the instance is removed from the cache, we see the
unintended behaviour below.
Nova attempts to delete the resource provider as it's no longer in the cache
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1485
We have also seen API logs of Nova attempting, and sometimes succeeding, to set aggregates to []. However, this is harder to pin down.
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1504
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1183
This bug is filed in conjunction with <> which provides a similar
improvement to prevent the resource tracker trying to update the
provider if it doesn't need to.
Steps to reproduce:
Reproduction is quite difficult given it's a race condition. However, the
steps involve creating a new server at the same time as the resource
tracker runs update_available_resource and reaches that server in its loop.
Fix:
Remove the cache removal code here:
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1217-L1224
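A minimal sketch of what the proposed change amounts to, under stated
assumptions: the class and method names below are made up for illustration and
do not mirror the real report client. The first handler evicts the provider
from the local cache on a 409 before raising; the second only raises, which is
how the report describes the traits and inventories paths.

```python
class ProviderConflict(Exception):
    """Hypothetical error raised when placement answers with a 409."""


class ReportClientSketch:
    """Toy model of a client that keeps a local cache of provider UUIDs."""

    def __init__(self, providers):
        self.cache = set(providers)

    def handle_evicting(self, rp_uuid, status_code):
        # Behaviour described in the report: drop the provider, then raise.
        if status_code == 409:
            self.cache.discard(rp_uuid)
            raise ProviderConflict(rp_uuid)

    def handle_keeping(self, rp_uuid, status_code):
        # Proposed behaviour: raise, but leave the cache untouched.
        if status_code == 409:
            raise ProviderConflict(rp_uuid)


def cache_after_conflict(evict):
    """Return the cached providers left after a single simulated 409."""
    client = ReportClientSketch({"rp-1", "rp-2"})
    handler = client.handle_evicting if evict else client.handle_keeping
    try:
        handler("rp-1", 409)
    except ProviderConflict:
        pass
    return sorted(client.cache)
```

With eviction, rp-1 disappears from the cache after the conflict, so any later
code that treats "not in the cache" as "should not exist" can act on it.
Without eviction, the caller still sees the error and can retry, while the
cache stays as it was.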
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2117697
Title:
Race condition between resource tracker and server create
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2117697/+subscriptions