yahoo-eng-team team mailing list archive
Message #41609
[Bug 1358379] Re: drop_resize_claim() can't release the resource in some small window
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]
** Changed in: nova
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1358379
Title:
drop_resize_claim() can't release the resource in some small window
Status in OpenStack Compute (nova):
Expired
Bug description:
Currently the resize resource claim is achieved through the resize_claim()
and drop_resize_claim() pair. In theory, the claim should be released
once drop_resize_claim() is called. However, there is a small window
in which this release does not happen.
Currently the RT (resource tracker) tracks resource usage in two categories: the instances hosted on the node (_update_usage_from_instances()) and the migrations into or out of the node (_update_usage_from_migrations()).
An instance hosted on the node always has a resource claim, and an incoming or outgoing migration whose instance is not hosted on the node also has a resource claim. If a resize happens on the same host, one claim is tracked on the instance side and the other on the migration side. This audit happens in the update_available_resource() periodic task.
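To make the two categories concrete, here is a minimal sketch of how a periodic audit could rebuild a node's memory usage from scratch. It is a simplified toy model, not nova's code: the names Instance, Migration and audit_usage are hypothetical, only memory is tracked, and for a same-host resize it assumes the instance record still holds the old flavor, so the migration side holds the new one.

```python
# Hypothetical toy model of the two usage categories; the names Instance,
# Migration and audit_usage are illustrative and are NOT nova's real API.
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    host: str
    memory_mb: int  # memory of the flavor the instance currently uses


@dataclass
class Migration:
    instance_uuid: str
    source_host: str
    dest_host: str
    old_memory_mb: int  # memory of the old flavor
    new_memory_mb: int  # memory of the new flavor


def audit_usage(node, instances, migrations):
    """Rebuild the memory usage of `node` from scratch, as a periodic audit would."""
    used = 0
    hosted = set()
    # Category 1: every instance hosted on the node holds a claim.
    for inst in instances:
        if inst.host == node:
            used += inst.memory_mb
            hosted.add(inst.uuid)
    # Category 2: migrations into or out of the node.
    for mig in migrations:
        incoming = mig.dest_host == node
        outgoing = mig.source_host == node
        if incoming and outgoing:
            # Same-host resize: assume the instance side holds the old
            # flavor, so the migration side holds the new one.
            used += mig.new_memory_mb
        elif incoming and mig.instance_uuid not in hosted:
            # Instance has not arrived yet: reserve the new flavor.
            used += mig.new_memory_mb
        elif outgoing and mig.instance_uuid not in hosted:
            # Instance has left: keep the old flavor for a possible revert.
            used += mig.old_memory_mb
    return used


# Same-host resize of a 2048 MB instance to a 4096 MB flavor: two claims.
same_host = audit_usage(
    "node1",
    [Instance("a", "node1", 2048)],
    [Migration("a", "node1", "node1", 2048, 4096)],
)
print(same_host)  # 6144
```

The same-host case counts both the instance claim (2048 MB) and the migration claim (4096 MB), which is the "one claim on each side" situation described above.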
The current drop_resize_claim() implementation always assumes the related resource is held by a tracked migration. However, this is not true if drop_resize_claim() is called before the audit periodic task runs. Suppose the audit runs at time t1 and again at t1 + 60s (i.e. the periodic interval is 60s). Between these two audits, an instance on this node is resized to another node, and the user also confirms the resize (i.e. this node is the source node).
Because the resize happened between two runs of the audit periodic task,
the RT has no knowledge of it and no migration is tracked. Thus when
drop_resize_claim(prefix='old_') is called, it has no resource claim to
release. The release does not happen until the next audit cycle, which
finds that the instance is no longer hosted on the node.
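The window can be reproduced with a minimal simulation of the source node. This is a hedged sketch under simplifying assumptions: ToyTracker, its audit() method and the dict layouts are hypothetical, only memory is tracked, and drop_resize_claim() here only mimics the behaviour described above (it can release a claim only if an earlier audit recorded it as a tracked migration).

```python
# Hypothetical, heavily simplified tracker for the SOURCE node only.
# ToyTracker, audit() and the dict layouts are illustrative, not nova's API.


class ToyTracker:
    def __init__(self, node):
        self.node = node
        self.used_mb = 0
        self.tracked_migrations = {}  # instance uuid -> claimed old-flavor MB

    def audit(self, instances, migrations):
        """Periodic task: rebuild usage from scratch.

        instances:  {uuid: (host, memory_mb)}
        migrations: {uuid: (source_host, dest_host, old_memory_mb)}
        """
        self.used_mb = 0
        self.tracked_migrations = {}
        hosted = set()
        for uuid, (host, memory_mb) in instances.items():
            if host == self.node:
                self.used_mb += memory_mb
                hosted.add(uuid)
        for uuid, (source, _dest, old_mb) in migrations.items():
            if source == self.node and uuid not in hosted:
                self.used_mb += old_mb
                self.tracked_migrations[uuid] = old_mb

    def drop_resize_claim(self, uuid):
        # Only a claim that an earlier audit recorded as a tracked
        # migration can be released; otherwise this silently does nothing.
        old_mb = self.tracked_migrations.pop(uuid, None)
        if old_mb is not None:
            self.used_mb -= old_mb


rt = ToyTracker("node1")

# t1: the audit sees instance "a" (2048 MB) hosted on node1.
rt.audit({"a": ("node1", 2048)}, {})

# Between audits: "a" is resized to node2 and the user confirms the resize,
# so the claim is dropped before any audit has tracked the migration.
rt.drop_resize_claim("a")
leaked_mb = rt.used_mb
print(leaked_mb)  # 2048: nothing was released

# t1 + 60s: the next audit finds "a" is no longer hosted on node1, and the
# confirmed migration is no longer in progress.
rt.audit({"a": ("node2", 4096)}, {})
after_next_audit_mb = rt.used_mb
print(after_next_audit_mb)  # 0
```

If an audit had run after the instance moved but before the confirmation, the migration would have been tracked and drop_resize_claim() would have released the 2048 MB immediately; the leak exists only when the confirmation lands in the gap between audits.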
I'm not sure whether this is really an issue. I think:
a) The result depends purely on the periodic task interval. If the
interval is very long, this causes wasted resources or, in the worst
case, a potential DoS issue. It should be OK if the interval is short.
b) From an implementation point of view, having
drop_resize_claim(prefix='old_') return successfully without releasing
the resource is bogus.
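Point b) can be illustrated with a hedged sketch of a release step that reports whether it actually released anything, instead of returning as if it had. The function drop_resize_claim_checked and its arguments are hypothetical illustrations, not part of nova; only the standard logging module is real.

```python
# Hypothetical variant of the release step that reports whether a claim
# was actually released; the function name and arguments are illustrative.
import logging

LOG = logging.getLogger(__name__)


def drop_resize_claim_checked(tracked_migrations, usage, uuid):
    """Release the tracked claim for `uuid` and return True, or return False.

    tracked_migrations: {instance uuid: claimed MB}
    usage: {"used_mb": current usage of the node}
    """
    claimed_mb = tracked_migrations.pop(uuid, None)
    if claimed_mb is None:
        # No audit has tracked this migration yet, so there is nothing to
        # release; say so instead of returning as if the release succeeded.
        LOG.warning("no tracked resize claim for %s; usage held until next audit", uuid)
        return False
    usage["used_mb"] -= claimed_mb
    return True


usage = {"used_mb": 2048}
released = drop_resize_claim_checked({}, usage, "a")
print(released, usage["used_mb"])  # False 2048
```

Returning a flag (or logging) does not free the resource by itself, but it makes the window visible to the caller rather than hiding it behind a successful return.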
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1358379/+subscriptions