yahoo-eng-team team mailing list archive
Message #52090
[Bug 1590556] [NEW] race condition with resize causing old resources not to be free
Public bug reported:
While working on a fix for resize with PCI passthrough [1], I
noticed the following issue in resize.
If you are using a small image and run resize-confirm very quickly, the old
resources are not freed.
After debugging this issue I found its root cause.
A good run of resize proceeds as detailed below.
During a resize, _update_usage_from_migration in the resource
tracker is called twice.
1. The first call returns the instance type of the new flavor
and enters this case
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718
2. It then stores the migration and the new instance_type in
tracked_migrations
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763
3. The second call returns the old instance_type and enters
this case
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725
4. It then overwrites the old value in tracked_migrations
with the migration and the old instance type
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763
5. On resize-confirm, drop_move_claim is called with the
old instance type
https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369
6. drop_move_claim compares the instance_type[id] from
tracked_migrations to the instance_type.id (which is the old one)
7. Because they are equal, it removes the old resource
usage
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328
But with a small image like CirrOS, and with resize-confirm issued quickly, the
second call of _update_usage_from_migration is never executed.
As a result, when we enter drop_move_claim it compares against
the new instance_type, and this expression is false
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314
This means that this code block is not executed
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326
and therefore the old resources are not freed.
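
The sequence described above can be reproduced with a small toy model.
This is only a sketch and not nova code: Flavor, ToyTracker, run_resize and
the vCPU counts are hypothetical stand-ins, and the model assumes that each
call to _update_usage_from_migration simply overwrites the entry in
tracked_migrations, and that drop_move_claim frees usage only when the
tracked flavor id matches the one it was given.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flavor:
    # Hypothetical, minimal flavor: only an id and a vCPU count.
    id: int
    vcpus: int


@dataclass
class ToyTracker:
    # Simplified stand-in for the resource tracker (assumption, not nova code).
    used_vcpus: int
    tracked_migrations: dict = field(default_factory=dict)

    def update_usage_from_migration(self, instance_uuid, flavor):
        # Each call records the flavor it was handed, replacing any earlier entry.
        self.tracked_migrations[instance_uuid] = flavor

    def drop_move_claim(self, instance_uuid, flavor):
        # Usage is released only if the tracked flavor matches the given one.
        tracked = self.tracked_migrations.pop(instance_uuid, None)
        if tracked is not None and tracked.id == flavor.id:
            self.used_vcpus -= flavor.vcpus


def run_resize(second_update_happens):
    old, new = Flavor(id=1, vcpus=2), Flavor(id=2, vcpus=4)
    tracker = ToyTracker(used_vcpus=old.vcpus)  # old flavor is still accounted for
    tracker.update_usage_from_migration("uuid-1", new)  # first call: new flavor
    if second_update_happens:
        tracker.update_usage_from_migration("uuid-1", old)  # second call: old flavor
    tracker.drop_move_claim("uuid-1", old)  # resize-confirm passes the old flavor
    return tracker.used_vcpus


good = run_resize(second_update_happens=True)
racy = run_resize(second_update_happens=False)
print(good, racy)
```

In this model the good run ends with no vCPUs in use, while the run that
skips the second update still holds the old flavor's vCPUs, which matches the
leak described in this report.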
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1590556
Title:
race condition with resize causing old resources not to be free
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1590556/+subscriptions