
yahoo-eng-team team mailing list archive

[Bug 1590556] [NEW] race condition with resize causing old resources not to be freed

 

Public bug reported:

While I was working on fixing resize for PCI passthrough [1], I
noticed the following issue with resize.


If you use a small image and run resize-confirm very quickly, the old
resources are not freed.


After debugging this issue, I found the root cause.


A good resize run proceeds as follows:


During a resize, _update_usage_from_migration in the resource
tracker is called twice.

1. On the first call, the instance type of the new flavor is returned,
and this branch is entered:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L718

2. The migration and the new instance_type are then stored in
tracked_migrations:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

3. On the second call, the old instance_type is returned, and this
branch is entered:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L725

4. The entry in tracked_migrations is then overwritten with the
migration and the old instance type:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L763

5. On resize-confirm, drop_move_claim is called with the old instance
type:

https://github.com/openstack/nova/blob/9a05d38f48ef0f630c5e49e332075b273cee38b9/nova/compute/manager.py#L3369

6. drop_move_claim compares instance_type[id] from tracked_migrations
with instance_type.id (which is the old one).

7. Because they are equal, it removes the old resource usage:

https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L328
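The good-run sequence can be modelled with a toy tracker. This is a
minimal sketch, not nova's code: ToyTracker, Flavor, the used_vcpus
counter and the method names are hypothetical simplifications, and
tracked_migrations is assumed to map an instance UUID to a
(migration, instance_type) pair. It shows the second call overwriting
the entry with the old flavor, so the id check in drop_move_claim
matches and the old usage is released.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Hypothetical stand-in for a nova flavor (instance_type)."""
    id: int
    vcpus: int


class ToyTracker:
    """Hypothetical, heavily simplified model of the resource tracker."""

    def __init__(self, used_vcpus=0):
        # Illustrative usage counter for the host.
        self.used_vcpus = used_vcpus
        # Assumed shape: instance uuid -> (migration, instance_type).
        self.tracked_migrations = {}

    def update_usage_from_migration(self, uuid, migration, itype):
        # Stand-in for _update_usage_from_migration: store the flavor this
        # call resolved, overwriting whatever an earlier call stored.
        self.tracked_migrations[uuid] = (migration, itype)

    def drop_move_claim(self, uuid, instance_type):
        # Release usage only if the caller's flavor id matches the stored one.
        if uuid not in self.tracked_migrations:
            return False
        _migration, itype = self.tracked_migrations.pop(uuid)
        if instance_type.id != itype.id:
            return False
        self.used_vcpus -= instance_type.vcpus
        return True


old, new = Flavor(id=1, vcpus=1), Flavor(id=2, vcpus=2)
rt = ToyTracker(used_vcpus=old.vcpus + new.vcpus)  # both flavors claimed

rt.update_usage_from_migration("inst-1", "mig-1", new)  # steps 1-2: new flavor
rt.update_usage_from_migration("inst-1", "mig-1", old)  # steps 3-4: overwritten
freed = rt.drop_move_claim("inst-1", old)  # resize-confirm passes the old flavor
print(freed, rt.used_vcpus)  # True 2
```

If the second update_usage_from_migration call is skipped, the stored
flavor is still the new one and the same drop_move_claim call returns
False, leaving used_vcpus unchanged.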


However, with a small image such as CirrOS and a fast resize-confirm,
the second call to _update_usage_from_migration is not executed before
the confirm.

As a result, drop_move_claim compares the old instance_type it is
called with against the new instance_type still stored in
tracked_migrations, so this expression evaluates to false:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L314

This means that this code block is not executed
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L315-L326
and therefore the old resources are never freed.
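To make the failing comparison concrete, here is a minimal sketch that
isolates the id check. It is not nova's code: Flavor,
would_free_old_usage and the two dictionaries are hypothetical, and
tracked_migrations is assumed to map an instance UUID to a
(migration, instance_type) pair. The only difference between the two
cases is which flavor the last _update_usage_from_migration call left
behind.

```python
from collections import namedtuple

# Hypothetical stand-in for a flavor/instance_type.
Flavor = namedtuple("Flavor", ["id", "vcpus"])


def would_free_old_usage(tracked_migrations, uuid, instance_type):
    """Return True if a drop_move_claim-style id check would release usage.

    Simplified model: tracked_migrations is assumed to map an instance
    uuid to a (migration, instance_type) pair.
    """
    entry = tracked_migrations.get(uuid)
    if entry is None:
        return False
    _migration, itype = entry
    return instance_type.id == itype.id


old, new = Flavor(1, 1), Flavor(2, 2)

# Good run: the second update call stored the old flavor.
good = {"inst-1": ("mig-1", old)}
# Fast confirm: only the first update call ran, so the new flavor is stored.
racy = {"inst-1": ("mig-1", new)}

print(would_free_old_usage(good, "inst-1", old))  # True
print(would_free_old_usage(racy, "inst-1", old))  # False: old usage leaks
```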

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1590556
