yahoo-eng-team team mailing list archive

[Bug 1600304] Re: _update_usage_from_migrations() can end up processing stale migrations

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1600304@xxxxxxxxxxxxxxxxxx>
Date: Mon, 16 Oct 2017 12:27:11 -0000
Reply-to: Bug 1600304 <1600304@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/339715
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0bf9c91bb7a98d0ba8a0565d555560936262e635
Submitter: Zuul
Branch:    master

commit 0bf9c91bb7a98d0ba8a0565d555560936262e635
Author: Chris Friesen <chris.friesen@xxxxxxxxxxxxx>
Date:   Fri Oct 28 03:57:37 2016 -0600

    Filter out stale migrations in resource audit
    
    When doing the resource audit there is a subtle bug in the current
    code.  The problem arises if:
    
    1) You have one or more stale migrations which didn't complete
    properly that involve the current compute node.
    
    2) The instance from the uncompleted migration is currently doing a
    resize/migration that does not involve the current compute node.
    
    When this happens, _update_usage_from_migrations() will be passed in
    the stale migration, and the instance is in fact in a resize state,
    so the current compute node will erroneously account for the instance.
    
    The fix is to check that the instance migration ID matches the ID
    of the migration being analyzed.  This will work because in the case
    of the stale migration we will have hit the error case in
    _pair_instances_to_migrations(), and so the instance will be
    lazy-loaded from the DB, ensuring that its migration ID is up-to-date.
    
    If the IDs don't match, we'll set the migration status to "error" (to
    prevent retrieving that migration the next time) and skip updating
    the usage from the stale migration.
    
    Closes-Bug: #1600304
    Change-Id: I6f5ad01cb1392db3e2b71e322c5be353de9071a2


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1600304

Title:
  _update_usage_from_migrations() can end up processing stale migrations

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I recently found a bug in Mitaka, and it appears to be still present
  in master.

  I was testing a separate patch by doing resizes, and bugs in my code
  had resulted in a number of incomplete resizes involving compute-1.  I
  then did a resize from compute-0 to compute-0, and saw compute-1's
  resource usage go up when it ran the resource audit.

  This got me curious, so I went digging and discovered a gap in the
  current resource audit logic.  The problem arises if:

      1) You have one or more stale migrations which didn't complete
      properly that involve the current compute node.

      2) The instance from the uncompleted migration is currently doing a
      resize/migration that does not involve the current compute node.

  When this happens, _update_usage_from_migrations() will be passed in
  the stale migration, and since the instance is in fact in a resize
  state, the current compute node will erroneously account for the
  instance, even though the instance isn't doing anything involving
  the current compute node.

  The fix is to check that the instance migration ID matches the ID of
  the migration being analyzed.  This will work because in the case of
  the stale migration we will have hit the error case in
  _pair_instances_to_migrations(), and so the instance will be
  lazy-loaded from the DB, ensuring that its migration ID is up-to-date.
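
  The check described above, together with the "set the status to
  error and skip" step from the commit message, can be sketched roughly
  as follows.  This is a minimal standalone illustration, not the
  actual nova patch: the Migration and Instance classes, the
  current_migration_id field and filter_stale_migrations() are all
  hypothetical stand-ins, and persisting the status change to the DB is
  left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Migration:
    # Hypothetical stand-in for a migration record involving this node.
    id: int
    instance_uuid: str
    status: str = "migrating"


@dataclass
class Instance:
    # Hypothetical stand-in for an instance.  current_migration_id is the
    # ID of the migration the instance is really part of right now, or
    # None if it is not migrating (assumed to be freshly loaded from the DB).
    uuid: str
    current_migration_id: Optional[int] = None


def filter_stale_migrations(migrations: List[Migration],
                            instances: Dict[str, Instance]) -> List[Migration]:
    """Return only the migrations whose ID matches the instance's own.

    Migrations that do not match are treated as stale: their status is
    set to "error" so they are not picked up by the next audit, and they
    are excluded from the result so no usage is accounted for them.
    """
    live = []
    for migration in migrations:
        instance = instances[migration.instance_uuid]
        current = instance.current_migration_id
        if current is not None and current != migration.id:
            migration.status = "error"  # stale: mark it and skip it
            continue
        live.append(migration)
    return live


# The scenario from the report, seen from compute-1: migration 7 is a
# leftover incomplete resize involving compute-1, while the instance is
# now in migration 12, which does not involve compute-1 at all.
stale = Migration(id=7, instance_uuid="inst-a")
instances = {"inst-a": Instance(uuid="inst-a", current_migration_id=12)}
remaining = filter_stale_migrations([stale], instances)
```

  With this filter in place, compute-1 would have no migrations left to
  account for in this scenario, and the leftover record would be marked
  "error" rather than being reconsidered on every audit.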

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1600304/+subscriptions

References

[Bug 1600304] [NEW] _update_usage_from_migrations() can end up processing stale migrations
From: Chris Friesen, 2016-07-08