
yahoo-eng-team team mailing list archive

[Bug 1641750] Re: PCI devices are sometimes not freed after a migration

 

Reviewed:  https://review.openstack.org/370374
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a4909ae7e6294e45f09950ebca0b3d7126c80af
Submitter: Jenkins
Branch:    master

commit 3a4909ae7e6294e45f09950ebca0b3d7126c80af
Author: Ludovic Beliveau <ludovic.beliveau@xxxxxxxxxxxxx>
Date:   Wed Sep 14 14:44:46 2016 -0400

    Release PCI devices on drop_move_claim()
    
    On cold migration, drop_move_claim() is called in the confirm stage on the
    source node.  Since the migration is being tracked by the resource tracker on
    the destination node, the source node has the instance in its
    tracked_instances.
    
    As a result, the PCI devices were only freed on the next periodic audit.
    PCI resources such as PCI passthrough devices are limited in number and
    should be freed right away.
    
    This patch fixes drop_move_claim() so that it also frees PCI devices when
    the instance is in self.tracked_instances.
    
    Co-Authored-By: Steven Webster <steven.webster@xxxxxxxxxxxxx>
    Change-Id: Ie3392f80dfd2650048519c571ffaa11c025ad048
    Closes-Bug: #1641750
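A minimal sketch of the behaviour this commit message describes. It assumes a heavily simplified, hypothetical tracker: `ToySourceTracker`, `start_migration`, `_free_pci` and the PCI addresses are illustrative names and values, not nova's code or API. Only the PCI bookkeeping is modelled. `drop_move_claim()` keeps the existing `tracked_migrations` path and adds a `tracked_instances` path, so the devices return to the free pool at confirm time.

```python
class ToySourceTracker:
    """Hypothetical stand-in for the source node's resource tracker (PCI only)."""

    def __init__(self, pci_addresses):
        self.free_pci = set(pci_addresses)
        self.claimed_pci = {}         # instance uuid -> set of PCI addresses
        self.tracked_instances = {}   # instance uuid -> placeholder record
        self.tracked_migrations = {}  # instance uuid -> placeholder record

    def start_migration(self, uuid, count):
        """Give `uuid` `count` devices and track it as an outgoing migration."""
        self.claimed_pci[uuid] = {self.free_pci.pop() for _ in range(count)}
        self.tracked_migrations[uuid] = "migration"

    def _free_pci(self, uuid):
        """Return every device held by `uuid` to the free pool (idempotent)."""
        self.free_pci |= self.claimed_pci.pop(uuid, set())

    def drop_move_claim(self, uuid):
        if uuid in self.tracked_migrations:
            # Path that already existed before the fix (toy: PCI only).
            del self.tracked_migrations[uuid]
            self._free_pci(uuid)
        elif uuid in self.tracked_instances:
            # Path added by the fix: free the devices now instead of
            # waiting for the next periodic audit.
            self._free_pci(uuid)


rt = ToySourceTracker(["0000:05:00.1", "0000:05:00.2"])  # hypothetical VFs
rt.start_migration("vm-1", count=2)
# Simulate the audit from the bug report moving the uuid mid-migration.
rt.tracked_instances["vm-1"] = rt.tracked_migrations.pop("vm-1")
rt.drop_move_claim("vm-1")
print(sorted(rt.free_pci))  # ['0000:05:00.1', '0000:05:00.2']
```

Without the `elif` branch, the last call would leave both addresses in `rt.claimed_pci` until some later audit released them.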


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1641750

Title:
  PCI devices are sometimes not freed after a migration

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  During stress testing of cold migration, it has been observed that
  sometimes the PCI devices are not freed by the resource tracker on the
  source node.

  If the periodic resource audit on the source node kicks in during the
  migration, the instance uuid is moved from tracked_migrations to
  tracked_instances.  In that case the PCI devices are not freed, because
  the current logic in drop_move_claim() only handles instances found in
  tracked_migrations (see
  https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L355).
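The race described above can be reproduced with a few plain dictionaries. This is a toy model under stated assumptions. The hypothetical `audit()` performs only the uuid move described in the report. The hypothetical `drop_move_claim_pre_fix()` reproduces only the "free PCI devices only for uuids in tracked_migrations" rule, not nova's full logic.

```python
def audit(tracked_migrations, tracked_instances, uuid):
    """Hypothetical periodic audit: moves the uuid, as described in the report."""
    tracked_instances[uuid] = tracked_migrations.pop(uuid)


def drop_move_claim_pre_fix(tracked_migrations, claimed_pci, free_pci, uuid):
    """Pre-fix rule: PCI devices are freed only if uuid is in tracked_migrations."""
    if uuid in tracked_migrations:
        del tracked_migrations[uuid]
        free_pci.update(claimed_pci.pop(uuid, set()))


def confirm_on_source(audit_runs_mid_migration):
    """Return the source node's free PCI devices after confirming the migration."""
    free_pci = set()
    claimed_pci = {"vm-1": {"0000:05:00.1"}}  # hypothetical VF address
    tracked_migrations = {"vm-1": "migration"}
    tracked_instances = {}
    if audit_runs_mid_migration:
        audit(tracked_migrations, tracked_instances, "vm-1")
    drop_move_claim_pre_fix(tracked_migrations, claimed_pci, free_pci, "vm-1")
    return free_pci


print(confirm_on_source(False))  # {'0000:05:00.1'}: device freed at confirm
print(confirm_on_source(True))   # set(): device stays claimed until a later audit
```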

  Steps to reproduce
  ==================

  1) Boot a guest with an SR-IOV device.
  2) Migrate the guest and confirm the migration.
  3) Repeat step 2 over and over.
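The reproduction loop can be approximated offline with a toy two-node model. Everything here is an assumption for illustration:

- nodes `A` and `B` each have a hypothetical pool of four VFs;
- the guest holds one VF;
- a periodic audit lands mid-migration with a made-up probability.

The `fixed` flag switches between the pre-fix rule and the rule described in the commit message. The toy never models the later audit that would eventually release leaked VFs, so leaks only accumulate.

```python
import random


def stress(iterations, fixed, vfs_per_node=4, audit_probability=0.3, seed=1):
    """Toy stress run of steps 2-3; returns (confirmed_migrations, free_vfs)."""
    rng = random.Random(seed)
    # The guest was booted on node A and holds one of its VFs (step 1).
    free = {"A": vfs_per_node - 1, "B": vfs_per_node}
    source, dest = "A", "B"
    for done in range(iterations):
        if free[dest] == 0:
            return done, free  # no VF left on the destination: cannot migrate
        free[dest] -= 1  # claim a VF for the guest on the destination
        tracked_migrations = {"vm": "migration"}
        tracked_instances = {}
        if rng.random() < audit_probability:
            # Periodic audit on the source lands in the middle of the migration.
            tracked_instances["vm"] = tracked_migrations.pop("vm")
        # confirm_resize: drop_move_claim() on the source node.
        if "vm" in tracked_migrations or (fixed and "vm" in tracked_instances):
            free[source] += 1  # VF returned to the source pool immediately
        source, dest = dest, source
    return iterations, free


print(stress(100, fixed=False))  # leaked VFs pile up; the run may stop early
print(stress(100, fixed=True))   # (100, {'A': 3, 'B': 4})
```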

  Expected result
  ===============

  The PCI devices on the source node are freed during the confirm_resize
  stage instead of only on the next periodic audit.  PCI resources such as
  PCI passthrough devices are limited in number and should be freed right
  away.

  Actual result
  =============

  The PCI devices are not freed during the confirm_resize stage.

  Environment
  ===========

  $ git log -1
  commit 633c817de5a67e798d8610d0df1135e5a568fd8a
  Author: Matt Riedemann <mriedem@xxxxxxxxxx>
  Date:   Sat Nov 12 11:59:13 2016 -0500

      api-ref: fix server_id in metadata docs
      
      The api-ref was saying that the server_id was in the body of the
      server metadata requests but it's actually in the path for all
      of the requests.
      
      Change-Id: Icdecd980767f89ee5fcc5bdd4802b2c263268a26
      Closes-Bug: #1641331
