
yahoo-eng-team team mailing list archive

[Bug 1896463] Re: evacuation failed: Port update failed : Unable to correlate PCI slot

 

Just adding the previously filed downstream Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1852110

For context, this can happen in queens, so once we root-cause the issue
and fix it, the fix should likely be backported to queens. There are other,
older bugs from newton related to unshelve that look similar, so it is
possible that the same issue affects multiple move operations.

** Bug watch added: Red Hat Bugzilla #1852110
   https://bugzilla.redhat.com/show_bug.cgi?id=1852110

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/ussuri
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/victoria
   Importance: Low
     Assignee: Balazs Gibizer (balazs-gibizer)
       Status: Confirmed

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/ussuri
   Importance: Undecided => Low

** Changed in: nova/ussuri
       Status: New => Triaged

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/train
       Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/stein
       Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/rocky
       Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/queens
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463

Title:
  evacuation failed: Port update failed : Unable to correlate PCI slot

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:
  Description
  ===========
  If _update_available_resource() of the resource tracker is called between _do_rebuild_instance_with_claim() and instance.save() while a VM instance is being evacuated to the destination host,

  nova/compute/manager.py

  2931     def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
  2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,-------------------------------------------------------------------
  3016                 claim_ctxt = rebuild_claim(
  3017                     context, instance, scheduled_node,
  3018                     limits=limits, image_meta=image_meta,
  3019                     migration=migration)
  3020                 self._do_rebuild_instance_with_claim(
  3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,-----------------------------------------------------------------
  3068                 instance.apply_migration_context()
  3069                 # NOTE (ndipanov): This save will now update the host and node
  3070                 # attributes making sure that next RT pass is consistent since
  3071                 # it will be based on the instance and not the migration DB
  3072                 # entry.
  3073                 instance.host = self.host
  3074                 instance.node = scheduled_node
  3075                 instance.save()
  3076                 instance.drop_migration_context()

  the instance is not handled as a managed instance of the destination
  host because its host has not been updated in the DB yet.

  2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-
  b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance
  22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by
  this compute host but has allocations referencing this compute host:
  {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}.
  Skipping heal of allocation because we do not know what to do.

  As a result, the PCI device backing the SRIOV port was freed by
  clean_usage() even though the VM already holds the VF port.

   743     def _update_available_resource(self, context, resources):
   744 +-- 45 lines: # initialize the compute node object, creating it--------------------------------------------------------------
   789         self.pci_tracker.clean_usage(instances, migrations, orphans)
   790         dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()

  After that, when we evacuated this VM to another compute host again, we
  got the error shown under "Logs & Configs".
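
The race described above can be sketched with a toy model. This is only an illustration under the assumptions stated in this report, not nova's actual code: ToyPciTracker, simulate() and the "vm-1" identifier are hypothetical names, and the model ignores the migrations and orphans arguments that the real clean_usage() also receives. It only shows why a periodic pass that runs before instance.save() would not see the instance as managed by the destination host.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPciTracker:
    """Hypothetical stand-in for a per-host PCI tracker (not nova's class)."""

    # Maps PCI address -> UUID of the instance that holds the device.
    allocations: dict = field(default_factory=dict)

    def claim(self, address, instance_uuid):
        self.allocations[address] = instance_uuid

    def clean_usage(self, managed_uuids):
        """Free every device whose owner is not in managed_uuids."""
        freed = [addr for addr, owner in self.allocations.items()
                 if owner not in managed_uuids]
        for addr in freed:
            del self.allocations[addr]
        return freed


def simulate(periodic_runs_before_save):
    """Evacuate one instance from com1 to com2; return (freed, allocations)."""
    db_host = {"vm-1": "com1"}   # instance.host as stored in the DB
    tracker = ToyPciTracker()    # tracker of the destination host, com2

    # The rebuild claim on the destination allocates the VF.
    tracker.claim("0000:05:12.2", "vm-1")

    def periodic_pass():
        # The periodic pass only knows instances whose DB host is com2.
        managed = {uuid for uuid, host in db_host.items() if host == "com2"}
        return tracker.clean_usage(managed)

    freed = []
    if periodic_runs_before_save:
        freed = periodic_pass()  # DB still says com1, so the VF looks unused

    db_host["vm-1"] = "com2"     # corresponds to instance.save()

    if not periodic_runs_before_save:
        freed = periodic_pass()  # DB says com2, so the VF is kept

    return freed, dict(tracker.allocations)
```

In this sketch, simulate(True) frees the VF while the VM still uses it, whereas simulate(False) leaves the allocation in place. A later evacuation would then find no tracked device for the instance's PCI slot, which matches the symptom reported here.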


  Steps to reproduce
  ==================
  1. Create a VM on com1 with SRIOV VF ports.
  2. Stop and disable the nova-compute service on com1.
  3. Wait 60 sec (nova-compute reporting interval).
  4. Evacuate the VM to com2.
  5. Wait until the VM is active on com2.
  6. Enable and start nova-compute on com1.
  7. Wait 60 sec (nova-compute reporting interval).
  8. Stop and disable the nova-compute service on com2.
  9. Wait 60 sec (nova-compute reporting interval).
  10. Evacuate the VM to com1.
  11. Wait until the VM is active on com1.
  12. Enable and start nova-compute on com2.
  13. Wait 60 sec (nova-compute reporting interval).
  14. Go to step 2.

  Expected result
  ===============
  Evacuation should be done without errors.

  Actual result
  =============
  Evacuation failed with "Port update failed"

  Environment
  ===========
  openstack-nova-compute-18.0.1-1 with SRIOV ports, using the libvirt driver.

  Logs & Configs
  ==============
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     yield
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration, request_spec)
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     self._do_rebuild_instance(*args, **kwargs)
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     context, instance, self.host, migration)
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     migration)
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]     pci_slot)
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
  2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896463/+subscriptions

