yahoo-eng-team team mailing list archive
Message #84012
[Bug 1896463] Re: evacuation failed: Port update failed : Unable to correlate PCI slot
Just adding the previously filed downstream Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1852110
For context, this can happen as far back as Queens, so once we root-cause the issue
and fix it, the fix should likely be backported to Queens. There are other, older
bugs from Newton that look similar and are related to unshelve, so it is possible
that the same issue affects multiple move operations.
** Bug watch added: Red Hat Bugzilla #1852110
https://bugzilla.redhat.com/show_bug.cgi?id=1852110
** Also affects: nova/train
Importance: Undecided
Status: New
** Also affects: nova/stein
Importance: Undecided
Status: New
** Also affects: nova/ussuri
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/victoria
Importance: Low
Assignee: Balazs Gibizer (balazs-gibizer)
Status: Confirmed
** Also affects: nova/rocky
Importance: Undecided
Status: New
** Changed in: nova/ussuri
Importance: Undecided => Low
** Changed in: nova/ussuri
Status: New => Triaged
** Changed in: nova/train
Importance: Undecided => Low
** Changed in: nova/train
Status: New => Triaged
** Changed in: nova/stein
Importance: Undecided => Low
** Changed in: nova/stein
Status: New => Triaged
** Changed in: nova/rocky
Importance: Undecided => Low
** Changed in: nova/rocky
Status: New => Triaged
** Changed in: nova/queens
Importance: Undecided => Low
** Changed in: nova/queens
Status: New => Triaged
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463
Title:
evacuation failed: Port update failed : Unable to correlate PCI slot
Status in OpenStack Compute (nova):
Confirmed
Status in OpenStack Compute (nova) queens series:
Triaged
Status in OpenStack Compute (nova) rocky series:
Triaged
Status in OpenStack Compute (nova) stein series:
Triaged
Status in OpenStack Compute (nova) train series:
Triaged
Status in OpenStack Compute (nova) ussuri series:
Triaged
Status in OpenStack Compute (nova) victoria series:
Confirmed
Bug description:
Description
===========
If the resource tracker's _update_available_resource() is called between _do_rebuild_instance_with_claim() and instance.save() while a VM instance is being evacuated to the destination host,
nova/compute/manager.py
2931 def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,-------------------------------------------------------------------
3016 claim_ctxt = rebuild_claim(
3017 context, instance, scheduled_node,
3018 limits=limits, image_meta=image_meta,
3019 migration=migration)
3020 self._do_rebuild_instance_with_claim(
3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,-----------------------------------------------------------------
3068 instance.apply_migration_context()
3069 # NOTE (ndipanov): This save will now update the host and node
3070 # attributes making sure that next RT pass is consistent since
3071 # it will be based on the instance and not the migration DB
3072 # entry.
3073 instance.host = self.host
3074 instance.node = scheduled_node
3075 instance.save()
3076 instance.drop_migration_context()
then the instance is not treated as an instance managed by the destination
host, because its host field has not been updated in the DB yet.
2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-
b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance
22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by
this compute host but has allocations referencing this compute host:
{u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}.
Skipping heal of allocation because we do not know what to do.
As a result, the SR-IOV port's PCI device was freed by clean_usage()
even though the VM already had the VF port attached.
743 def _update_available_resource(self, context, resources):
744 +-- 45 lines: # initialize the compute node object, creating it--------------------------------------------------------------
789 self.pci_tracker.clean_usage(instances, migrations, orphans)
790 dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()
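To make the race easier to follow, here is a minimal, hedged sketch (not the actual nova code; the function signature, argument names, and the list filtering are illustrative assumptions) of why the periodic resource update can release the VF while the rebuilt guest is still using it:

# Simplified sketch of the race, NOT actual nova code.
# Assumption: the tracker only treats instances whose DB record already
# points at this host as "managed" by this host.
def update_available_resource(host, all_instances, migrations, pci_tracker):
    # During evacuation, instance.host is only written by instance.save()
    # *after* _do_rebuild_instance_with_claim() has finished, so in the race
    # window the rebuilt instance still reports its old host here and is
    # filtered out.
    managed = [inst for inst in all_instances if inst.host == host]

    # clean_usage() releases PCI devices (e.g. SR-IOV VFs) that are not
    # claimed by any managed instance or tracked migration. Because the
    # evacuated instance was filtered out above, the VF it is already using
    # gets freed and can later be handed to another instance.
    pci_tracker.clean_usage(managed, migrations, orphans=[])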
After that, when this VM was evacuated to another compute host again, we
got the error shown below.
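For context on the resulting failure: when nova updates the SR-IOV port binding during the next evacuation, it has to translate the port's old PCI address into the device claimed on the destination host. A rough, hedged sketch of that lookup (simplified; the function name and the pci_mapping structure are assumptions, not nova's exact code) is:

# Rough sketch of the failing port-binding update, NOT nova's exact code.
# Assumption: pci_mapping maps the instance's old PCI addresses to the PCI
# devices claimed for it on the destination host.
def update_port_binding(port, pci_mapping):
    old_pci_slot = port['binding:profile'].get('pci_slot')

    # Because the earlier clean_usage() already released the VF, the
    # instance's PCI device list no longer contains the device behind
    # old_pci_slot, so there is no mapping entry for it.
    new_dev = pci_mapping.get(old_pci_slot)
    if new_dev is None:
        raise Exception(
            "Port update failed for port %s: Unable to correlate PCI slot %s"
            % (port['id'], old_pci_slot))

    port['binding:profile']['pci_slot'] = new_dev.address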
Steps to reproduce
==================
1. create a VM on com1 with SRIOV VF ports.
2. stop and disable nova-compute service on com1
3. wait 60 sec (nova-compute reporting interval)
4. evacuate the VM to com2
5. wait the VM is active on com2
6. enable and start nova-compute on com1
7. wait 60 sec (nova-compute reporting interval)
8. stop and disable nova-compute service on com2
9. wait 60 sec (nova-compute reporting interval)
10. evacuate the VM to com1
11. wait the VM is active on com1
12. enable and start nova-compute on com2
13. wait 60 sec (nova-compute reporting interval)
14. go to step 2.
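A rough automation of the loop above could look like the following hedged sketch (the VM name, host names, SSH/systemctl access and the fixed 60 s sleeps are assumptions taken from the steps, not a tested script):

# Hedged sketch of the reproduction loop; not a verified script.
import subprocess
import time

VM = "sriov-vm"  # the VM created in step 1 with SR-IOV VF ports

def run(*cmd):
    subprocess.run(cmd, check=True)

def stop_compute(host):
    # steps 2-3 / 8-9: stop and disable nova-compute on the host
    run("ssh", host, "systemctl", "stop", "openstack-nova-compute")
    run("openstack", "compute", "service", "set", "--disable", host, "nova-compute")

def start_compute(host):
    # steps 6 / 12: enable and start nova-compute on the host
    run("openstack", "compute", "service", "set", "--enable", host, "nova-compute")
    run("ssh", host, "systemctl", "start", "openstack-nova-compute")

while True:
    for src, dst in (("com1", "com2"), ("com2", "com1")):
        stop_compute(src)
        time.sleep(60)                    # one nova-compute reporting interval
        run("nova", "evacuate", VM, dst)  # evacuate the VM to the other host
        time.sleep(60)                    # wait for the VM to become ACTIVE
        start_compute(src)
        time.sleep(60)                    # one reporting interval before looping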
Expected result
===============
Evacuation should complete without errors.
Actual result
=============
Evacuation failed with "Port update failed"
Environment
===========
openstack-nova-compute-18.0.1-1 is used, with SR-IOV ports. libvirt is used.
Logs & Configs
==============
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] yield
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration, request_spec)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] self._do_rebuild_instance(*args, **kwargs)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] context, instance, self.host, migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] pci_slot)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896463/+subscriptions