yahoo-eng-team team mailing list archive
Message #83944
[Bug 1896463] [NEW] evacuation failed: Port update failed : Unable to correlate PCI slot
Public bug reported:
Description
===========
If the resource tracker's _update_available_resource() runs between _do_rebuild_instance_with_claim() and instance.save() while a VM instance is being evacuated to the destination host,
nova/compute/manager.py
2931 def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata, ...
3016 claim_ctxt = rebuild_claim(
3017 context, instance, scheduled_node,
3018 limits=limits, image_meta=image_meta,
3019 migration=migration)
3020 self._do_rebuild_instance_with_claim(
3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref, ...
3068 instance.apply_migration_context()
3069 # NOTE (ndipanov): This save will now update the host and node
3070 # attributes making sure that next RT pass is consistent since
3071 # it will be based on the instance and not the migration DB
3072 # entry.
3073 instance.host = self.host
3074 instance.node = scheduled_node
3075 instance.save()
3076 instance.drop_migration_context()
the instance is not treated as an instance managed by the destination host, because its host has not been updated in the DB yet.
2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance 22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}. Skipping heal of allocation because we do not know what to do.
As a result, the PCI device backing the SRIOV port was freed by clean_usage() even though the VM already holds the VF port.
743 def _update_available_resource(self, context, resources):
744 +-- 45 lines: # initialize the compute node object, creating it ...
789 self.pci_tracker.clean_usage(instances, migrations, orphans)
790 dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()
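The ordering problem described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not nova's code: ToyPciTracker, managed_instances() and evacuate() are hypothetical names, and the model ignores the migrations and orphans arguments that the real clean_usage() receives. It only shows how a cleanup pass that runs before the instance's host is saved can free a device the instance has just claimed.

```python
class ToyPciTracker:
    """Toy stand-in for a PCI tracker; NOT nova's real tracker class."""

    def __init__(self, addresses):
        # Map PCI address -> owning instance UUID (None means free).
        self.owner = {addr: None for addr in addresses}

    def claim(self, instance_uuid):
        # Hand the first free device to the instance.
        for addr, owner in self.owner.items():
            if owner is None:
                self.owner[addr] = instance_uuid
                return addr
        raise RuntimeError("no free PCI device")

    def clean_usage(self, managed_uuids):
        # Free every device whose owner is not a managed instance.
        for addr, owner in self.owner.items():
            if owner is not None and owner not in managed_uuids:
                self.owner[addr] = None


def managed_instances(db, host):
    # Instances whose DB record says they live on this host.
    return {uuid for uuid, h in db.items() if h == host}


def evacuate(db, tracker, uuid, dest, periodic_runs_before_save):
    addr = tracker.claim(uuid)  # claim made during the rebuild
    if periodic_runs_before_save:
        # The periodic task fires inside the window: the DB still
        # says the instance lives on the source host.
        tracker.clean_usage(managed_instances(db, dest))
    db[uuid] = dest  # corresponds to instance.save()
    return addr, tracker.owner[addr]


db = {"vm-1": "com1"}
tracker = ToyPciTracker(["0000:05:12.2"])
print(evacuate(db, tracker, "vm-1", "com2", True))
# prints ('0000:05:12.2', None): the device was freed while still in use
```

With periodic_runs_before_save set to False, the same call returns the device still owned by "vm-1", which matches the expected behaviour.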
After that, when we evacuated this VM to another compute host again, we got the
error shown below.
Steps to reproduce
==================
1. create a VM on com1 with SRIOV VF ports.
2. stop and disable nova-compute service on com1
3. wait 60 sec (nova-compute reporting interval)
4. evacuate the VM to com2
5. wait until the VM is active on com2
6. enable and start nova-compute on com1
7. wait 60 sec (nova-compute reporting interval)
8. stop and disable nova-compute service on com2
9. wait 60 sec (nova-compute reporting interval)
10. evacuate the VM to com1
11. wait until the VM is active on com1
12. enable and start nova-compute on com2
13. wait 60 sec (nova-compute reporting interval)
14. go to step 2.
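The loop above could be scripted roughly as follows. This is a hedged sketch, not the reporter's tooling: evacuation_cycle() is a hypothetical helper that only builds the command lists for one half of the loop (steps 2 to 7), so it can be reviewed before anything is run. The use of ssh and the systemd unit name openstack-nova-compute are assumptions about the deployment; the server-show command is meant to be polled until it reports ACTIVE.

```python
def evacuation_cycle(server, src, dst, interval=60):
    """Build the commands for one src -> dst evacuation (steps 2 to 7)."""
    wait = ["sleep", str(interval)]  # nova-compute reporting interval
    return [
        # Step 2: stop and disable nova-compute on the source host.
        # (ssh and the unit name are assumptions about the deployment.)
        ["ssh", src, "systemctl", "stop", "openstack-nova-compute"],
        ["openstack", "compute", "service", "set", "--disable",
         src, "nova-compute"],
        wait,                                   # step 3
        ["nova", "evacuate", server, dst],      # step 4
        # Step 5: poll this until the status is ACTIVE.
        ["openstack", "server", "show", "-f", "value", "-c", "status",
         server],
        # Step 6: enable and start nova-compute on the source host again.
        ["openstack", "compute", "service", "set", "--enable",
         src, "nova-compute"],
        ["ssh", src, "systemctl", "start", "openstack-nova-compute"],
        wait,                                   # step 7
    ]


# Dry run: print both halves of the loop (com1 -> com2, then com2 -> com1).
for src, dst in (("com1", "com2"), ("com2", "com1")):
    for cmd in evacuation_cycle("my-sriov-vm", src, dst):
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run() once the commands have been checked against the actual environment; "my-sriov-vm" is a placeholder server name.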
Expected result
===============
Evacuation should be done without errors.
Actual result
=============
Evacuation failed with "Port update failed"
Environment
===========
openstack-nova-compute-18.0.1-1 with SRIOV ports. libvirt is used.
Logs & Configs
==============
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [req-38dd0be2-7223-4a59-8073-dd1b072125c5 c424fbb3d41f444bb7a025266fda36da 6255a6910b9b4d3ba34a93624fe7fb22 - default default] [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Setting instance vm_state to ERROR: PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] Traceback (most recent call last):
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7993, in _error_out_instance_on_exception
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] yield
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3025, in rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration, request_spec)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3087, in _do_rebuild_instance_with_claim
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] self._do_rebuild_instance(*args, **kwargs)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3190, in _do_rebuild_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] context, instance, self.host, migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2953, in setup_instance_network_on_host
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] migration)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 3058, in _update_port_binding_for_instance
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] pci_slot)
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae] PortUpdateFailed: Port update failed for port 76dc33dc-5b3b-4c45-b2cb-fd59025a4dbd: Unable to correlate PCI slot 0000:05:12.2
2020-09-19 07:34:22.670 8 ERROR nova.compute.manager [instance: 22f6ca0e-f964-4467-83a3-f2bf12bb05ae]
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463
Title:
evacuation failed: Port update failed : Unable to correlate PCI slot
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896463/+subscriptions