
yahoo-eng-team team mailing list archive

[Bug 1953718] [NEW] nova compute failed to update placement if mdev max available is 0

 

Public bug reported:

Description
===========
nova-compute fails to update vGPU mdev placement data if the mdev type is changed while
previously created mdev devices of a different type still exist. For NVIDIA, under such
circumstances the maximum number of available instances will be 0.

Steps to reproduce
==================

Configure the vGPU type to nvidia-231,
boot one instance,
then change the vGPU type to nvidia-233 and restart the nova-compute service.
nova-compute will then fail to update placement.
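
To check a host for the situation described in the steps above, the following is a minimal sketch, not nova code. It assumes the standard kernel mdev sysfs layout, in which each entry under /sys/bus/mdev/devices has an mdev_type symlink whose final path component is the type name. The function name find_mismatched_mdevs and its devices_root parameter are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path


def find_mismatched_mdevs(configured_type, devices_root="/sys/bus/mdev/devices"):
    """Return {mdev_uuid: mdev_type} for existing mdevs of another type.

    Hypothetical helper for illustration; it is not part of nova.
    """
    root = Path(devices_root)
    mismatched = {}
    if not root.is_dir():
        # No mdev bus on this host (or a wrong path): nothing to report.
        return mismatched
    for dev in sorted(root.iterdir()):
        type_link = dev / "mdev_type"
        if not type_link.is_symlink():
            continue
        # The last component of the link target is the type name,
        # for example "nvidia-231".
        mdev_type = os.path.basename(os.readlink(type_link))
        if mdev_type != configured_type:
            mismatched[dev.name] = mdev_type
    return mismatched
```

Calling find_mismatched_mdevs("nvidia-233") on the compute host after the type change would list any leftover mdevs that still use the old type.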

Expected result
===============
Better observability: for example, nova-compute could refuse to start, or log a clearer
message that helps the operator understand the likely cause.
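
As one illustration of the "better logging" idea, the sketch below checks an inventory dict before it is sent and logs a hint when a total falls outside the range that placement accepts. The bounds are taken from the schema fragment quoted in the error under "Actual result". The helper name find_invalid_inventories and the wording of the log message are assumptions; this is not how nova handles the case.

```python
import logging

LOG = logging.getLogger(__name__)

# Bounds copied from the schema fragment quoted in the placement error.
MIN_TOTAL = 1
MAX_TOTAL = 2147483647


def find_invalid_inventories(inventories):
    """Return the resource classes whose 'total' placement would reject.

    Hypothetical pre-flight check for illustration; it is not part of nova.
    """
    invalid = []
    for resource_class, inventory in inventories.items():
        total = inventory.get("total")
        if not isinstance(total, int) or not MIN_TOTAL <= total <= MAX_TOTAL:
            LOG.warning(
                "Inventory for %s has total=%r; placement only accepts "
                "%d..%d. For VGPU, check whether mdevs of a type other "
                "than the configured one still exist on this host.",
                resource_class, total, MIN_TOTAL, MAX_TOTAL)
            invalid.append(resource_class)
    return invalid
```

Passed the inventory from the log below ({'VGPU': {'total': 0, ...}}), this check would flag VGPU before the request reaches placement.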

Actual result
=============

2021-12-09 07:18:13.774 632001 ERROR nova.scheduler.client.report [None req-d717a248-4d90-4262-bf8b-11875c60aea6 - - - - -] [req-03944f1d-79bb-4d2f-b37a-99db24d78653] Failed to update inventory to [{'VGPU': {'total': 0, 'min_unit': 1, 'step_size': 1, 'reserved': 0, 'allocation_ratio': 1.0, 'max_unit': 0}}] for resource provider with UUID 9b6dd7c7-50c8-4780-b343-4c2e65dd0c67.  Got 400: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1  Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']:     {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'}  On instance['inventories']['VGPU']['total']:     0  ", "code": "placement.undefined_code", "request_id": "req-03944f1d-79bb-4d2f-b37a-99db24d78653"}]}
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager [None req-d717a248-4d90-4262-bf8b-11875c60aea6 - - - - -] Error updating resources for node compute-009.: nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager Traceback (most recent call last):
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1342, in catch_all
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     yield
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1430, in update_from_provider_tree
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self.set_inventory_for_provider(
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 951, in set_inventory_for_provider
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     raise exception.ResourceProviderUpdateFailed(url=url, error=resp.text)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/9b6dd7c7-50c8-4780-b343-4c2e65dd0c67/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1  Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']:     {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'}  On instance['inventories']['VGPU']['total']:     0  ", "code": "placement.undefined_code", "request_id": "req-03944f1d-79bb-4d2f-b37a-99db24d78653"}]}
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager 
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager During handling of the above exception, another exception occurred:
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager 
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager Traceback (most recent call last):
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 10293, in _update_available_resource_for_node
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 910, in update_available_resource
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     return f(*args, **kwargs)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 995, in _update_available_resource
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self._update(context, cn, startup=startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 1251, in _update
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self._update_to_placement(context, compute_node, startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 49, in wrapped_f
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     return Retrying(*dargs, **dkw).call(f, *args, **kw)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 206, in call
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     return attempt.get(self._wrap_exception)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 247, in get
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     six.reraise(self.value[0], self.value[1], self.value[2])
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/six.py", line 703, in reraise
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     raise value
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 200, in call
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 1227, in _update_to_placement
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self.reportclient.update_from_provider_tree(context, prov_tree,
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1434, in update_from_provider_tree
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self.set_traits_for_provider(context, pd.uuid, pd.traits)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     self.gen.throw(type, value, traceback)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1354, in catch_all
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager     raise exception.ResourceProviderSyncFailed()
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1953718

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1953718/+subscriptions


