yahoo-eng-team team mailing list archive
Message #87834
[Bug 1953718] [NEW] nova compute failed to update placement if mdev max available is 0
Public bug reported:
Description
===========
nova-compute fails to update the vGPU (mdev) inventory in placement if the configured mdev
type is changed while previously created mdev devices of a different type still exist. For
NVIDIA, the maximum number of available instances is 0 under such circumstances.
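
To make the mechanism concrete, here is a minimal sketch of how the reported total can end up at 0. It is a simplified model, not nova's actual code: the `PhysicalGpu` class, the `vgpu_total` function and the instance counts are all hypothetical, and it assumes the total is the number of existing mdevs of the enabled type plus the free slots the driver reports for that type.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalGpu:
    # Hypothetical model: mdev type name -> free slots reported for that type.
    available_instances: dict = field(default_factory=dict)
    # Hypothetical model: types of the mdevs that already exist on this GPU.
    existing_mdev_types: list = field(default_factory=list)


def vgpu_total(gpu, enabled_type):
    """Existing mdevs of the enabled type plus free slots for that type."""
    existing = sum(1 for t in gpu.existing_mdev_types if t == enabled_type)
    return existing + gpu.available_instances.get(enabled_type, 0)


# One nvidia-231 mdev is in use; the counts below are made up for illustration.
gpu = PhysicalGpu(
    available_instances={"nvidia-231": 3, "nvidia-233": 0},
    existing_mdev_types=["nvidia-231"],
)

print(vgpu_total(gpu, "nvidia-231"))  # 4
print(vgpu_total(gpu, "nvidia-233"))  # 0
```

With the old type the existing mdev still counts towards the total; with the new type neither term contributes, which gives the `'total': 0` seen in the log.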
Steps to reproduce
==================
Configure the vGPU type to nvidia-231 at first.
Boot one instance.
Then change the vGPU type to nvidia-233 and restart the nova-compute service.
It will then fail to update placement.
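
After reproducing, the condition can be confirmed on the compute host by reading the `available_instances` counter that the kernel mdev framework exposes in sysfs for each supported type. The helper below is only a sketch: the function name and the `bus_root` parameter are hypothetical, and it assumes the standard `/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/available_instances` layout.

```python
from pathlib import Path


def available_instances(mdev_type, bus_root="/sys/class/mdev_bus"):
    """Return {parent device name: available_instances} for one mdev type."""
    result = {}
    root = Path(bus_root)
    if not root.is_dir():
        # No mdev-capable devices on this host (or a different sysfs layout).
        return result
    for parent in sorted(root.iterdir()):
        counter = parent / "mdev_supported_types" / mdev_type / "available_instances"
        if counter.is_file():
            result[parent.name] = int(counter.read_text().strip())
    return result


if __name__ == "__main__":
    # Print the free slots per physical GPU for the newly configured type.
    for parent, free in available_instances("nvidia-233").items():
        print(parent, free)
```

If every parent device reports 0 for the newly configured type while mdevs of the old type still exist, the host is in the state described in this report.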
Expected result
===============
Better observability: for example, refuse to start the nova-compute service, or log a clearer
message to help the operator understand the possible cause.
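
One possible shape for the "better logging" option is a check on the inventory before it is sent to placement. This is a hedged sketch only, not a proposed patch: `find_empty_inventories` and `warn_on_empty_vgpu` are hypothetical names, and the minimum of 1 for `total` is taken from the schema error shown in the log in this report.

```python
import logging

LOG = logging.getLogger(__name__)


def find_empty_inventories(inventories):
    """Return resource classes whose 'total' is below placement's minimum of 1."""
    return sorted(
        rc for rc, inv in inventories.items() if inv.get("total", 0) < 1
    )


def warn_on_empty_vgpu(inventories, enabled_type):
    """Log an operator-facing hint if the VGPU inventory would be rejected."""
    empty = find_empty_inventories(inventories)
    if "VGPU" in empty:
        LOG.warning(
            "VGPU inventory total is 0 for mdev type %s. Existing mdevs of "
            "another type may be occupying the GPU; check whether the "
            "configured type was changed while instances were running.",
            enabled_type,
        )
    return empty


# Inventory as reported in the failing request from this bug.
inventory = {
    "VGPU": {"total": 0, "min_unit": 1, "step_size": 1, "reserved": 0,
             "allocation_ratio": 1.0, "max_unit": 0},
}
print(warn_on_empty_vgpu(inventory, "nvidia-233"))  # ['VGPU']
```

Whether such a check should only warn or should abort service startup is the design question this report raises; the sketch only illustrates the logging variant.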
Actual result
=============
2021-12-09 07:18:13.774 632001 ERROR nova.scheduler.client.report [None req-d717a248-4d90-4262-bf8b-11875c60aea6 - - - - -] [req-03944f1d-79bb-4d2f-b37a-99db24d78653] Failed to update inventory to [{'VGPU': {'total': 0, 'min_unit': 1, 'step_size': 1, 'reserved': 0, 'allocation_ratio': 1.0, 'max_unit': 0}}] for resource provider with UUID 9b6dd7c7-50c8-4780-b343-4c2e65dd0c67. Got 400: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1 Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']: {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'} On instance['inventories']['VGPU']['total']: 0 ", "code": "placement.undefined_code", "request_id": "req-03944f1d-79bb-4d2f-b37a-99db24d78653"}]}
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager [None req-d717a248-4d90-4262-bf8b-11875c60aea6 - - - - -] Error updating resources for node compute-009.: nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager Traceback (most recent call last):
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1342, in catch_all
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager yield
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1430, in update_from_provider_tree
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self.set_inventory_for_provider(
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 951, in set_inventory_for_provider
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager raise exception.ResourceProviderUpdateFailed(url=url, error=resp.text)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager nova.exception.ResourceProviderUpdateFailed: Failed to update resource provider via URL /resource_providers/9b6dd7c7-50c8-4780-b343-4c2e65dd0c67/inventories: {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n JSON does not validate: 0 is less than the minimum of 1 Failed validating 'minimum' in schema['properties']['inventories']['patternProperties']['^[A-Z0-9_]+$']['properties']['total']: {'maximum': 2147483647, 'minimum': 1, 'type': 'integer'} On instance['inventories']['VGPU']['total']: 0 ", "code": "placement.undefined_code", "request_id": "req-03944f1d-79bb-4d2f-b37a-99db24d78653"}]}
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager During handling of the above exception, another exception occurred:
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager Traceback (most recent call last):
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 10293, in _update_available_resource_for_node
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self.rt.update_available_resource(context, nodename,
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 910, in update_available_resource
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager return f(*args, **kwargs)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 995, in _update_available_resource
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self._update(context, cn, startup=startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 1251, in _update
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 49, in wrapped_f
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 206, in call
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager return attempt.get(self._wrap_exception)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 247, in get
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2])
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/six.py", line 703, in reraise
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager raise value
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/retrying.py", line 200, in call
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 1227, in _update_to_placement
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self.reportclient.update_from_provider_tree(context, prov_tree,
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1434, in update_from_provider_tree
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self.set_traits_for_provider(context, pd.uuid, pd.traits)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager self.gen.throw(type, value, traceback)
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager File "/var/lib/openstack/lib/python3.8/site-packages/nova/scheduler/client/report.py", line 1354, in catch_all
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager raise exception.ResourceProviderSyncFailed()
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager nova.exception.ResourceProviderSyncFailed: Failed to synchronize the placement service with resource provider information supplied by the compute host.
2021-12-09 07:18:13.775 632001 ERROR nova.compute.manager
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1953718
Title:
nova compute failed to update placement if mdev max available is 0
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1953718/+subscriptions