yahoo-eng-team team mailing list archive
Message #75684
[Bug 1799892] Re: Placement API crashes with 500s in Rocky upgrade with downed compute nodes
There is an online data migration:
https://review.openstack.org/#/c/377138/62/nova/objects/resource_provider.py@917
But it only runs when listing or showing resource providers. The allocation
candidates code must be fetching the providers and relying on
root_provider_id through SQLAlchemy model objects rather than through the
versioned objects that perform the online data migration.
This is where something like "placement-manage db
online_data_migrations" would be useful.
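As a rough sketch of what such a backfill could do, assuming the migration's
job is to point root_provider_id at the provider's own id when it is NULL and
the provider has no parent. The table below is a simplified stand-in for
resource_providers, and backfill_root_provider_id is a hypothetical helper,
not an existing nova or placement command:

```python
import sqlite3


def backfill_root_provider_id(conn):
    """Hypothetical helper: make parentless providers their own root.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE resource_providers SET root_provider_id = id "
        "WHERE root_provider_id IS NULL AND parent_provider_id IS NULL"
    )
    conn.commit()
    return cur.rowcount


# Simplified stand-in schema (assumption): only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "root_provider_id INTEGER, parent_provider_id INTEGER)"
)
conn.executemany(
    "INSERT INTO resource_providers VALUES (?, ?, ?, ?)",
    [
        (18, "compute-a", 18, None),       # already migrated
        (19, "compute-down", None, None),  # never listed/shown, still NULL
    ],
)

updated = backfill_root_provider_id(conn)
rows = conn.execute(
    "SELECT id, root_provider_id FROM resource_providers ORDER BY id"
).fetchall()
print(updated, rows)
```

Running the update in bulk like this would cover providers whose compute
service is down and therefore never triggers the per-object migration.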
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Medium
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/rocky
Importance: Undecided
Status: New
** No longer affects: nova/queens
** Changed in: nova/rocky
Status: New => Triaged
** Changed in: nova/rocky
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799892
Title:
Placement API crashes with 500s in Rocky upgrade with downed compute
nodes
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) rocky series:
Triaged
Bug description:
I ran into this while upgrading another environment to Rocky and worked
around it by deleting the problematic resource provider, but I just hit it
again while upgrading a different environment, so something is wrong.
Here's the traceback:
=============
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap [req-8ad1c999-7646-4b0a-91c0-cd26a3581766 b61d42657d364008bfdc6fa715e67daf a894e8109af3430aa7ae03e0c49a0aa0 - default default] Placement API unexpected error: 19: KeyError: 19
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap Traceback (most recent call last):
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/fault_wrap.py", line 40, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.application(environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/microversion_parse/middleware.py", line 80, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap response = req.get_response(self.application)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap application, catch_exc_info=False)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap app_iter = application(self.environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 209, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return dispatch(environ, start_response, self._map)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 146, in dispatch
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return handler(environ, start_response)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap resp = self.call_func(req, *args, **kw)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap super(PlacementWsgify, self).call_func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return self.func(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/microversion.py", line 164, in decorated_func
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return _find_method(f, version, status_code)(req, *args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/util.py", line 81, in decorated_function
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return f(req)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handlers/allocation_candidate.py", line 316, in list_allocation_candidates
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap context, requests, limit=limit, group_policy=group_policy)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 3965, in get_by_requests
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap context, requests, limit=limit, group_policy=group_policy)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 993, in wrapper
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return fn(*args, **kwargs)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 4071, in _get_by_requests
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap context, request, sharing, has_trees)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 4045, in _get_by_one_request
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap return _alloc_candidates_single_provider(context, resources, rp_ids)
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 3490, in _alloc_candidates_single_provider
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap rp_summary = summaries[rp_id]
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap KeyError: 19
2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap
=============
The resource provider (nova-compute) with ID 19 was down during the
upgrade (it had been taken down a long time ago). The only oddity I
found was in the database: `root_provider_id` was set to NULL for that
record. Upon deleting the resource provider, the placement API
stopped returning 500s when it tried to schedule new VMs.
In the other environment that had the problem, the culprit was likewise
the downed node.
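One way a NULL `root_provider_id` could produce this KeyError, sketched under
the assumption that the candidate ids are collected with a plain select while
the summaries are built through a join on `root_provider_id`. The schema and
queries are hypothetical simplifications, not the actual placement SQL:

```python
import sqlite3

# Simplified stand-in table (assumption): provider 19 has a NULL root.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers ("
    "id INTEGER PRIMARY KEY, root_provider_id INTEGER)"
)
conn.executemany(
    "INSERT INTO resource_providers VALUES (?, ?)",
    [(18, 18), (19, None)],
)

# Candidate ids come from a query that does not touch root_provider_id.
rp_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM resource_providers ORDER BY id")
]

# Summaries come from an inner join on root_provider_id; a NULL never
# matches in a join condition, so provider 19 is silently dropped.
summaries = {
    rp_id: {"root_id": root_id}
    for rp_id, root_id in conn.execute(
        "SELECT rp.id, root.id FROM resource_providers AS rp "
        "JOIN resource_providers AS root ON rp.root_provider_id = root.id"
    )
}

# Ids that would raise KeyError on summaries[rp_id].
missing = [rp_id for rp_id in rp_ids if rp_id not in summaries]

# Diagnostic query: providers that still need their root backfilled.
null_roots = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM resource_providers WHERE root_provider_id IS NULL"
    )
]
print(missing, null_roots)
```

A query like the last one can help spot affected providers before an upgrade,
without having to delete them.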
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799892/+subscriptions