
yahoo-eng-team team mailing list archive

[Bug 1799892] Re: Placement API crashes with 500s in Rocky upgrade with downed compute nodes

 

There is an online data migration:

https://review.openstack.org/#/c/377138/62/nova/objects/resource_provider.py@917

But it only runs when resource providers are listed or shown. The
allocation candidates code must be fetching the providers and relying
on root_provider_id through the SQLAlchemy model objects rather than
the versioned objects that perform the online data migration.
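A minimal sketch of the suspected failure mode follows. It uses an in-memory SQLite table as a stand-in for the placement `resource_providers` table. The schema is trimmed to three columns, and both queries are simplified, hypothetical stand-ins rather than the actual nova queries. A provider whose `root_provider_id` is NULL still appears in the list of candidate IDs, but it drops out of any summary built with an inner join on `root_provider_id`. The later dictionary lookup then raises `KeyError`, which matches the `summaries[rp_id]` line in the traceback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        root_provider_id INTEGER
    );
    -- Provider 1 was migrated normally; provider 19 was down during
    -- the upgrade and still has a NULL root_provider_id.
    INSERT INTO resource_providers VALUES (1, 'compute-1', 1);
    INSERT INTO resource_providers VALUES (19, 'compute-19', NULL);
""")

# Hypothetical stand-in for "which providers have the requested
# resources": it never looks at root_provider_id.
rp_ids = [row[0] for row in conn.execute(
    "SELECT id FROM resource_providers ORDER BY id")]

# Hypothetical stand-in for the provider summaries: the inner join on
# root_provider_id silently drops rows where that column is NULL.
summaries = {
    rp_id: {"name": name, "root": root_name}
    for rp_id, name, root_name in conn.execute(
        """
        SELECT rp.id, rp.name, root.name
        FROM resource_providers AS rp
        JOIN resource_providers AS root ON rp.root_provider_id = root.id
        """
    )
}


def build_candidates(rp_ids, summaries):
    """Mimic the failing loop: look up a summary for every candidate ID."""
    return [summaries[rp_id] for rp_id in rp_ids]


try:
    build_candidates(rp_ids, summaries)
except KeyError as exc:
    print("KeyError:", exc)  # prints "KeyError: 19"
```

The two queries disagree about which providers exist. That disagreement, rather than the lookup itself, is the underlying problem. Any code path that reads `root_provider_id` without first going through the migrating object layer can hit it.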

This is where something like "placement-manage db
online_data_migrations" would be useful.
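The sketch below shows what such a command might do for this column. It assumes the same fix as the linked online migration, which treats a provider with a NULL `root_provider_id` as its own root. It also assumes the batched `(found, done)` return convention that `nova-manage db online_data_migrations` uses for its migration functions. The function name `migrate_null_root_provider_ids` and the plain-SQL approach are hypothetical. A real placement migration would go through the project's own DB API and enginefacade.

```python
import sqlite3


def migrate_null_root_provider_ids(conn, max_count):
    """Hypothetical online migration: make un-migrated providers their own root.

    Handles at most ``max_count`` rows per call and returns (found, done),
    so a driver loop can call it repeatedly until found == 0.
    """
    rows = conn.execute(
        "SELECT id FROM resource_providers "
        "WHERE root_provider_id IS NULL ORDER BY id LIMIT ?",
        (max_count,),
    ).fetchall()
    done = 0
    for (rp_id,) in rows:
        # Re-check for NULL so a value set concurrently is not overwritten.
        cur = conn.execute(
            "UPDATE resource_providers SET root_provider_id = ? "
            "WHERE id = ? AND root_provider_id IS NULL",
            (rp_id, rp_id),
        )
        done += cur.rowcount
    conn.commit()
    return len(rows), done


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY,
        root_provider_id INTEGER
    );
    INSERT INTO resource_providers VALUES (1, 1), (19, NULL), (20, NULL);
""")

# Run in small batches until nothing is left, as a migration driver would.
while True:
    found, done = migrate_null_root_provider_ids(conn, max_count=1)
    print(f"found={found} done={done}")
    if found == 0:
        break
```

Running the migration from a management command decouples it from API traffic. Providers that are never listed or shown, such as a long-dead compute node, still get fixed before the allocation candidates code reads them.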

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** No longer affects: nova/queens

** Changed in: nova/rocky
       Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1799892

Title:
  Placement API crashes with 500s in Rocky upgrade with downed compute
  nodes

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  I ran into this while upgrading another environment to Rocky and
  worked around it by deleting the problematic resource provider. I just
  hit it again while upgrading a different environment, so something
  wonky is going on. Here's the traceback:

  =============
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap [req-8ad1c999-7646-4b0a-91c0-cd26a3581766 b61d42657d364008bfdc6fa715e67daf a894e8109af3430aa7ae03e0c49a0aa0 - default default] Placement API unexpected error: 19: KeyError: 19
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap Traceback (most recent call last):
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/fault_wrap.py", line 40, in __call__
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return self.application(environ, start_response)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     resp = self.call_func(req, *args, **kw)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return self.func(req, *args, **kwargs)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/microversion_parse/middleware.py", line 80, in __call__
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     response = req.get_response(self.application)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     application, catch_exc_info=False)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     app_iter = application(self.environ, start_response)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 209, in __call__
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return dispatch(environ, start_response, self._map)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handler.py", line 146, in dispatch
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return handler(environ, start_response)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     resp = self.call_func(req, *args, **kw)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     super(PlacementWsgify, self).call_func(req, *args, **kwargs)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return self.func(req, *args, **kwargs)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/microversion.py", line 164, in decorated_func
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return _find_method(f, version, status_code)(req, *args, **kwargs)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/util.py", line 81, in decorated_function
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return f(req)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/handlers/allocation_candidate.py", line 316, in list_allocation_candidates
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     context, requests, limit=limit, group_policy=group_policy)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 3965, in get_by_requests
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     context, requests, limit=limit, group_policy=group_policy)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 993, in wrapper
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return fn(*args, **kwargs)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 4071, in _get_by_requests
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     context, request, sharing, has_trees)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 4045, in _get_by_one_request
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     return _alloc_candidates_single_provider(context, resources, rp_ids)
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap   File "/usr/lib/python2.7/site-packages/nova/api/openstack/placement/objects/resource_provider.py", line 3490, in _alloc_candidates_single_provider
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap     rp_summary = summaries[rp_id]
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap KeyError: 19
  2018-10-25 09:18:29.853 7431 ERROR nova.api.openstack.placement.fault_wrap 
  =============

  The resource provider (nova-compute) with ID 19 was down during the
  upgrade (it had been taken down a long time before). The only other
  oddity I found was in the database: `root_provider_id` was NULL for
  that record. After I deleted that resource provider, the placement API
  stopped returning 500s when scheduling new VMs.

  In the other environment that hit the problem, the culprit was also
  the downed compute node.
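Before deleting a provider as a workaround, it may help to list which providers are in this state. The following sketch runs the check against an in-memory SQLite table with a trimmed schema. The UUIDs and names are placeholders. Against a real deployment, the equivalent `SELECT ... WHERE root_provider_id IS NULL` would run against whichever database holds the placement tables. The helper name `find_unmigrated_providers` is hypothetical.

```python
import sqlite3


def find_unmigrated_providers(conn):
    """Return (id, uuid, name) for providers whose root_provider_id is NULL."""
    return conn.execute(
        "SELECT id, uuid, name FROM resource_providers "
        "WHERE root_provider_id IS NULL ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY,
        uuid TEXT NOT NULL,
        name TEXT NOT NULL,
        root_provider_id INTEGER
    );
    -- Placeholder rows: one migrated provider, one left un-migrated.
    INSERT INTO resource_providers VALUES
        (1, 'uuid-1', 'compute-1', 1),
        (19, 'uuid-19', 'compute-19', NULL);
""")

for rp_id, uuid, name in find_unmigrated_providers(conn):
    print(f"{rp_id} {uuid} {name}")  # prints "19 uuid-19 compute-19"
```

A non-empty result suggests that the allocation candidates request could fail with the same `KeyError` for those provider IDs.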

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1799892/+subscriptions

