← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1693899] Re: Cleaning up stale resource allocations can fail with InstanceNotFound

 

Reviewed:  https://review.openstack.org/468517
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=98b8e39ac5f7b3f2bb06ca415bbb806705461d74
Submitter: Jenkins
Branch:    master

commit 98b8e39ac5f7b3f2bb06ca415bbb806705461d74
Author: Chris Dent <cdent@xxxxxxxxxxxxx>
Date:   Fri May 26 19:12:46 2017 +0000

    Catch InstanceNotFound when deleting allocations
    
    If the list of allocations that are going to be deleted includes an
    instance that can no longer be loaded by get_by_uuid (because
    something has already removed it) we need to catch InstanceNotFound
    and remove the allocations as planned. Without this change, the
    exception is not caught, causing an error and leaving behind the
    allocations we don't want to be there.
    
    Change-Id: I204b077c287141a7f5643b2cc0065da2ba395c03
    Closes-Bug: #1693899


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1693899

Title:
  Cleaning up stale resource allocations can fail with InstanceNotFound

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Noticed what appears to be a race on delete of an instance where we're
  deleting resource provider allocations records for a deleted instance
  and when we go to get the instance from the database it's already
  deleted:

  http://logs.openstack.org/32/468232/1/gate/gate-tempest-dsvm-cells-
  ubuntu-
  xenial/ad969ba/logs/screen-n-cpu.txt.gz?level=TRACE#_May_26_17_29_52_292260

  May 26 17:29:52.292260 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager [req-c0f69b9e-c6eb-43e1-9bc9-372a42901987 None None] Error updating resources for node ubuntu-xenial-ovh-bhs1-9010496.
  May 26 17:29:52.292537 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager Traceback (most recent call last):
  May 26 17:29:52.292730 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/manager.py", line 6588, in update_available_resource_for_node
  May 26 17:29:52.292908 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
  May 26 17:29:52.293082 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 626, in update_available_resource
  May 26 17:29:52.293266 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_available_resource(context, resources)
  May 26 17:29:52.293430 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
  May 26 17:29:52.293592 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     return f(*args, **kwargs)
  May 26 17:29:52.293750 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 667, in _update_available_resource
  May 26 17:29:52.293909 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_usage_from_instances(context, instances, nodename)
  May 26 17:29:52.294070 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1047, in _update_usage_from_instances
  May 26 17:29:52.294235 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._remove_deleted_instances_allocations(context, cn)
  May 26 17:29:52.294453 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1061, in _remove_deleted_instances_allocations
  May 26 17:29:52.294628 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     expected_attrs=[])
  May 26 17:29:52.294797 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
  May 26 17:29:52.294983 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args, kwargs)
  May 26 17:29:52.295107 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
  May 26 17:29:52.295205 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args=args, kwargs=kwargs)
  May 26 17:29:52.295290 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
  May 26 17:29:52.295374 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=self.retry)
  May 26 17:29:52.295457 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 123, in _send
  May 26 17:29:52.295547 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     timeout=timeout, retry=retry)
  May 26 17:29:52.295630 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 505, in send
  May 26 17:29:52.295712 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=retry)
  May 26 17:29:52.295795 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 496, in _send
  May 26 17:29:52.295877 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     raise result
  May 26 17:29:52.295959 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager InstanceNotFound_Remote: Instance 9ecdbc52-8f98-45aa-9814-ade07faa1026 could not be found.

  We should handle the InstanceNotFound since in this path we are
  cleaning up anyway.

  Looks like this is triggered by something semi-recent:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Error%20updating%20resources%20for%20node%5C%22%20AND%20message%3A%5C%22_remove_deleted_instances_allocations%5C%22%20AND%20message%3A%5C%22InstanceNotFound_Remote%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1693899/+subscriptions


References