← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1693899] [NEW] Cleaning up stale resource allocations can fail with InstanceNotFound

 

Public bug reported:

Noticed what appears to be a race on delete of an instance where we're
deleting resource provider allocations records for a deleted instance
and when we go to get the instance from the database it's already
deleted:

http://logs.openstack.org/32/468232/1/gate/gate-tempest-dsvm-cells-
ubuntu-
xenial/ad969ba/logs/screen-n-cpu.txt.gz?level=TRACE#_May_26_17_29_52_292260

May 26 17:29:52.292260 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager [req-c0f69b9e-c6eb-43e1-9bc9-372a42901987 None None] Error updating resources for node ubuntu-xenial-ovh-bhs1-9010496.
May 26 17:29:52.292537 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager Traceback (most recent call last):
May 26 17:29:52.292730 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/manager.py", line 6588, in update_available_resource_for_node
May 26 17:29:52.292908 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
May 26 17:29:52.293082 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 626, in update_available_resource
May 26 17:29:52.293266 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_available_resource(context, resources)
May 26 17:29:52.293430 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
May 26 17:29:52.293592 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     return f(*args, **kwargs)
May 26 17:29:52.293750 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 667, in _update_available_resource
May 26 17:29:52.293909 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_usage_from_instances(context, instances, nodename)
May 26 17:29:52.294070 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1047, in _update_usage_from_instances
May 26 17:29:52.294235 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._remove_deleted_instances_allocations(context, cn)
May 26 17:29:52.294453 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1061, in _remove_deleted_instances_allocations
May 26 17:29:52.294628 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     expected_attrs=[])
May 26 17:29:52.294797 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
May 26 17:29:52.294983 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args, kwargs)
May 26 17:29:52.295107 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
May 26 17:29:52.295205 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args=args, kwargs=kwargs)
May 26 17:29:52.295290 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
May 26 17:29:52.295374 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=self.retry)
May 26 17:29:52.295457 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 123, in _send
May 26 17:29:52.295547 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     timeout=timeout, retry=retry)
May 26 17:29:52.295630 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 505, in send
May 26 17:29:52.295712 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=retry)
May 26 17:29:52.295795 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 496, in _send
May 26 17:29:52.295877 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     raise result
May 26 17:29:52.295959 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager InstanceNotFound_Remote: Instance 9ecdbc52-8f98-45aa-9814-ade07faa1026 could not be found.

We should handle the InstanceNotFound since in this path we are cleaning
up anyway.

Looks like this is triggered by something semi-recent:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Error%20updating%20resources%20for%20node%5C%22%20AND%20message%3A%5C%22_remove_deleted_instances_allocations%5C%22%20AND%20message%3A%5C%22InstanceNotFound_Remote%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

** Affects: nova
     Importance: Medium
     Assignee: Chris Dent (cdent)
         Status: Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1693899

Title:
  Cleaning up stale resource allocations can fail with InstanceNotFound

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Noticed what appears to be a race on delete of an instance where we're
  deleting resource provider allocations records for a deleted instance
  and when we go to get the instance from the database it's already
  deleted:

  http://logs.openstack.org/32/468232/1/gate/gate-tempest-dsvm-cells-
  ubuntu-
  xenial/ad969ba/logs/screen-n-cpu.txt.gz?level=TRACE#_May_26_17_29_52_292260

  May 26 17:29:52.292260 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager [req-c0f69b9e-c6eb-43e1-9bc9-372a42901987 None None] Error updating resources for node ubuntu-xenial-ovh-bhs1-9010496.
  May 26 17:29:52.292537 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager Traceback (most recent call last):
  May 26 17:29:52.292730 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/manager.py", line 6588, in update_available_resource_for_node
  May 26 17:29:52.292908 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
  May 26 17:29:52.293082 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 626, in update_available_resource
  May 26 17:29:52.293266 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_available_resource(context, resources)
  May 26 17:29:52.293430 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
  May 26 17:29:52.293592 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     return f(*args, **kwargs)
  May 26 17:29:52.293750 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 667, in _update_available_resource
  May 26 17:29:52.293909 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._update_usage_from_instances(context, instances, nodename)
  May 26 17:29:52.294070 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1047, in _update_usage_from_instances
  May 26 17:29:52.294235 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     self._remove_deleted_instances_allocations(context, cn)
  May 26 17:29:52.294453 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 1061, in _remove_deleted_instances_allocations
  May 26 17:29:52.294628 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     expected_attrs=[])
  May 26 17:29:52.294797 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
  May 26 17:29:52.294983 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args, kwargs)
  May 26 17:29:52.295107 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
  May 26 17:29:52.295205 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     args=args, kwargs=kwargs)
  May 26 17:29:52.295290 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
  May 26 17:29:52.295374 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=self.retry)
  May 26 17:29:52.295457 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 123, in _send
  May 26 17:29:52.295547 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     timeout=timeout, retry=retry)
  May 26 17:29:52.295630 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 505, in send
  May 26 17:29:52.295712 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     retry=retry)
  May 26 17:29:52.295795 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 496, in _send
  May 26 17:29:52.295877 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager     raise result
  May 26 17:29:52.295959 ubuntu-xenial-ovh-bhs1-9010496 nova-compute[11974]: ERROR nova.compute.manager InstanceNotFound_Remote: Instance 9ecdbc52-8f98-45aa-9814-ade07faa1026 could not be found.

  We should handle the InstanceNotFound since in this path we are
  cleaning up anyway.

  Looks like this is triggered by something semi-recent:

  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Error%20updating%20resources%20for%20node%5C%22%20AND%20message%3A%5C%22_remove_deleted_instances_allocations%5C%22%20AND%20message%3A%5C%22InstanceNotFound_Remote%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1693899/+subscriptions


Follow ups