← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1737024] Re: InstanceNotFound stack trace in nova-compute logs if instance is deleted during snapshot

 

Reviewed:  https://review.openstack.org/557152
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=825529a3de45606839f6d44a924ca6263c970dfe
Submitter: Zuul
Branch:    master

commit 825529a3de45606839f6d44a924ca6263c970dfe
Author: jichen <jichenjc@xxxxxxxxxx>
Date:   Wed Mar 28 13:47:20 2018 +0800

    Move update_task_state out of try/except
    
    During snapshot there might be some race condition lead
    to instance be deleted, compute layer has handle on the
    InstanceNotFound exception so move update_task_state
    out of try/except in snapshot function to avoid callback
    trace.
    
    Also, note follow up patch will handle additional race
    condition on instance disappear during other exception
    such as NotImplemented occurs scenario.
    
    Change-Id: Id9a009c8ab2a02f6f860399eeb145337de3c92e9
    Closes-Bug: 1737024


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1737024

Title:
  InstanceNotFound stack trace in nova-compute logs if instance is
  deleted during snapshot

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I saw this in some zKVM CI logs, the instance is deleted while it's
  being snapshot:

  http://sng01.objectstorage.softlayer.net/v1/AUTH_1940ea10-6e82-4501-b2f9-eb236510e575/ibmzkvmci/production/525787/4
  /check-tempest-dsvm-neutron-full-ubuntu-xenial-
  s390x/f737a5b/logs/screen-n-cpu.txt.gz

  Dec 07 17:45:42.360741 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver [None req-76cb1ae9-b516-49d4-bb52-625ad96cb932 tempest-ImagesTestJSON-1741486695 tempest-ImagesTestJSON-1741486695] Failed to snapshot image: InstanceNotFound_Remote: Instance 89568634-e208-4973-b5e6-71fb262c49b5 could not be found.
  Dec 07 17:45:42.360885 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: Traceback (most recent call last):
  Dec 07 17:45:42.361123 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/conductor/manager.py", line 123, in _object_dispatch
  Dec 07 17:45:42.361219 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     return getattr(target, method)(*args, **kwargs)
  Dec 07 17:45:42.361314 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 226, in wrapper
  Dec 07 17:45:42.361409 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     return fn(self, *args, **kwargs)
  Dec 07 17:45:42.361513 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/objects/instance.py", line 799, in save
  Dec 07 17:45:42.361795 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     columns_to_join=_expected_cols(expected_attrs))
  Dec 07 17:45:42.361892 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/db/api.py", line 875, in instance_update_and_get_original
  Dec 07 17:45:42.361988 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     expected=expected)
  Dec 07 17:45:42.362099 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 182, in wrapper
  Dec 07 17:45:42.362205 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     return f(*args, **kwargs)
  Dec 07 17:45:42.362411 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 147, in wrapper
  Dec 07 17:45:42.362530 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     ectxt.value = e.inner_exc
  Dec 07 17:45:42.362635 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Dec 07 17:45:42.362747 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     self.force_reraise()
  Dec 07 17:45:42.362850 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Dec 07 17:45:42.362961 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     six.reraise(self.type_, self.value, self.tb)
  Dec 07 17:45:42.363065 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 135, in wrapper
  Dec 07 17:45:42.363172 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     return f(*args, **kwargs)
  Dec 07 17:45:42.363274 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 253, in wrapped
  Dec 07 17:45:42.363377 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     return f(context, *args, **kwargs)
  Dec 07 17:45:42.363490 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 2736, in instance_update_and_get_original
  Dec 07 17:45:42.363593 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     columns_to_join=columns_to_join)
  Dec 07 17:45:42.363700 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:   File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1894, in _instance_get_by_uuid
  Dec 07 17:45:42.363806 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]:     raise exception.InstanceNotFound(instance_id=uuid)
  Dec 07 17:45:42.363909 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: InstanceNotFound: Instance 89568634-e208-4973-b5e6-71fb262c49b5 could not be found.
  Dec 07 17:45:42.364018 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver Traceback (most recent call last):
  Dec 07 17:45:42.364126 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1711, in snapshot
  Dec 07 17:45:42.364239 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     expected_state=task_states.IMAGE_PENDING_UPLOAD)
  Dec 07 17:45:42.364404 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/opt/stack/new/nova/nova/compute/manager.py", line 3290, in update_task_state
  Dec 07 17:45:42.364519 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     instance.save(expected_task_state=expected_state)
  Dec 07 17:45:42.364626 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 210, in wrapper
  Dec 07 17:45:42.364732 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     ctxt, self, fn.__name__, args, kwargs)
  Dec 07 17:45:42.364842 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 245, in object_action
  Dec 07 17:45:42.364952 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     objmethod=objmethod, args=args, kwargs=kwargs)
  Dec 07 17:45:42.365100 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
  Dec 07 17:45:42.365211 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     retry=self.retry)
  Dec 07 17:45:42.365384 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
  Dec 07 17:45:42.365498 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     timeout=timeout, retry=retry)
  Dec 07 17:45:42.365606 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
  Dec 07 17:45:42.365710 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     retry=retry)
  Dec 07 17:45:42.365860 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 550, in _send
  Dec 07 17:45:42.365971 ubuntu-xenial-s390x-cloud-zfwcec175-46 nova-compute[11373]: ERROR nova.virt.libvirt.driver     raise result

  This is due to this code in the libvirt driver calling the
  update_task_state callback which updates the instance in the database
  but if the instance has been deleted it raises InstanceNotFound, which
  the libvirt driver isn't handling:

  https://github.com/openstack/nova/blob/81544829d14aebaaa717ffbe6ec33c67a2f11b7f/nova/virt/libvirt/driver.py#L1710

  https://github.com/openstack/nova/blob/81544829d14aebaaa717ffbe6ec33c67a2f11b7f/nova/virt/libvirt/driver.py#L1772

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1737024/+subscriptions


References