yahoo-eng-team team mailing list archive

[Bug 1764460] [NEW] Cannot hard reboot an instance in error state

 

Public bug reported:

Nova version: stable/queens  fda768b304e05821f7479f9698c59d18bf3d3516
Hypervisor: Libvirt + KVM

If an instance doesn't exist in libvirt (failed live migration, compute
container rebuilt, etc.), a hard reboot or start can no longer recreate
it. We occasionally see this happen for various reasons, and in the past
a hard reboot would revive the instance.

A recent commit is responsible: "libvirt: pass the mdevs when rebooting
the guest".

_get_all_assigned_mediated_devices() raises an InstanceNotFound
exception when trying to start such an instance, because it calls
self._host.get_guest(instance) and the guest is not defined in libvirt.

Adding an instance_exists() check solves the issue.

--- driver.py.orig      2018-04-16 16:11:42.865555972 +0000
+++ driver.py   2018-04-16 16:11:55.901773724 +0000
@@ -5966,6 +5966,8 @@
         """
         allocated_mdevs = {}
         if instance:
+            if not self.instance_exists(instance):
+                return {}
             guest = self._host.get_guest(instance)
             guests = [guest]
         else:
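
For illustration only, here is a rough sketch of what the check changes,
written outside nova. FakeHost, FakeDriver and the dict-based "guest" are
made up for this example; only the names instance_exists, get_guest and
InstanceNotFound come from the report, and their behaviour here is an
assumption, not the real nova implementation.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova's InstanceNotFound (assumption, not the real class)."""


class FakeHost:
    """Hypothetical host wrapper: maps instance UUID -> list of mdev UUIDs."""

    def __init__(self, domains):
        self._domains = domains

    def get_guest(self, instance_uuid):
        # Mimics the failure in the traceback: unknown domain -> exception.
        try:
            return self._domains[instance_uuid]
        except KeyError:
            raise InstanceNotFound(instance_uuid)


class FakeDriver:
    """Hypothetical driver exposing only the logic touched by the patch."""

    def __init__(self, host):
        self._host = host

    def instance_exists(self, instance_uuid):
        # Assumed semantics: True only if the hypervisor knows the guest.
        try:
            self._host.get_guest(instance_uuid)
            return True
        except InstanceNotFound:
            return False

    def assigned_mdevs_unguarded(self, instance_uuid):
        # Behaviour before the patch: the lookup error propagates.
        guest = self._host.get_guest(instance_uuid)
        return {mdev: instance_uuid for mdev in guest}

    def assigned_mdevs_guarded(self, instance_uuid):
        # Behaviour with the patch: a missing guest has no mdevs assigned.
        if not self.instance_exists(instance_uuid):
            return {}
        guest = self._host.get_guest(instance_uuid)
        return {mdev: instance_uuid for mdev in guest}


driver = FakeDriver(FakeHost({"uuid-a": ["mdev-1"]}))
print(driver.assigned_mdevs_guarded("uuid-a"))
print(driver.assigned_mdevs_guarded("uuid-missing"))
```

With the guard, the missing guest yields an empty dict, so the caller
(here, the hard reboot path) can carry on and redefine the domain. The
unguarded variant raises InstanceNotFound for the same input.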

Steps to reproduce:
1. Stop an instance
2. Delete the instance-XXXXXXX.xml file from /etc/libvirt/qemu/
3. Start the instance

Expected result: instance running
Actual result: InstanceNotFound error from nova-compute

Logs:
2018-04-16 15:41:09.756 2030272 INFO nova.compute.manager [req-ce2e1036-ab7b-4a98-b343-6ab748326963 32bab887a38f4b6cbcaf83297d4b7812 29e87d21ad14403bb789543e8bc0dab7 - default default] [instance: 0130afdf-f5aa-4ec9-8d0a-71080c70f276] Successfully reverted task state from powering-on on failure for instance.
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server [req-ce2e1036-ab7b-4a98-b343-6ab748326963 32bab887a38f4b6cbcaf83297d4b7812 29e87d21ad14403bb789543e8bc0dab7 - default default] Exception during message handling: InstanceNotFound: Instance 0130afdf-f5aa-4ec9-8d0a-71080c70f276 could not be found.
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     function_name, call_dict, binary)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 186, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     "Error: %s", e, instance=instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     self.force_reraise()
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 156, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/utils.py", line 976, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 202, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2665, in start_instance
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     self._power_on(context, instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2635, in _power_on
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     block_device_info)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2908, in power_on
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     self._hard_reboot(context, instance, network_info, block_device_info)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2745, in _hard_reboot
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     mdevs = self._get_all_assigned_mediated_devices(instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5969, in _get_all_assigned_mediated_devices
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     guest = self._host.get_guest(instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/host.py", line 526, in get_guest
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     return libvirt_guest.Guest(self._get_domain(instance))
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/host.py", line 546, in _get_domain
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server     raise exception.InstanceNotFound(instance_id=instance.uuid)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance 0130afdf-f5aa-4ec9-8d0a-71080c70f276 could not be found.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1764460

Title:
  Cannot hard reboot an instance in error state

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1764460/+subscriptions

