yahoo-eng-team team mailing list archive
Message #72377
[Bug 1764460] [NEW] Cannot hard reboot an instance in error state
Public bug reported:
Nova version: stable/queens fda768b304e05821f7479f9698c59d18bf3d3516
Hypervisor: Libvirt + KVM
If an instance doesn't exist in libvirt (failed live migration, compute
container rebuilt, etc.), a hard reboot or start can no longer
recreate it. We see this happen occasionally for various reasons,
and in the past a hard reboot would revive the instance.
A recent commit is responsible ("libvirt: pass the mdevs when rebooting
the guest").
_get_all_assigned_mediated_devices() calls self._host.get_guest(instance),
which raises an InstanceNotFound exception when trying to start such an
instance.
Adding an instance_exists() check solves the issue.
--- driver.py.orig 2018-04-16 16:11:42.865555972 +0000
+++ driver.py 2018-04-16 16:11:55.901773724 +0000
@@ -5966,6 +5966,8 @@
         """
         allocated_mdevs = {}
         if instance:
+            if not self.instance_exists(instance):
+                return {}
             guest = self._host.get_guest(instance)
             guests = [guest]
         else:
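The effect of the guard can be illustrated outside of Nova with a minimal, self-contained sketch. Everything here is a hypothetical stand-in, not Nova's actual code: FakeHost, FakeDriver, and the two get_assigned_mdevs_* methods are invented for illustration, and InstanceNotFound is a local class that only mimics the Nova exception of the same name. The sketch assumes, as the patch does, that a guest missing from libvirt simply has no mediated devices to report.

```python
class InstanceNotFound(Exception):
    """Local stand-in for nova.exception.InstanceNotFound (assumption)."""


class FakeHost:
    """Hypothetical stand-in for the libvirt host wrapper: uuid -> guest."""

    def __init__(self, guests):
        self._guests = dict(guests)

    def get_guest(self, uuid):
        # Mirrors the behaviour seen in the traceback: a missing domain
        # raises InstanceNotFound rather than returning None.
        try:
            return self._guests[uuid]
        except KeyError:
            raise InstanceNotFound(uuid)


class FakeDriver:
    """Hypothetical driver showing the lookup with and without the guard."""

    def __init__(self, host):
        self._host = host

    def instance_exists(self, uuid):
        # Existence check implemented on top of the same lookup.
        try:
            self._host.get_guest(uuid)
            return True
        except InstanceNotFound:
            return False

    def get_assigned_mdevs_unguarded(self, uuid):
        # Behaviour before the patch: the lookup raises for a missing guest.
        guest = self._host.get_guest(uuid)
        return dict(guest.get("mdevs", {}))

    def get_assigned_mdevs_guarded(self, uuid):
        # Behaviour with the patch: a missing guest yields an empty mapping,
        # so the caller can carry on and recreate the domain.
        if not self.instance_exists(uuid):
            return {}
        guest = self._host.get_guest(uuid)
        return dict(guest.get("mdevs", {}))


if __name__ == "__main__":
    driver = FakeDriver(FakeHost({"uuid-1": {"mdevs": {"mdev-a": "uuid-1"}}}))
    print(driver.get_assigned_mdevs_guarded("uuid-1"))
    print(driver.get_assigned_mdevs_guarded("uuid-gone"))
```

With the guard in place, the lookup for a missing guest returns an empty dict instead of propagating the exception, which is what lets the hard reboot path continue.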
Steps to recreate:
1. Stop an instance
2. Delete the instance-XXXXXXX.xml file from /etc/libvirt/qemu/
3. Start the instance
Expected result: instance running
Actual result: InstanceNotFound error from nova-compute
Logs:
2018-04-16 15:41:09.756 2030272 INFO nova.compute.manager [req-ce2e1036-ab7b-4a98-b343-6ab748326963 32bab887a38f4b6cbcaf83297d4b7812 29e87d21ad14403bb789543e8bc0dab7 - default default] [instance: 0130afdf-f5aa-4ec9-8d0a-71080c70f276] Successfully reverted task state from powering-on on failure for instance.
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server [req-ce2e1036-ab7b-4a98-b343-6ab748326963 32bab887a38f4b6cbcaf83297d4b7812 29e87d21ad14403bb789543e8bc0dab7 - default default] Exception during message handling: InstanceNotFound: Instance 0130afdf-f5aa-4ec9-8d0a-71080c70f276 could not be found.
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/exception_wrapper.py", line 76, in wrapped
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server self.force_reraise()
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/exception_wrapper.py", line 67, in wrapped
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 186, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server self.force_reraise()
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 156, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/utils.py", line 976, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 202, in decorated_function
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2665, in start_instance
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server self._power_on(context, instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 2635, in _power_on
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server block_device_info)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2908, in power_on
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server self._hard_reboot(context, instance, network_info, block_device_info)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2745, in _hard_reboot
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server mdevs = self._get_all_assigned_mediated_devices(instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5969, in _get_all_assigned_mediated_devices
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server guest = self._host.get_guest(instance)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/host.py", line 526, in get_guest
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server return libvirt_guest.Guest(self._get_domain(instance))
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/virt/libvirt/host.py", line 546, in _get_domain
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server raise exception.InstanceNotFound(instance_id=instance.uuid)
2018-04-16 15:41:09.790 2030272 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance 0130afdf-f5aa-4ec9-8d0a-71080c70f276 could not be found.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1764460
Title:
Cannot hard reboot an instance in error state
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1764460/+subscriptions