yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #61176
[Bug 1660647] Re: _cleanup_failed_start aggressively removes local instance files when handling plug_vif failures
Reviewed: https://review.openstack.org/427267
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=67aa277b4ef623c9877b97bfd7952f0bb1d80a81
Submitter: Jenkins
Branch: master
commit 67aa277b4ef623c9877b97bfd7952f0bb1d80a81
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date: Tue Jan 31 15:26:38 2017 +0000
libvirt: Limit destroying disks during cleanup to spawn
Iab5afdf1b5b now ensures that cleanup is always called when VIF plugging
errors are encountered by _create_domain_and_network. At present cleanup
is always called with destroy_disks=True leading to any local instance
files being removed from the host.
_create_domain_and_network itself has various callers such as resume and
hard_reboot that assume these files will persist any such failures. As a
result the removal of these files will leave instances in an unbootable
state.
In order to correct this an additional destroy_disks_on_failures kwarg
is now provided to _create_domain_and_network and passed down into
cleanup. This kwarg defaults to False and is only enabled when
_create_domain_and_network is used to spawn a new instance.
Closes-bug: #1660647
Change-Id: I38c969690fedb71c5b5ec4418c1b0dd53df733ec
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1660647
Title:
_cleanup_failed_start aggressively removes local instance files when
handling plug_vif failures
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
Iab5afdf1b5b8d107ea0e5895c24d50712e7dc7b1 [1] ensured that _cleanup_failed_start is always called if we encounter VIF plugging failures in _create_domain_and_network. However this currently leads to any local instance files being removed as cleanup is called with destroy_disks=True.
As such any failures when resuming or restarting an instance will lead
to these files being removed and the instance left in an unbootable
state. IMHO these files should only be removed when cleaning up after
errors hit while initially spawning an instance.
Steps to reproduce
==================
- Boot an instance using local disks
- Stop the instance
- Start the instance, causing a timeout or other failure during plug_vifs
- Attempt to start the instance again
Expected result
===============
The local instance files are left on the host if instances are rebooting or resuming.
Actual result
=============
The local instance files are removed from the host if _cleanup_failed_start is called.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
$ pwd
/opt/stack/nova
$ git rev-parse HEAD
42222969a21ee28ef4a68bd5ab1ec8a12c4ad126
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + KVM
2. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
3. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
$ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 test-boot
[..]
$ nova stop test-boot
$ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/
total 18596
-rw-rw-r--. 1 root root 16699 Jan 31 09:30 console.log
-rw-r--r--. 1 root root 10289152 Jan 31 09:30 disk
-rw-r--r--. 1 stack libvirtd 257 Jan 31 09:29 disk.info
-rw-rw-r--. 1 qemu qemu 4979632 Jan 31 09:29 kernel
-rw-rw-r--. 1 qemu qemu 3740163 Jan 31 09:29 ramdisk
I used the following change to artificially recreate an issue plugging
the VIFs :
$ git diff
diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index 33e3157..248e960 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -5015,6 +5015,7 @@ class LibvirtDriver(driver.ComputeDriver):
pause = bool(events)
guest = None
try:
+ raise exception.VirtualInterfaceCreateException()
with self.virtapi.wait_for_instance_event(
instance, events, deadline=timeout,
error_callback=self._neutron_failed_callback):
$ nova start test-boot
Request to start server test-boot has been accepted.
$ nova list
+--------------------------------------+-----------+---------+------------+-------------+--------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-----------+---------+------------+-------------+--------------------------------+
| be6cb386-e005-4fb2-8332-7e0c375ee452 | test-boot | SHUTOFF | - | Shutdown | public=172.24.4.8, 2001:db8::9 |
+--------------------------------------+-----------+---------+------------+-------------+--------------------------------+
$ ll ../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/
ls: cannot access '../data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/': No such file or directory
Future attempts to start the instance will fail as a result :
$ nova start test-boot
Request to start server test-boot has been accepted.
$ vi ../logs/n-cpu.log
[..]
5353 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server Traceback (most recent call last):
5354 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
5355 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
5356 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
5357 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
5358 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
5359 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server result = func(ctxt, **new_args)
5360 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 75, in wrapped
5361 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server function_name, call_dict, binary)
5362 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
5363 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise()
5364 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
5365 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
5366 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 66, in wrapped
5367 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return f(self, context, *args, **kw)
5368 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 188, in decorated_function
5369 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance)
5370 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
5371 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise()
5372 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
5373 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
5374 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 157, in decorated_function
5375 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
5376 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/utils.py", line 685, in decorated_function
5377 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
5378 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 216, in decorated_function
5379 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
5380 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
5381 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self.force_reraise()
5382 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
5383 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
5384 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 204, in decorated_function
5385 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
5386 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2524, in start_instance
5387 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._power_on(context, instance)
5388 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 2494, in _power_on
5389 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info)
5390 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2494, in power_on
5391 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server self._hard_reboot(context, instance, network_info, block_device_info)
5392 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2373, in _hard_reboot
5393 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server block_device_info)
5394 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6931, in _get_instance_disk_info
5395 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server dk_size = int(os.path.getsize(path))
5396 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server File "/usr/lib64/python2.7/genericpath.py", line 57, in getsize
5397 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server return os.stat(filename).st_size
5398 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server OSError: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/be6cb386-e005-4fb2-8332-7e0c375ee452/disk'
5399 2017-01-31 09:35:59.117 TRACE oslo_messaging.rpc.server
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1660647/+subscriptions
References