yahoo-eng-team team mailing list archive
Message #69740
[Bug 1724573] Re: encrypted volumes are directly attached to instances after a compute host reboot
Reviewed: https://review.openstack.org/400384
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f8daf080411b84ec0669f0642524ce8a7d19057
Submitter: Zuul
Branch: master
commit 3f8daf080411b84ec0669f0642524ce8a7d19057
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date: Mon Nov 21 15:29:30 2016 +0000
libvirt: Re-initialise volumes, encryptors, and vifs on hard reboot
We call _hard_reboot during reboot, power_on, and
resume_state_on_host_boot. It functions essentially by tearing down as
much of an instance as possible before recreating it, which
additionally makes it useful to operators for attempting automated
recovery of instances in an inconsistent state.
The Libvirt driver would previously only call _destroy and
_undefine_domain when hard rebooting an instance. This would leave vifs
plugged, volumes connected, and encryptors attached on the host. It
also means that when we try to restart the instance, we assume all
these things are correctly configured. If they are not, the instance
may fail to start at all, or may be incorrectly configured when
starting.
For example, consider an instance with an encrypted volume after a
compute host reboot. When we attempt to start the instance, power_on
will call _hard_reboot. The volume will be coincidentally re-attached
as a side-effect of calling _get_guest_xml(!), but when we call
_create_domain_and_network we pass reboot=True, which tells it not to
reattach the encryptor, as it is assumed to be already attached. We
are therefore left presenting the encrypted volume data directly to
the instance without decryption.
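The sequence in this example can be modelled with a toy sketch. It is not Nova code: FakeHost, start_guest and the "encrypted"/"decrypted" labels are hypothetical names invented for illustration, and the only behaviour taken from the description above is that the volume is always reconnected while the encryptor is attached only when reboot is False.

```python
# Toy model only; none of these names exist in Nova.
class FakeHost:
    """Tracks which volumes are connected and which have a decrypted device."""

    def __init__(self):
        self.connected = set()
        self.decrypted = set()

    def reset(self):
        # A compute host reboot loses both the connection and the dm device.
        self.connected.clear()
        self.decrypted.clear()


def start_guest(host, volume_id, reboot):
    """Mimic the pre-fix start path described in the commit message."""
    # The volume is reconnected regardless (a side effect of building the XML).
    host.connected.add(volume_id)
    # The encryptor is attached only when reboot is False.
    if not reboot:
        host.decrypted.add(volume_id)
    return "decrypted" if volume_id in host.decrypted else "encrypted"
```

In this model a first boot (reboot=False) presents decrypted data, while a start after host.reset() with reboot=True reconnects the volume but leaves the guest looking at the encrypted device.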
The approach in this patch is to ensure we recreate the instance as
fully as possible during hard reboot. This means not passing
vifs_already_plugged and reboot to _create_domain_and_network, which
in turn requires that we fully destroy the instance first. This
addresses the specific problem given in the example, but also a whole
class of potential volume and vif related issues of inconsistent
state.
Because we now always tear down volumes, encryptors, and vifs, we are
relying on the teardown of these things to be idempotent. This
highlighted that detach in the luks and cryptsetup encryptors was not
idempotent. We depend on the fixes for those os-brick drivers.
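The idempotency requirement can be illustrated with a minimal sketch. FakeEncryptor and hard_reboot below are hypothetical stand-ins, not the os-brick or Nova implementations; they only show why an unconditional teardown followed by a full rebuild needs detach to be safe when nothing is attached.

```python
# Toy model only; not the os-brick encryptor API.
class FakeEncryptor:
    """Stand-in encryptor that tracks whether a decrypted device exists."""

    def __init__(self):
        self.attached = False

    def attach_volume(self):
        self.attached = True

    def detach_volume(self):
        # Idempotent: detaching when nothing is attached is a no-op,
        # not an error.
        self.attached = False


def hard_reboot(encryptor):
    """Tear down unconditionally, then rebuild from scratch."""
    encryptor.detach_volume()  # safe even if the host already lost the device
    encryptor.attach_volume()  # always re-attach; nothing is assumed present
    return encryptor.attached
```

If detach_volume raised when the device was already gone, as it would be after a host reboot, the teardown step would fail before the rebuild could run.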
Depends-On: I31d72357c89db53a147c2d986a28c9c6870efad0
Depends-On: I9f52f89b8466d03699cfd5c0e32c672c934cd6fb
Closes-bug: #1724573
Change-Id: Id188d48609f3d22d14e16c7f6114291d547a8986
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1724573
Title:
encrypted volumes are directly attached to instances after a compute
host reboot
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
Encrypted volumes are directly attached to instances after a compute
host reboot. These volumes should be decrypted by the os-brick
encryptors that provide libvirt with decrypted dm devices for use by
the instance/domain.
This happens because the following encryptor.attach_volume call is
skipped when _hard_reboot passes reboot=True, as the dm devices are
assumed to be already present on the host:
    def _create_domain_and_network(self, context, xml, instance, network_info,
                                   block_device_info=None,
                                   power_on=True, reboot=False,
                                   vifs_already_plugged=False,
                                   post_xml_callback=None,
                                   destroy_disks_on_failure=False):
    [..]
            if (not reboot and 'data' in connection_info and
                    'volume_id' in connection_info['data']):
                volume_id = connection_info['data']['volume_id']
                encryption = encryptors.get_encryption_metadata(
                    context, self._volume_api, volume_id, connection_info)

                if encryption:
                    encryptor = self._get_volume_encryptor(connection_info,
                                                           encryption)
                    encryptor.attach_volume(context, **encryption)
Steps to reproduce
==================
- Create an instance with an attached encrypted volume:
$ cinder type-create LUKS
$ cinder encryption-type-create --cipher aes-xts-plain64 --key_size 512 --control_location front-end LUKS nova.volume.encryptors.luks.LuksEncryptor
$ cinder create --display-name 'encrypted volume' --volume-type LUKS 1
$ nova boot --image cirros-0.3.5-x86_64-disk --flavor 1 test
$ nova volume-attach c762ef8d-13ab-4aee-bd20-c6a002bdd172 3f2cfdf2-11d7-4ac7-883a-76217136f751
- Before continuing, note that the instance is connected to the
decrypted dm device:
$ sudo virsh domblklist c762ef8d-13ab-4aee-bd20-c6a002bdd172
Target     Source
------------------------------------------------
vda        /opt/stack/data/nova/instances/c762ef8d-13ab-4aee-bd20-c6a002bdd172/disk
vdb        /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
$ ll /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
lrwxrwxrwx. 1 root root 56 Oct 18 08:28 /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6 -> /dev/mapper/crypt-scsi-360014054c6bbc8645494397ad372e0e6
- Restart the n-cpu host _or_ fake a host reset by stopping the n-cpu
service, destroying the domain, removing the decrypted dm device, and
unlinking the volume path before finally restarting n-cpu:
$ sudo systemctl stop devstack@n-cpu
$ sudo virsh destroy c762ef8d-13ab-4aee-bd20-c6a002bdd172
$ sudo cryptsetup luksClose /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
$ sudo unlink /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
$ sudo systemctl start devstack@n-cpu
- The instance should be in the SHUTDOWN state after n-cpu starts up
again, so start the instance:
$ nova start c762ef8d-13ab-4aee-bd20-c6a002bdd172
- The instance is restarted but now points at the original encrypted
block device:
$ sudo virsh domblklist c762ef8d-13ab-4aee-bd20-c6a002bdd172
Target     Source
------------------------------------------------
vda        /opt/stack/data/nova/instances/c762ef8d-13ab-4aee-bd20-c6a002bdd172/disk
vdb        /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
$ ll /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
lrwxrwxrwx. 1 root root 9 Oct 18 08:32 /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6 -> ../../sde
- Additional stop and start requests will not correct this:
$ nova stop c762ef8d-13ab-4aee-bd20-c6a002bdd172
$ nova start c762ef8d-13ab-4aee-bd20-c6a002bdd172
$ sudo virsh domblklist c762ef8d-13ab-4aee-bd20-c6a002bdd172
Target     Source
------------------------------------------------
vda        /opt/stack/data/nova/instances/c762ef8d-13ab-4aee-bd20-c6a002bdd172/disk
vdb        /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
$ ll /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6
lrwxrwxrwx. 1 root root 9 Oct 18 08:32 /dev/disk/by-id/scsi-360014054c6bbc8645494397ad372e0e6 -> ../../sde
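The manual ll checks in the steps above could be scripted along the following lines. This is a sketch under stated assumptions: resolves_to_decrypted_device is a hypothetical helper, and the /dev/mapper/crypt- prefix is taken only from the symlink target shown in this report, so other deployments may name their dm devices differently.

```python
import os


def resolves_to_decrypted_device(path, mapper_prefix="/dev/mapper/crypt-"):
    """Return True if `path` is a symlink whose target is under mapper_prefix.

    The default prefix is an assumption based on the output in this report.
    """
    if not os.path.islink(path):
        return False
    target = os.readlink(path)
    # Relative targets such as ../../sde are resolved against the link's
    # own directory; absolute targets are used unchanged.
    resolved = os.path.normpath(os.path.join(os.path.dirname(path), target))
    return resolved.startswith(mapper_prefix)
```

Called with the by-id path from the steps above, this would be expected to return True before the host reset and False afterwards, when the link points at ../../sde.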
Expected result
===============
The decrypted volume is attached to the instance once it is restarted.
Actual result
=============
The encrypted volume is attached to the instance once it is restarted.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
# git rev-parse HEAD
fce56ce8c04b20174cd89dfbc2c06f0068324b55
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + KVM
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
LVM+iSCSI
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
See above.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1724573/+subscriptions