[Bug 1905701] Re: Do not recreate libvirt secret when one already exists on the host during a host reboot

 

So this isn't enough by itself to avoid the failure case listed in c#0:
the call to resume_state_on_host_boot in turn calls _hard_reboot, which
always deletes the volume secret, rendering the optimisation landed
above useless.
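
As a rough illustration of the ordering problem, here's a small,
self-contained Python toy (made-up names, not nova's actual code): the
hard reboot path deletes the secret before the attach path runs, so a
"reuse an existing secret" check never finds one and we fall back to
the key manager with the empty boot context, which is the failure in
the traceback below.

    class FakeHost:
        """Stand-in for the hypervisor host's libvirt secret store."""
        def __init__(self):
            self.secrets = set()

        def find_secret(self, vol_id):
            return vol_id in self.secrets

        def delete_secret(self, vol_id):
            self.secrets.discard(vol_id)

        def create_secret(self, vol_id):
            self.secrets.add(vol_id)

    def attach_encryptor(host, vol_id, has_service_catalog):
        if host.find_secret(vol_id):
            return "reused existing libvirt secret"
        if not has_service_catalog:
            # mirrors the EmptyCatalog -> KeyManagerError seen in the trace
            raise RuntimeError("The service catalog is empty.")
        host.create_secret(vol_id)
        return "created new libvirt secret"

    def hard_reboot(host, vol_id, has_service_catalog):
        host.delete_secret(vol_id)  # the hard reboot path tears the secret down first
        return attach_encryptor(host, vol_id, has_service_catalog)

    host = FakeHost()
    host.create_secret("vol-1")  # secret that persisted across the host reboot
    try:
        hard_reboot(host, "vol-1", has_service_catalog=False)
    except RuntimeError as exc:
        print(exc)  # the reuse check was defeated, so the key manager call fails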

It's pretty easy to reproduce this using the demo user account in
devstack:

$  . openrc admin admin
$ openstack volume type create --encryption-provider luks --encryption-cipher aes-xts-plain64 --encryption-key-size 256 --encryption-control-location front-end LUKS

$ . openrc demo demo
$ openstack volume create --size 1 --type LUKS test
$ openstack server create --image cirros-0.5.1-x86_64-disk --flavor 1 --network private test
$ openstack server add volume test test

$  . openrc admin admin
$ openstack server reboot --hard test
$ openstack server event list f65c96c6-f63f-42b3-8e00-fff5b24daa35
+------------------------------------------+--------------------------------------+---------------+----------------------------+
| Request ID                               | Server ID                            | Action        | Start Time                 |
+------------------------------------------+--------------------------------------+---------------+----------------------------+
| req-d22d8d5a-a090-4f03-a246-a4c4487319aa | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | reboot        | 2021-05-27T09:42:56.000000 |
| req-e8ab2b76-00a4-4c3c-9616-c1437acd17db | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | attach_volume | 2021-05-27T09:41:52.000000 |
| req-2314c5c8-1584-4d7e-9044-78bcececb459 | f65c96c6-f63f-42b3-8e00-fff5b24daa35 | create        | 2021-05-27T09:41:43.000000 |
+------------------------------------------+--------------------------------------+---------------+----------------------------+

$ openstack server event show f65c96c6-f63f-42b3-8e00-fff5b24daa35 req-d22d8d5a-a090-4f03-a246-a4c4487319aa -f json -c events | awk  '{gsub("\\\\n","\n")};1'
{
  "events": [
    {
      "event": "compute_reboot_instance",
      "start_time": "2021-05-27T09:42:56.000000",
      "finish_time": "2021-05-27T09:42:59.000000",
      "result": "Error",
      "traceback": "  File \"/opt/stack/nova/nova/compute/utils.py\", line 1434, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 211, in decorated_function
    compute_utils.add_instance_fault_from_exc(context,
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/compute/manager.py\", line 200, in decorated_function
    return function(self, context, *args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3709, in reboot_instance
    do_reboot_instance(context, instance, block_device_info, reboot_type)
  File \"/usr/local/lib/python3.8/site-packages/oslo_concurrency/lockutils.py\", line 360, in inner
    return f(*args, **kwargs)
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3707, in do_reboot_instance
    self._reboot_instance(context, instance, block_device_info,
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3801, in _reboot_instance
    self._set_instance_obj_error_state(instance)
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/compute/manager.py\", line 3771, in _reboot_instance
    self.driver.reboot(context, instance,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 3659, in reboot
    return self._hard_reboot(context, instance, network_info,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 3748, in _hard_reboot
    xml = self._get_guest_xml(context, instance, network_info, disk_info,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 6990, in _get_guest_xml
    conf = self._get_guest_config(instance, network_info, image_meta,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 6612, in _get_guest_config
    storage_configs = self._get_guest_storage_config(context,
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 5253, in _get_guest_storage_config
    self._connect_volume(context, connection_info, instance)
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 1800, in _connect_volume
    vol_driver.disconnect_volume(connection_info, instance)
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 227, in __exit__
    self.force_reraise()
  File \"/usr/local/lib/python3.8/site-packages/oslo_utils/excutils.py\", line 200, in force_reraise
    raise self.value
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 1794, in _connect_volume
    self._attach_encryptor(context, connection_info, encryption)
  File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 1935, in _attach_encryptor
    key = keymgr.get(context, encryption['encryption_key_id'])
  File \"/usr/local/lib/python3.8/site-packages/castellan/key_manager/migration.py\", line 55, in get
    secret = super(MigrationKeyManager, self).get(
  File \"/usr/local/lib/python3.8/site-packages/castellan/key_manager/barbican_key_manager.py\", line 588, in get
    raise exception.KeyManagerError(reason=e)
"
    }
  ]
}

I'll post a follow-up change now and pause the backports for the time
being.

** Also affects: nova/xena
   Importance: Medium
     Assignee: Lee Yarwood (lyarwood)
       Status: In Progress

** Also affects: nova/wallaby
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1905701

Title:
  Do not recreate libvirt secret when one already exists on the host
  during a host reboot

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress
Status in OpenStack Compute (nova) train series:
  In Progress
Status in OpenStack Compute (nova) ussuri series:
  In Progress
Status in OpenStack Compute (nova) victoria series:
  In Progress
Status in OpenStack Compute (nova) wallaby series:
  New
Status in OpenStack Compute (nova) xena series:
  In Progress

Bug description:
  Description
  ===========

  When [compute]/resume_guests_state_on_host_boot is enabled, the
  compute manager will attempt to restart instances on startup.

  When using the libvirt driver with instances that have LUKSv1
  encrypted volumes attached, a call is made to _attach_encryptor that
  currently assumes any volume libvirt secrets don't already exist on
  the host. As a result the call attempts to look up encryption
  metadata, which fails because the compute service is using a
  bare-bones, local-only admin context to drive the restart of the
  instances.

  The libvirt secrets associated with LUKSv1 encrypted volumes actually
  persist across a host reboot, so the calls to fetch the encryption
  metadata, the symmetric key and so on are not required here. Removing
  these calls in this context should allow the compute service to start
  instances with these volumes attached.
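
  As a rough sketch of how the existing secret could be detected (using
  the libvirt-python bindings directly; nova's own helpers are named
  differently), the presence of the volume's secret can be tested on
  the host without any key manager call, assuming the secret is
  registered with the 'volume' usage type and keyed on the volume UUID:

      import libvirt

      def volume_secret_exists(conn, volume_id):
          # True if a libvirt secret for this volume survived the host reboot.
          try:
              conn.secretLookupByUsage(
                  libvirt.VIR_SECRET_USAGE_TYPE_VOLUME, volume_id)
              return True
          except libvirt.libvirtError:
              return False

      # Usage sketch:
      #   conn = libvirt.open('qemu:///system')
      #   if volume_secret_exists(conn, volume_id):
      #       skip the encryption metadata and key fetch entirely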

  Steps to reproduce
  ==================
  * Enable [compute]/resume_guests_state_on_host_boot (see the nova.conf example after this list)
  * Launch instances with encrypted LUKSv1 volumes attached
  * Reboot the underlying host
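
  For example, enabling the option on the compute host means setting,
  in nova.conf (in whichever group your release registers the option
  under; the reference above uses [compute]):

      resume_guests_state_on_host_boot = True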

  Expected result
  ===============
  * The instances are restarted successfully by Nova as no external calls are made and the existing libvirt secrets for any encrypted LUKSv1 volumes are reused.

  Actual result
  =============
  * The instances fail to restart as the initial calls made by the Nova service use an empty admin context without a service catalog etc.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following

     master

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     libvirt + QEMU/KVM

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     N/A

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1641, in _connect_volume
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._attach_encryptor(context, connection_info, encryption)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1760, in _attach_encryptor
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     key = keymgr.get(context, encryption['encryption_key_id'])
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 575, in get
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     secret = self._get_secret(context, managed_object_id)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 545, in _get_secret
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     barbican_client = self._get_barbican_client(context)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 142, in _get_barbican_client
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     self._barbican_endpoint)
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/castellan/key_manager/barbican_key_manager.py", line 214, in _create_base_url
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     service_type='key-manager')
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]   File "/usr/lib/python3.6/site-packages/keystoneauth1/access/service_catalog.py", line 425, in endpoint_data_for
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]     raise exceptions.EmptyCatalog('The service catalog is empty.')
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf] keystoneauth1.exceptions.catalog.EmptyCatalog: The service catalog is empty.
  2020-08-20 11:30:12.273 7 ERROR nova.virt.libvirt.driver [instance: c5b3e7d4-99ea-409c-aba6-d32751f93ccf]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1905701/+subscriptions

