yahoo-eng-team team mailing list archive
Message #92413
[Bug 2020699] [NEW] Nova's rescue and unrescue assumes os-brick connect_volume is idempotent
Public bug reported:
The rescue and unrescue operations in Nova assume that calls to
`connect_volume` in os-brick are idempotent. That is currently true,
but it was never something we guaranteed in os-brick.
With the recent CVE [1][2] we realized that `connect_volume` in os-brick
cannot assume that a device (or devices) already present for the
provided connection information is the right volume. Even if it is the
right volume, os-brick cannot assume that sysfs holds the right
information for it (like the volume size). So it needs to clean things
up to the best of its ability before actually connecting and, just in
case, confirm right before returning a path to the caller that the
device it is going to return is actually correct and consistent (as in
the multipath only has devices with the same size and SCSI ID).
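As a rough illustration of the consistency check described above, here
is a minimal sketch. It is not os-brick code: `PathInfo` and
`is_consistent` are hypothetical names, and the sizes and SCSI IDs are
made-up values rather than anything read from sysfs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathInfo:
    """Hypothetical record for one device behind a multipath."""
    name: str
    size_bytes: int
    scsi_id: str


def is_consistent(paths):
    """Return True when every path reports the same size and SCSI ID."""
    if not paths:
        # Nothing to return to the caller, so treat it as not consistent.
        return False
    return len({(p.size_bytes, p.scsi_id) for p in paths}) == 1


# Made-up example data: sdc is a leftover path from another volume.
good = [PathInfo("sda", 1 << 30, "36001"), PathInfo("sdb", 1 << 30, "36001")]
bad = good + [PathInfo("sdc", 2 << 30, "36002")]
print(is_consistent(good), is_consistent(bad))
```

A check like this only makes sense after the cleanup step, since
leftover devices are exactly what would make the set inconsistent.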
This means that, to prevent data leaks in any corner case, os-brick's
`connect_volume` will no longer be idempotent by design once this patch
[3] merges.
This will break the rescue and unrescue operations in Nova. On rescue,
Nova stashes the original XML [4] and on unrescue it unstashes it [5],
but in between Nova calls `connect_volume` for the rescue instance,
which effectively disconnects the original device path.
This means that the original path, when reused, points either to a
non-existent device or to a volume of another instance.
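To make the sequence easier to follow, here is a toy model of the flow.
It is only a sketch under simplifying assumptions: `ToyConnector` and
`rescue_then_unrescue` are hypothetical and do not use the real os-brick
or Nova APIs, the "XML" is a plain dict, and device names are simply
handed out in order.

```python
class ToyConnector:
    """Hypothetical stand-in for an os-brick connector (not the real API)."""

    def __init__(self):
        self._names = (f"/dev/sd{c}" for c in "abcdefgh")
        self.devices = {}  # volume id -> current device path

    def connect_volume(self, volume_id):
        # Non-idempotent by design: drop any existing device for this
        # volume first, then attach it again under a fresh path.
        self.devices.pop(volume_id, None)
        path = next(self._names)
        self.devices[volume_id] = path
        return path

    def exists(self, path):
        return path in self.devices.values()


def rescue_then_unrescue(connector, volume_id):
    # Normal boot: connect and record the path in the "domain XML".
    original_xml = {"disk_source": connector.connect_volume(volume_id)}
    # Rescue: stash the original XML, then connect again for the
    # rescue instance, which removes the original device.
    stashed_xml = dict(original_xml)
    connector.connect_volume(volume_id)
    # Unrescue: reuse the stashed XML as-is, without reconnecting.
    return stashed_xml["disk_source"]


connector = ToyConnector()
stale_path = rescue_then_unrescue(connector, "vol-1")
print(stale_path, connector.exists(stale_path))  # /dev/sda False
```

In this model the stashed path no longer exists after unrescue, which is
the non-existent device case. The other case would appear if another
volume were connected in between and happened to get the old name.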
We can see an example of the non-existent device case in the failed CI
job [6] where test
`tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached`
fails with a nova-compute error [7]:
libvirt.libvirtError: Cannot access storage file '/dev/sdd': No such
file or directory
[1]: https://nvd.nist.gov/vuln/detail/CVE-2023-2088
[2]: https://bugs.launchpad.net/nova/+bug/2004555
[3]: https://review.opendev.org/c/openstack/os-brick/+/882841
[4]: https://github.com/openstack/nova/blob/71b105a4cfea054827e09b5b8df6be845909275a/nova/virt/libvirt/driver.py#L4229-L4232
[5]: https://github.com/openstack/nova/blob/71b105a4cfea054827e09b5b8df6be845909275a/nova/virt/libvirt/driver.py#L4323-L4328
[6]: https://a30336fa6a8fca5c6dba-fe779e5654b21fdff79727b204dfb7d6.ssl.cf1.rackcdn.com/882841/3/check/os-brick-src-tempest-lvm-lio-barbican/8ef7adf/testr_results.html
[7]: https://zuul.opendev.org/t/openstack/build/8ef7adf6a82248d8b9f94eb5b5bba73c/log/controller/logs/screen-n-cpu.txt?severity=4#77239
** Affects: nova
Importance: High
Status: Triaged
** Tags: cinder libvirt rescue volumes
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2020699
Title:
Nova's rescue and unrescue assumes os-brick connect_volume is
idempotent
Status in OpenStack Compute (nova):
Triaged
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2020699/+subscriptions