yahoo-eng-team team mailing list archive

[Bug 2004555] Re: [OSSA-2023-003] Unauthorized volume access through deleted volume attachments (CVE-2023-2088)

 

Reviewed:  https://review.opendev.org/c/openstack/cinder/+/882835
Committed: https://opendev.org/openstack/cinder/commit/6df1839bdf288107c600b3e53dff7593a6d4c161
Submitter: "Zuul (22348)"
Branch:    master

commit 6df1839bdf288107c600b3e53dff7593a6d4c161
Author: Gorka Eguileor <geguileo@xxxxxxxxxx>
Date:   Thu Feb 16 15:57:15 2023 +0100

    Reject unsafe delete attachment calls
    
    Due to how the Linux SCSI kernel driver works there are some storage
    systems, such as iSCSI with shared targets, where a normal user can
    access other projects' volume data connected to the same compute host
    using the attachments REST API.
    
    This affects both single and multi-pathed connections.
    
    To prevent users from doing this, unintentionally or maliciously,
    cinder-api will now reject some delete attachment requests that are
    deemed unsafe.
    
    Cinder will process the delete attachment request normally in the
    following cases:
    
    - The request comes from an OpenStack service that is sending the
      service token that has one of the roles in `service_token_roles`.
    - Attachment doesn't have an instance_uuid value
    - The instance for the attachment doesn't exist in Nova
    - According to Nova the volume is not connected to the instance
    - Nova is not using this attachment record
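
For illustration only, the five cases above can be sketched as a single
predicate. This is not the code from the commit: the `Attachment` and
`NovaView` classes, their fields and the `delete_is_safe` name are
hypothetical stand-ins for whatever Cinder and Nova actually expose.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Attachment:
    """Hypothetical stand-in for a Cinder attachment record."""
    id: str
    volume_id: str
    instance_uuid: Optional[str] = None


@dataclass
class NovaView:
    """Hypothetical summary of what Nova reports for the instance."""
    instance_exists: bool
    # Maps volume id -> id of the attachment record Nova is using.
    attachment_id_by_volume: Dict[str, str]


def delete_is_safe(attachment: Attachment, nova: NovaView,
                   is_service_request: bool) -> bool:
    """Return True when the delete would be processed normally."""
    if is_service_request:
        return True   # caller sent a service token with a configured role
    if attachment.instance_uuid is None:
        return True   # attachment has no instance_uuid value
    if not nova.instance_exists:
        return True   # the instance does not exist in Nova
    in_use = nova.attachment_id_by_volume.get(attachment.volume_id)
    if in_use is None:
        return True   # Nova does not report the volume as connected
    # Safe only if Nova is using a different attachment record.
    return in_use != attachment.id
```

Any request for which this returns False is one that cinder-api would
reject as unsafe under the rules described in this message.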
    
    There are 3 operations in the actions REST API endpoint that can be used
    for an attack:
    
    - `os-terminate_connection`: Terminate volume attachment
    - `os-detach`: Detach a volume
    - `os-force_detach`: Force detach a volume
    
    In this endpoint we simply reject most requests that do not come from
    a service. The rules are the same as for the attachment delete
    explained earlier, but here we may not have the attachment id, so the
    checks are more restrictive. This should not be a problem for normal
    operations because:
    
    - Cinder backup doesn't use the REST API but RPC calls via RabbitMQ
    - Glance doesn't use this interface anymore
    
    Checking whether it's a service or not is done at the cinder-api level
    by checking that the service user that made the call has at least one of
    the roles in the `service_token_roles` configuration. These roles are
    retrieved from keystone by the keystone middleware using the value of
    the "X-Service-Token" header.
    
    If Cinder is configured with `service_token_roles_required = true` and
    an attacker provides valid non-service credentials, the service will
    return a 401 error; otherwise it'll return 409, as if a normal user had
    made the call without the service token.
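
As a rough sketch of the role check and the resulting status codes
described above (not the commit's actual implementation): the function
names and the way the roles reach them are assumptions. In a real
deployment the roles come from the keystone middleware, not from the
caller.

```python
from typing import Iterable, Optional


def has_service_role(service_roles: Iterable[str],
                     service_token_roles: Iterable[str]) -> bool:
    """True if the service user holds at least one configured role."""
    return bool(set(service_roles) & set(service_token_roles))


def rejection_status(service_roles: Optional[Iterable[str]],
                     service_token_roles: Iterable[str],
                     roles_required: bool,
                     request_is_safe: bool) -> Optional[int]:
    """Return the HTTP error status, or None if the request proceeds.

    service_roles is None when no X-Service-Token header was sent.
    """
    if service_roles is not None:
        if has_service_role(service_roles, service_token_roles):
            return None   # genuine service request: processed normally
        if roles_required:
            return 401    # valid token, but it lacks a service role
    # Treated like a normal user call made without a service token.
    return None if request_is_safe else 409
```

For example, `rejection_status(["member"], ["service"], True, False)`
yields 401, while the same call with `roles_required=False` yields 409.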
    
    Closes-Bug: #2004555
    Change-Id: I612905a1bf4a1706cce913c0d8a6df7a240d599a


** Changed in: cinder
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2004555

Title:
  [OSSA-2023-003] Unauthorized volume access through deleted volume
  attachments (CVE-2023-2088)

Status in Cinder:
  Fix Released
Status in glance_store:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) antelope series:
  In Progress
Status in OpenStack Compute (nova) wallaby series:
  In Progress
Status in OpenStack Compute (nova) xena series:
  Fix Committed
Status in OpenStack Compute (nova) yoga series:
  Fix Committed
Status in OpenStack Compute (nova) zed series:
  Fix Committed
Status in os-brick:
  In Progress
Status in OpenStack Security Advisory:
  Fix Released
Status in OpenStack Security Notes:
  Fix Released

Bug description:
  Hello OpenStack Security Team,

  I’m writing to you because we have hit a serious security breach in
  OpenStack functionality (somewhat related to libvirt, iSCSI and the
  Huawei driver). I went through the OSSA documents and the related
  libvirt notes, but couldn't find anything similar. It is not related to
  https://security.openstack.org/ossa/OSSA-2020-006.html

  In short: we observed that a newly created 1GB Cinder volume was
  attached to an instance on a compute node, but the instance saw it as
  a 115GB volume, which was in fact connected to another instance on
  the same compute node.

  [1. Test environment]
  Compute node: OpenStack Ussuri configured with Huawei Dorado as the storage backend (the driver configuration is documented here: https://docs.openstack.org/cinder/rocky/configuration/block-storage/drivers/huawei-storage-driver.html)
  Packages:
  # dpkg -l | grep libvirt
  ii  libvirt-clients                       6.0.0-0ubuntu8.16                                    amd64        Programs for the libvirt library
  ii  libvirt-daemon                        6.0.0-0ubuntu8.16                                    amd64        Virtualization daemon
  ii  libvirt-daemon-driver-qemu            6.0.0-0ubuntu8.16                                    amd64        Virtualization daemon QEMU connection driver
  ii  libvirt-daemon-driver-storage-rbd     6.0.0-0ubuntu8.16                                    amd64        Virtualization daemon RBD storage driver
  ii  libvirt-daemon-system                 6.0.0-0ubuntu8.16                                    amd64        Libvirt daemon configuration files
  ii  libvirt-daemon-system-systemd         6.0.0-0ubuntu8.16                                    amd64        Libvirt daemon configuration files (systemd)
  ii  libvirt0:amd64                        6.0.0-0ubuntu8.16                                    amd64        library for interfacing with different virtualization systems
  ii  nova-compute-libvirt                  2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node libvirt support
  ii  python3-libvirt                       6.1.0-1                                              amd64        libvirt Python 3 bindings

  # dpkg -l | grep qemu
  ii  ipxe-qemu                             1.0.0+git-20190109.133f4c4-0ubuntu3.2                all          PXE boot firmware - ROM images for qemu
  ii  ipxe-qemu-256k-compat-efi-roms        1.0.0+git-20150424.a25a16d-0ubuntu4                  all          PXE boot firmware - Compat EFI ROM images for qemu
  ii  libvirt-daemon-driver-qemu            6.0.0-0ubuntu8.16                                    amd64        Virtualization daemon QEMU connection driver
  ii  qemu                                  1:4.2-3ubuntu6.23                                    amd64        fast processor emulator, dummy package
  ii  qemu-block-extra:amd64                1:4.2-3ubuntu6.23                                    amd64        extra block backend modules for qemu-system and qemu-utils
  ii  qemu-kvm                              1:4.2-3ubuntu6.23                                    amd64        QEMU Full virtualization on x86 hardware
  ii  qemu-system-common                    1:4.2-3ubuntu6.23                                    amd64        QEMU full system emulation binaries (common files)
  ii  qemu-system-data                      1:4.2-3ubuntu6.23                                    all          QEMU full system emulation (data files)
  ii  qemu-system-gui:amd64                 1:4.2-3ubuntu6.23                                    amd64        QEMU full system emulation binaries (user interface and audio support)
  ii  qemu-system-x86                       1:4.2-3ubuntu6.23                                    amd64        QEMU full system emulation binaries (x86)
  ii  qemu-utils                            1:4.2-3ubuntu6.23                                    amd64        QEMU utilities

  # dpkg -l | grep nova
  ii  nova-common                           2:21.2.4-0ubuntu1                                    all          OpenStack Compute - common files
  ii  nova-compute                          2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node base
  ii  nova-compute-kvm                      2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node (KVM)
  ii  nova-compute-libvirt                  2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node libvirt support
  ii  python3-nova                          2:21.2.4-0ubuntu1                                    all          OpenStack Compute Python 3 libraries
  ii  python3-novaclient                    2:17.0.0-0ubuntu1                                    all          client library for OpenStack Compute API - 3.x

  # dpkg -l | grep multipath
  ii  multipath-tools                       0.8.3-1ubuntu2                                       amd64        maintain multipath block device access

  # dpkg -l | grep iscsi
  ii  libiscsi7:amd64                       1.18.0-2                                             amd64        iSCSI client shared library
  ii  open-iscsi                            2.0.874-7.1ubuntu6.2                                 amd64        iSCSI initiator tools

  # cat /etc/lsb-release
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=20.04
  DISTRIB_CODENAME=focal
  DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"

  Instance OS:  Debian-11-amd64

  [2. Test scenario]
  An existing instance has two volumes attached: the first, 10GB, for the root system, and the second, 115GB, used as vdb. The compute node sees them as vda -> dm-11 and vdb -> dm-9:

  # virsh domblklist 90fas439-fc0e-4e22-8d0b-6f2a18eee5c1
   Target   Source
  ----------------------
   vda      /dev/dm-11
   vdb      /dev/dm-9

  # multipath -ll
  (...)
  36e00084100ee7e7ed6ad25d900002f6b dm-9 HUAWEI,XSG1
  size=115G features='0' hwhandler='0' wp=rw
  `-+- policy='service-time 0' prio=1 status=active
    |- 14:0:0:4  sdm  8:192  active ready running
    |- 15:0:0:4  sdo  8:224  active ready running
    |- 16:0:0:4  sdl  8:176  active ready running
    `- 17:0:0:4  sdn  8:208  active ready running
  (...)
  36e00084100ee7e7ed6acaa2900002f6a dm-11 HUAWEI,XSG1
  size=10G features='0' hwhandler='0' wp=rw
  `-+- policy='service-time 0' prio=1 status=active
    |- 14:0:0:3  sdq  65:0   active ready running
    |- 15:0:0:3  sdr  65:16  active ready running
    |- 16:0:0:3  sdp  8:240  active ready running
    `- 17:0:0:3  sds  65:32  active ready running

  We then created a new instance with the same guest OS and a 10GB root
  volume. After it deployed successfully, we created a new 1GB volume
  and attached it to the new instance. Afterwards we see:

  # multipath -ll
  (...)
  36e00084100ee7e7ed6ad25d900002f6b dm-9 HUAWEI,XSG1
  size=115G features='0' hwhandler='0' wp=rw
  `-+- policy='service-time 0' prio=1 status=active
    |- 14:0:0:10 sdao 66:128 failed faulty running
    |- 14:0:0:4  sdm  8:192  active ready running
    |- 15:0:0:10 sdap 66:144 failed faulty running
    |- 15:0:0:4  sdo  8:224  active ready running
    |- 16:0:0:10 sdan 66:112 failed faulty running
    |- 16:0:0:4  sdl  8:176  active ready running
    |- 17:0:0:10 sdaq 66:160 failed faulty running
    `- 17:0:0:4  sdn  8:208  active ready running

  As a result, the instance showed a new drive of 115GB rather than
  1GB, so it appears the volume was attached incorrectly, and this way
  we were able to destroy some data on that volume.
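
The symptom in the output above is a single multipath map (dm-9) whose
paths point at two different LUNs (4 and 10). A minimal Python sketch for
spotting that in `multipath -ll` output follows. It assumes the
alias-less line format shown in this report; the regular expressions and
function names are illustrative, not part of any OpenStack or
multipath-tools API.

```python
import re
from collections import defaultdict

# Map header line, e.g. "36e0008... dm-9 HUAWEI,XSG1" (no alias configured).
MAP_RE = re.compile(r"^[0-9a-f]+\s+(?P<dm>dm-\d+)\s")
# Path line, e.g. "|- 14:0:0:10 sdao 66:128 failed faulty running".
PATH_RE = re.compile(r"\d+:\d+:\d+:(?P<lun>\d+)\s+sd[a-z]+\s+\d+:\d+\s+\w+")


def luns_per_map(text):
    """Return {dm name: set of LUN numbers seen among its paths}."""
    luns = defaultdict(set)
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = MAP_RE.match(stripped)
        if header:
            current = header.group("dm")
            continue
        path = PATH_RE.search(stripped)
        if path and current:
            luns[current].add(int(path.group("lun")))
    return dict(luns)


def suspicious_maps(text):
    """Return maps whose paths reference more than one LUN."""
    return sorted(dm for dm, seen in luns_per_map(text).items()
                  if len(seen) > 1)


SAMPLE = """
36e00084100ee7e7ed6ad25d900002f6b dm-9 HUAWEI,XSG1
size=115G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 14:0:0:10 sdao 66:128 failed faulty running
  `- 14:0:0:4  sdm  8:192  active ready running
36e00084100ee7e7ed6acaa2900002f6a dm-11 HUAWEI,XSG1
size=10G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 14:0:0:3  sdq  65:0   active ready running
"""

print(suspicious_maps(SAMPLE))
```

On the shortened sample, dm-9 is flagged because its paths carry both
LUN 4 and LUN 10, while dm-11 is not.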

  Additionally, we saw many errors like these in the compute node
  logs:

  # dmesg -T | grep dm-9
  [Fri Jan 27 13:37:42 2023] blk_update_request: critical target error, dev dm-9, sector 62918760 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
  [Fri Jan 27 13:37:42 2023] blk_update_request: critical target error, dev dm-9, sector 33625152 op 0x1:(WRITE) flags 0x8800 phys_seg 6 prio class 0
  [Fri Jan 27 13:37:46 2023] blk_update_request: critical target error, dev dm-9, sector 66663000 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
  [Fri Jan 27 13:37:46 2023] blk_update_request: critical target error, dev dm-9, sector 66598120 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
  [Fri Jan 27 13:37:51 2023] blk_update_request: critical target error, dev dm-9, sector 66638680 op 0x1:(WRITE) flags 0x8800 phys_seg 12 prio class 0
  [Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66614344 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
  [Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66469296 op 0x1:(WRITE) flags 0x8800 phys_seg 24 prio class 0
  [Fri Jan 27 13:37:56 2023] blk_update_request: critical target error, dev dm-9, sector 66586472 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
  (...)

  Unfortunately we do not know a reliable test scenario for this, as
  we hit the issue in fewer than 2% of our attempts, but it looks like
  a serious security breach.

  Additionally, we observed that the Linux kernel does not fully clear device allocations after a volume detach, so some drive names remain visible in the output of, e.g., the lsblk command. After a new volume is attached, those names (e.g. sdao, sdap, sdan and so on) are reused for the new drive and wrongly mapped by multipath/iSCSI to another drive, which is how we hit the issue.
  Our question is why the Linux kernel on the compute node does not remove these device allocations, which leads to a scenario like this. Fixing that may be a solution here.

  Thanks in advance for your help and understanding. If more details
  are needed, do not hesitate to contact me.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/2004555/+subscriptions