yahoo-eng-team team mailing list archive

[Bug 1890501] Re: Soft reboot after live-migration reverts instance to original source domain XML (CVE-2020-17376)

 

Reviewed:  https://review.opendev.org/747969
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1bb8ee95d4c3ddc3f607ac57526b75af1b7fbcff
Submitter: Zuul
Branch:    master

commit 1bb8ee95d4c3ddc3f607ac57526b75af1b7fbcff
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date:   Wed Aug 5 23:00:06 2020 +0100

    libvirt: Provide VIR_MIGRATE_PARAM_PERSIST_XML during live migration
    
    The VIR_MIGRATE_PARAM_PERSIST_XML parameter was introduced in libvirt
    v1.3.4 and is used to provide the new persistent configuration for the
    destination during a live migration:
    
    https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_MIGRATE_PARAM_PERSIST_XML
    
    Without this parameter the persistent configuration on the destination
    will be the same as the original persistent configuration on the source
    when the VIR_MIGRATE_PERSIST_DEST flag is provided.
    
    As Nova does not currently provide the VIR_MIGRATE_PARAM_PERSIST_XML
    param but does provide the VIR_MIGRATE_PERSIST_DEST flag this means that
    a soft reboot by Nova of the instance after a live migration can revert
    the domain back to the original persistent configuration from the
    source.
    
    Note that this is only possible in Nova as a soft reboot actually
    results in the virDomainShutdown and virDomainLaunch libvirt APIs being
    called that recreate the domain using the persistent configuration.
    virDomainReboot does not result in this but is not called at this time.
    
    The impact of this on the instance after the soft reboot is pretty
    severe, host devices referenced in the original persistent configuration
    on the source may not exist or could even be used by other users on the
    destination. CPU and NUMA affinity could also differ drastically between
    the two hosts resulting in the instance being unable to start etc.
    
    As MIN_LIBVIRT_VERSION is now > v1.3.4 this change simply includes the
    VIR_MIGRATE_PARAM_PERSIST_XML param using the same updated XML for the
    destination as is already provided to VIR_MIGRATE_PARAM_DEST_XML.
    
    Co-authored-by: Tadayoshi Hosoya <tad-hosoya@xxxxxxxxxxxxx>
    Closes-Bug: #1890501
    Change-Id: Ia3f1d8e83cbc574ce5cb440032e12bbcb1e10e98
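
The change described in the commit message above can be sketched roughly as
follows. This is only an illustration, not Nova's actual code: the helper
name build_migrate_params and its include_persist_xml switch are
hypothetical, and the two parameter names are taken from the libvirt header
excerpt quoted in the bug description. The resulting dict is the kind of
params argument a libvirt-python migrateToURI3 call would take.

```python
# Parameter names as defined in the libvirt header excerpt quoted in this report.
VIR_MIGRATE_PARAM_DEST_XML = "destination_xml"
VIR_MIGRATE_PARAM_PERSIST_XML = "persistent_xml"


def build_migrate_params(updated_xml, include_persist_xml=True):
    """Build a migration params dict (hypothetical helper, not Nova code).

    updated_xml is the domain XML already rewritten for the destination
    host. With include_persist_xml=False this models the behaviour before
    the fix, where only the live configuration was updated.
    """
    params = {}
    if updated_xml:
        # New live configuration for the guest on the destination.
        params[VIR_MIGRATE_PARAM_DEST_XML] = updated_xml
        if include_persist_xml:
            # New persistent configuration, using the same updated XML.
            params[VIR_MIGRATE_PARAM_PERSIST_XML] = updated_xml
    return params


if __name__ == "__main__":
    xml = "<domain>...</domain>"  # placeholder, not a real domain definition
    print(sorted(build_migrate_params(xml, include_persist_xml=False)))
    print(sorted(build_migrate_params(xml)))
```

Reusing one XML string for both parameters mirrors the commit message,
which states that the same updated XML is supplied to both.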


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1890501

Title:
  Soft reboot after live-migration reverts instance to original source
  domain XML (CVE-2020-17376)

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Security Advisory:
  In Progress

Bug description:
  Description
  ===========

  When live migrating instances with attached volumes Nova will first
  ensure that the volumes are connected on the destination before
  updating the underlying domain XML to be used on the destination to
  correctly map to these volumes.

  At present in the case of volumes connected over iSCSI or FC this
  ensures that the instance points to the correct host block devices as
  these may differ from the source.

  However, if a user requests a soft reboot of an instance after a
  successful live migration, the underlying libvirt domain will roll
  back to the XML definition used on the source. In the case of volumes
  provided over iSCSI or FC etc. this can potentially lead to the wrong
  volume being attached to the instance on the destination, leading to
  possible data exfiltration or corruption.

  It appears that this is due to Nova not providing
  VIR_MIGRATE_PARAM_PERSIST_XML during the migration, resulting in the
  original source domain's persistent configuration being used instead:

       /**
        * VIR_MIGRATE_PARAM_DEST_XML:
        *
        * virDomainMigrate* params field: the new configuration to be used for the
        * domain on the destination host as VIR_TYPED_PARAM_STRING. The configuration
        * must include an identical set of virtual devices, to ensure a stable guest
        * ABI across migration. Only parameters related to host side configuration
        * can be changed in the XML. Hypervisors which support this field will forbid
        * migration if the provided XML would cause a change in the guest ABI. This
        * field cannot be used to rename the domain during migration (use
        * VIR_MIGRATE_PARAM_DEST_NAME field for that purpose). Domain name in the
        * destination XML must match the original domain name.
        *
        * Omitting this parameter keeps the original domain configuration. Using this
        * field with hypervisors that do not support changing domain configuration
        * during migration will result in a failure.
        */
       # define VIR_MIGRATE_PARAM_DEST_XML          "destination_xml"

       /**
        * VIR_MIGRATE_PARAM_PERSIST_XML:
        *
        * virDomainMigrate* params field: the new persistent configuration to be used
        * for the domain on the destination host as VIR_TYPED_PARAM_STRING.
        * This field cannot be used to rename the domain during migration (use
        * VIR_MIGRATE_PARAM_DEST_NAME field for that purpose). Domain name in the
        * destination XML must match the original domain name.
        *
        * Omitting this parameter keeps the original domain persistent configuration.
        * Using this field with hypervisors that do not support changing domain
        * configuration during migration will result in a failure.
        */
       # define VIR_MIGRATE_PARAM_PERSIST_XML  "persistent_xml"
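
  The interaction between the two parameters can be illustrated with a
  toy model. This is a sketch under simplifying assumptions, not libvirt
  or Nova code: the Domain class and the migrate and soft_reboot
  functions are hypothetical, a configuration is reduced to a single
  device path string, and the soft reboot is modelled as a shutdown
  followed by a fresh start from the persistent configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    live_xml: Optional[str]  # configuration of the running guest
    persistent_xml: str      # configuration used the next time the guest starts


def migrate(src, dest_xml=None, persist_xml=None):
    """Model a live migration with VIR_MIGRATE_PERSIST_DEST set.

    Omitting either argument keeps the corresponding source configuration,
    as the quoted header comments describe.
    """
    return Domain(
        live_xml=dest_xml if dest_xml is not None else src.live_xml,
        persistent_xml=(
            persist_xml if persist_xml is not None else src.persistent_xml
        ),
    )


def soft_reboot(dom):
    """Model the soft reboot: the guest restarts from its persistent config."""
    return Domain(live_xml=dom.persistent_xml, persistent_xml=dom.persistent_xml)


if __name__ == "__main__":
    source = Domain(live_xml="/dev/dm-0", persistent_xml="/dev/dm-0")
    buggy = soft_reboot(migrate(source, dest_xml="/dev/dm-2"))
    fixed = soft_reboot(
        migrate(source, dest_xml="/dev/dm-2", persist_xml="/dev/dm-2")
    )
    print(buggy.live_xml)  # /dev/dm-0, the stale path from the source host
    print(fixed.live_xml)  # /dev/dm-2, the path valid on the destination
```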

  Steps to reproduce
  ==================

     0) Deploy an overcloud with multipath and an iSCSI/LVM cinder backend.
     1) Delete all instances and check that no device paths remain on either host1 or host2.
     2) Boot instances, VM1 on host1 and VM2 on host2.
        $ cinder create --name cirros1 --volume-type lvm --image cirros 1
        $ cinder create --name cirros2 --volume-type lvm --image cirros 1
        $ nova boot --block-device-mapping vda=$cirrosvol1 ... --host host1.localdomain testvm1
        $ nova boot --block-device-mapping vda=$cirrosvol2 ... --host host2.localdomain testvm2
        $ openstack server add floating ip testvm1 xx.xx.xx.xx
        $ openstack server add floating ip testvm2 yy.yy.yy.yy
     3) Soft reboot each instance and check that no problems have occurred.
        $ nova reboot testvm1
        $ nova reboot testvm2
     4) Live migrate VM1 to host2, then check the device path settings in each
        VM's XML.
        $ nova live-migration testvm1 host2.localdomain
     5) Soft reboot VM1, then check the device path settings in each VM's XML.
        $ nova reboot testvm1
     6) Log in to each VM and check its syslog.

  Expected result
  ===============

  After live migration and soft reboot of the instance, the device paths
  shown by virsh dumpxml --inactive and the qemu XML file are updated to
  the values appropriate for the destination host.

  Actual result
  =============

  After live migration and soft reboot of the instance, the device paths
  shown by virsh dumpxml --inactive and the qemu XML file still hold the
  values from the source host before migration.
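
  The comparison between the active and inactive domain XML can be
  automated along these lines. This is a rough sketch: the function
  names disk_sources and stale_persistent_paths are hypothetical, the
  XML strings are cut-down stand-ins for real virsh dumpxml and
  virsh dumpxml --inactive output, and disks are assumed to appear in
  the same order in both documents.

```python
import xml.etree.ElementTree as ET


def disk_sources(domain_xml):
    """Return the dev attribute of every <devices>/<disk>/<source> element."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("dev")
        for src in root.findall("./devices/disk/source")
        if src.get("dev")
    ]


def stale_persistent_paths(active_xml, inactive_xml):
    """Return (active, inactive) device path pairs that differ."""
    pairs = zip(disk_sources(active_xml), disk_sources(inactive_xml))
    return [(active, inactive) for active, inactive in pairs if active != inactive]


if __name__ == "__main__":
    # Reduced examples based on the state after live migration in the logs.
    active = (
        "<domain><devices><disk type='block'>"
        "<source dev='/dev/dm-2' index='1'/>"
        "</disk></devices></domain>"
    )
    inactive = (
        "<domain><devices><disk type='block'>"
        "<source dev='/dev/dm-0'/>"
        "</disk></devices></domain>"
    )
    print(stale_persistent_paths(active, inactive))
```

  A non-empty result after a live migration would indicate that the
  persistent configuration still refers to paths from the source host.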

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     Reported downstream against stable/train and libvirt 5.6.0-10.

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     libvirt + KVM

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     LVM/iSCSI with multipath enabled but any host block based storage
     backend will do.

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

  The following test env logs are copied verbatim from a private
  downstream security bug:

  https://bugzilla.redhat.com/show_bug.cgi?id=1862353

     * Device paths initial state
                                       host1                                     host2
       ===================================================================================================
       VM1 multipath -ll               360014053825c172898b4ba4a5353515c dm-0    ---
           virsh dumpxml               <source dev='/dev/dm-0' index='1'/>       ---
           virsh dumpxml --inactive    <source dev='/dev/dm-0'/>                 ---
           qemu xml file               <source dev='/dev/dm-0'/>                 ---
       ---------------------------------------------------------------------------------------------------
       VM2 multipath -ll               ---                                       36001405fc681536d0124af2a9fd99c10 dm-0
           virsh dumpxml               ---                                       <source dev='/dev/dm-0' index='1'/>
           virsh dumpxml --inactive    ---                                       <source dev='/dev/dm-0'/>
           qemu xml file               ---                                       <source dev='/dev/dm-0'/>

     * Device paths after VM1 live-migration to host2
                                       host1    host2
       ===================================================================================================
       VM1 multipath -ll               ---      360014053825c172898b4ba4a5353515c dm-2
           virsh dumpxml               ---      <source dev='/dev/dm-2' index='1'/>
           virsh dumpxml --inactive    ---      <source dev='/dev/dm-0'/>              <== not dm-2
           qemu xml file               ---      <source dev='/dev/dm-0'/>              <== not dm-2
       ---------------------------------------------------------------------------------------------------
       VM2 multipath -ll               ---      36001405fc681536d0124af2a9fd99c10 dm-0
           virsh dumpxml               ---      <source dev='/dev/dm-0' index='1'/>
           virsh dumpxml --inactive    ---      <source dev='/dev/dm-0'/>
           qemu xml file               ---      <source dev='/dev/dm-0'/>

     * Device paths after soft reboot VM1 on host2
                                       host1    host2
       ===================================================================================================
       VM1 multipath -ll               ---      360014053825c172898b4ba4a5353515c dm-2
           virsh dumpxml               ---      <source dev='/dev/dm-0' index='1'/>    <== changed to dm-0
           virsh dumpxml --inactive    ---      <source dev='/dev/dm-0'/>
           qemu xml file               ---      <source dev='/dev/dm-0'/>
       ---------------------------------------------------------------------------------------------------
       VM2 multipath -ll               ---      36001405fc681536d0124af2a9fd99c10 dm-0
           virsh dumpxml               ---      <source dev='/dev/dm-0' index='1'/>
           virsh dumpxml --inactive    ---      <source dev='/dev/dm-0'/>
           qemu xml file               ---      <source dev='/dev/dm-0'/>

     * VM1 syslog file before live-migration
           $ cat /var/log/messages
           ...
           Jul 28 05:28:38 cirrostestvm1 kern.info kernel: [    0.780031] usb 1-1: new full-speed USB device number 2 using uhci_hcd
           Jul 28 05:28:39 cirrostestvm1 kern.info kernel: [    1.272305] Refined TSC clocksource calibration: 2099.976 MHz.
           Jul 28 05:28:40 cirrostestvm1 authpriv.info dropbear[260]: Running in background
           Jul 28 05:28:40 cirrostestvm1 daemon.info init: reloading /etc/inittab
           Jul 28 05:28:40 cirrostestvm1 daemon.info init: starting pid 1, tty '/dev/ttyS0': '/sbin/getty -L 115200 ttyS0 vt100 '
           Jul 28 05:28:40 cirrostestvm1 daemon.info init: starting pid 1, tty '/dev/tty1': '/sbin/getty 115200 tty1'
           Jul 28 05:28:48 cirrostestvm1 kern.debug kernel: [   10.992106] eth0: no IPv6 routers present
           Jul 28 05:29:45 cirrostestvm1 authpriv.info dropbear[301]: Child connection from **.**.**.**:33648
           Jul 28 05:29:48 cirrostestvm1 authpriv.notice dropbear[301]: Password auth succeeded for 'cirros' from **.**.**.**:33648
           $

     * VM1 syslog file after soft reboot on host2
         The hostname command returns the correct value, but the VM1 syslog contains entries recorded by VM2.
         (In some cases, the VM1 and VM2 syslog files are corrupted and cannot be read as text files.)
           $ hostname
           cirrostestvm1
           $ cat /var/log/messages | tail
           Jul 28 06:03:01 cirrostestvm2 authpriv.info dropbear[325]: Child connection from 172.31.151.1:35894
           Jul 28 06:03:05 cirrostestvm2 authpriv.notice dropbear[325]: Password auth succeeded for 'cirros' from **.**.**.**:35894
           Jul 28 06:03:05 cirrostestvm2 authpriv.info dropbear[325]: Exit (cirros): Disconnect received
           Jul 28 06:03:30 cirrostestvm2 authpriv.info dropbear[328]: Child connection from **.**.**.**:36352
           Jul 28 06:03:34 cirrostestvm2 authpriv.notice dropbear[328]: Password auth succeeded for 'cirros' from **.**.**.**:36352
           Jul 28 06:03:34 cirrostestvm2 authpriv.info dropbear[328]: Exit (cirros): Disconnect received
           Jul 28 06:03:39 cirrostestvm2 authpriv.info dropbear[331]: Child connection from **.**.**.**:36484
           Jul 28 06:03:41 cirrostestvm2 authpriv.info dropbear[331]: Exit before auth (user 'cirros', 0 fails): Exited normally
           Jul 28 06:03:45 cirrostestvm2 authpriv.info dropbear[332]: Child connection from **.**.**.**:36588
           Jul 28 06:03:49 cirrostestvm2 authpriv.notice dropbear[332]: Password auth succeeded for 'cirros' from **.**.**.**:36588
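
  The symptom shown above, syslog entries written under another
  instance's hostname, can be checked for mechanically. This is a
  minimal sketch: foreign_hosts is a hypothetical helper, and it assumes
  the line layout seen in these logs, where the hostname is the fourth
  whitespace-separated field.

```python
def foreign_hosts(syslog_lines, expected_hostname):
    """Return the sorted hostnames in the log that differ from the expected one."""
    found = set()
    for line in syslog_lines:
        fields = line.split()
        # Assumed layout: "Mon DD HH:MM:SS hostname facility.level message"
        if len(fields) >= 4 and fields[3] != expected_hostname:
            found.add(fields[3])
    return sorted(found)


if __name__ == "__main__":
    lines = [
        "Jul 28 05:28:40 cirrostestvm1 authpriv.info dropbear[260]: Running in background",
        "Jul 28 06:03:05 cirrostestvm2 authpriv.info dropbear[325]: Exit (cirros): Disconnect received",
    ]
    print(foreign_hosts(lines, "cirrostestvm1"))
```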
