yahoo-eng-team team mailing list archive

[Bug 1913716] Re: Live-migrating an instance from 'Queens' (CentOS-7) to 'Train' (CentOS-8) fails during libvirt's compareCPU() check

Reviewed:  https://review.opendev.org/c/openstack/nova/+/838926
Committed: https://opendev.org/openstack/nova/commit/267a40663cd8d0b94bbc5ebda4ece55a45753b64
Submitter: "Zuul (22348)"
Branch:    master

commit 267a40663cd8d0b94bbc5ebda4ece55a45753b64
Author: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
Date:   Thu Jan 28 16:35:10 2021 +0100

    libvirt: Add a workaround to skip compareCPU() on destination
    
    Nova's use of libvirt's compareCPU() API served its purpose
    over the years, but its design limitations break live migration in
    subtle ways.  For example, the compareCPU() API compares against the
    host physical CPUID.  Some of the features from this CPUID are not
    exposed by KVM, and then there are some features that KVM emulates that
    are not in the host CPUID.  The latter can cause bogus live migration
    failures.
    
    With QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the right thing in
    terms of CPU compatibility checks on the destination host during live
    migration.  Nova satisfies these minimum version requirements by a good
    margin.  So, provide a workaround to skip the CPU comparison check on
    the destination host before migrating a guest, and let libvirt handle it
    correctly.  This workaround will be removed once Nova replaces the older
    libvirt APIs with their newer and improved counterparts[1][2].
    
                    - - -
    
    Note that Nova's libvirt driver calls compareCPU() in another method,
    _check_cpu_compatibility(); I did not remove its usage yet, as that needs
    more careful combing of the code, and then:
    
      - where possible, remove the usage of compareCPU() altogether, and
        rely on libvirt doing the right thing under the hood; or
    
      - where Nova _must_ do the CPU comparison checks, switch to the better
        libvirt CPU APIs -- baselineHypervisorCPU() and
        compareHypervisorCPU() -- that are described here[1].  This is work
        in progress[2].
    
    [1] https://opendev.org/openstack/nova-specs/commit/70811da221035044e27
    [2] https://review.opendev.org/q/topic:bp%252Fcpu-selection-with-hypervisor-consideration
    
    Change-Id: I444991584118a969e9ea04d352821b07ec0ba88d
    Closes-Bug: #1913716
    Signed-off-by: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
    Signed-off-by: Balazs Gibizer <bgibizer@xxxxxxxxxx>
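
The workaround described in the commit message above amounts to guarding the destination-side CPU check behind an opt-in flag. Below is a minimal sketch of that idea, not Nova's actual driver code. The flag name `skip_cpu_compare_on_dest`, the `Workarounds` class, the `compare_cpu` callable and the function name are all assumptions made for illustration; the commit message does not spell them out.

```python
from dataclasses import dataclass
from typing import Callable


class InvalidCPUInfo(Exception):
    """Stand-in for nova.exception.InvalidCPUInfo."""


@dataclass
class Workarounds:
    # Hypothetical option name; assumed to default to "do the check".
    skip_cpu_compare_on_dest: bool = False


def check_cpu_on_destination(
    source_cpu_xml: str,
    compare_cpu: Callable[[str], int],
    workarounds: Workarounds,
) -> bool:
    """Return True if the comparison ran and passed, False if it was skipped.

    compare_cpu stands in for a call to libvirt's compareCPU() and is
    expected to return a virCPUCompareResult-style integer.
    """
    if workarounds.skip_cpu_compare_on_dest:
        # Leave the compatibility check to libvirt during the migration.
        return False

    ret = compare_cpu(source_cpu_xml)
    if ret <= 0:
        # 0 means incompatible, -1 means the comparison itself failed.
        raise InvalidCPUInfo(
            "Unacceptable CPU info: CPU doesn't have compatibility. (%d)" % ret
        )
    return True
```

With the flag left at its default, a comparison result of 0 raises the same kind of error as in the bug report; with the flag enabled, the comparison callable is never invoked.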


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1913716

Title:
  Live-migrating an instance from 'Queens' (CentOS-7) to 'Train'
  (CentOS-8) fails during libvirt's compareCPU() check

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  [This bug was originally reported by Lukas Bezdicka when testing Red
  Hat's OpenStack (OSP); but this should be reproducible in upstream
  context as well.  I'm writing this report based on the root cause
  analysis in the environment where the bug occurred.  Thanks to Daniel
  Berrangé for the debugging help!]

  Description
  -----------

  Live-migrating a guest from a 'Queens' compute node (running CentOS 7)
  to a 'Train' compute node (running CentOS 8) fails with:

  -----------------------------------------------------------------------
  [...]
   _compare_cpu /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:8559
  2021-01-26 23:30:25.169 7 ERROR nova.virt.libvirt.driver [req-774be110-7fb6-4865-a177-d624a821cf9e 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] CPU doesn't have compatibility.

  0

  Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server [req-774be110-7fb6-4865-a177-d624a821cf9e 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] Exception during message handling: nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.

  [...]

  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server     block_migration, disk_over_commit)
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8258, in check_can_live_migrate_destination
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server     self._compare_cpu(None, source_cpu_info, instance)
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 8575, in _compare_cpu
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server     raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server nova.exception.InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
  2021-01-26 23:30:25.242 7 ERROR oslo_messaging.rpc.server
  [...]
  -----------------------------------------------------------------------
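
The bare `0` in the log above is the value that compareCPU() returned. In libvirt's virCPUCompareResult enum, 0 stands for VIR_CPU_COMPARE_INCOMPATIBLE. The sketch below decodes such a value. The class and function names are hypothetical, and treating any result `<= 0` as a failure is an assumption that is consistent with the traceback, where a result of 0 led to InvalidCPUInfo.

```python
from enum import IntEnum


class CPUCompareResult(IntEnum):
    """Mirrors the integer values of libvirt's virCPUCompareResult."""
    ERROR = -1          # VIR_CPU_COMPARE_ERROR
    INCOMPATIBLE = 0    # VIR_CPU_COMPARE_INCOMPATIBLE
    IDENTICAL = 1       # VIR_CPU_COMPARE_IDENTICAL
    SUPERSET = 2        # VIR_CPU_COMPARE_SUPERSET


def describe(ret: int) -> str:
    """Return the symbolic name for a compareCPU() return value."""
    return CPUCompareResult(ret).name


def is_compatible(ret: int) -> bool:
    """Only IDENTICAL and SUPERSET count as a passing comparison."""
    return CPUCompareResult(ret) > CPUCompareResult.INCOMPATIBLE


print(describe(0), is_compatible(0))
```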

  Environment
  -----------

  The bug was reported by testing in a nested KVM environment, running on
  Intel hardware (Xeon(R) Gold 5218R CPU @ 2.10GHz), with the entire
  OpenStack setup in VMs.  So the Nova instances themselves will be nested
  guests.

    - Source: a CentOS-7 compute node (a level-1 guest), running OpenStack
      'Queens'

    - Destination: a CentOS-8 compute node (a level-1 guest), running
      OpenStack 'Train'

  Steps to reproduce
  ------------------

  Live-migrate a guest from the source host to the destination host.

  Expected result
  ---------------

  Live migration should've succeeded.

  
  Actual result
  -------------

  Live migration fails during the compareCPU() check on the destination host
  with:

  [...]
   _compare_cpu /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:8559
  2021-01-26 23:30:25.169 7 ERROR nova.virt.libvirt.driver [req-774be110-7fb6-4865-a177-d624a821cf9e 19ec0130b8714aac8c64a5c2ee5b914b 352675f5f34d45d59bdd61fde58e4bd0 - default default] CPU doesn't have compatibility.
  [...]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1913716/+subscriptions


