[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/852002
Committed: https://opendev.org/openstack/nova/commit/9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Submitter: "Zuul (22348)"
Branch:    master

commit 9fea934c71d3c2fa7fdd80c67d94e18466c5cf9a
Author: Brett Milford <brett.milford@xxxxxxxxxxxxx>
Date:   Thu Aug 4 16:52:33 2022 +1000

    Handle "no RAM info was set" migration case
    
    This handles the case where the live migration monitoring thread may
    race and call jobStats() after the migration has completed resulting in
    the following error:
    
        libvirt.libvirtError: internal error: migration was active, but no
        RAM info was set
    
    Closes-Bug: #1982284
    Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1982284

Title:
  libvirt live migration sometimes fails with "libvirt.libvirtError:
  internal error: migration was active, but no RAM info was set"

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We have seen this downstream, where live migration intermittently
  fails with the following error [1]:

    libvirt.libvirtError: internal error: migration was active, but no
    RAM info was set

  Discussion on [1] gravitated toward a possible race condition in
  qemu around the query-migrate command [2]. The query-migrate command
  is used (indirectly) by the libvirt driver while it monitors live
  migrations [3][4][5].
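
  For illustration, here is a minimal sketch of such a monitoring poll
  using the libvirt Python bindings; the connection URI and domain
  name are assumptions for the example, not values taken from nova:

    import time

    import libvirt

    conn = libvirt.open('qemu:///system')          # assumed URI
    dom = conn.lookupByName('instance-00000001')   # hypothetical name

    while True:
        # jobStats() wraps virDomainGetJobStats, which libvirt backs
        # with the QMP query-migrate command during a migration.
        stats = dom.jobStats()
        if stats.get('type') == libvirt.VIR_DOMAIN_JOB_NONE:
            break  # no active job: the migration has finished
        print('RAM remaining: %s bytes' % stats.get('memory_remaining'))
        time.sleep(0.5)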

  While searching for information about this error, I found an older
  thread on the libvir-list mailing list [6] where someone else had
  encountered the same error, and for them it happened when they
  called query-migrate *after* a live migration had completed.

  Based on this, it seemed possible that our live migration monitoring
  thread sometimes races and calls jobStats() after the migration has
  completed, resulting in this error being raised and the migration
  being considered failed when it was actually complete.

  A patch has since been proposed and committed to qemu [7] to address
  the suspected race.

  Meanwhile, on the nova side, we can mitigate this behavior by
  catching the specific error from libvirt and ignoring it, so that
  the libvirt driver considers a live migration in this situation
  complete.
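
  A minimal sketch of that mitigation, assuming a hypothetical wrapper
  around jobStats() (the actual fix lands elsewhere in nova's libvirt
  driver, per the change linked above):

    import libvirt

    def get_job_stats(dom):
        try:
            return dom.jobStats()
        except libvirt.libvirtError as ex:
            msg = ex.get_error_message() or ''
            err = 'migration was active, but no RAM info was set'
            if (ex.get_error_code() == libvirt.VIR_ERR_INTERNAL_ERROR
                    and err in msg):
                # The job finished between our poll and libvirt's
                # query-migrate call; report a completed job rather
                # than failing the migration.
                return {'type': libvirt.VIR_DOMAIN_JOB_COMPLETED}
            raise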

  Doing this would improve the experience for users who hit this error
  and would otherwise see spurious live migration failures.

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
  [2] https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
  [3] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
  [4] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
  [5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
  [6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
  [7] https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1982284/+subscriptions


