yahoo-eng-team team mailing list archive
Message #89353
[Bug 1982284] [NEW] libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
Public bug reported:
We have seen this downstream where live migration randomly fails with
the following error [1]:
libvirt.libvirtError: internal error: migration was active, but no RAM
info was set
Discussion on [1] gravitated toward a possible race condition issue in
qemu around the query-migrate command [2]. The query-migrate command is
used (indirectly) by the libvirt driver during monitoring of live
migrations [3][4][5].
While searching for information about this error, I found an older
thread on libvir-list [6] where someone else encountered the same error;
for them it happened when they called query-migrate *after* a live
migration had completed.
Based on this, it seemed possible that our live migration monitoring
thread sometimes races and calls jobStats() after the migration has
completed, resulting in this error being raised and the migration being
considered failed when it was actually complete.
A patch has since been proposed and committed to qemu [7] to address
the possible issue.
Meanwhile, on our side in nova, we can mitigate this problematic
behavior by catching the specific error from libvirt and ignoring it so
that a live migration in this situation will be considered completed by
the libvirt driver.
Doing this would improve the experience for users that are hitting this
error and getting erroneous live migration failures.
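As a rough illustration of the mitigation described above, here is a
minimal sketch, not the actual nova change. To keep it runnable without
libvirt-python installed, FakeLibvirtError and RacyDomain are
hypothetical stand-ins for libvirt.libvirtError and a libvirt domain,
and the two constants are assumed to mirror the libvirt values of the
same names. The helper name get_job_type is also made up for this
example.

```python
# Stand-in constants so the sketch runs without libvirt-python.
# Assumption: these mirror libvirt.VIR_ERR_INTERNAL_ERROR and
# libvirt.VIR_DOMAIN_JOB_COMPLETED.
VIR_ERR_INTERNAL_ERROR = 1
VIR_DOMAIN_JOB_COMPLETED = 3

NO_RAM_INFO_MSG = "migration was active, but no RAM info was set"


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code
        self._message = message

    def get_error_code(self):
        return self._code

    def get_error_message(self):
        return self._message


class RacyDomain:
    """Hypothetical domain whose jobStats() hits the error from this bug."""

    def jobStats(self):
        raise FakeLibvirtError(
            VIR_ERR_INTERNAL_ERROR, "internal error: " + NO_RAM_INFO_MSG)


def get_job_type(domain):
    """Return the job type, treating the 'no RAM info' error as completed."""
    try:
        stats = domain.jobStats()
    except FakeLibvirtError as ex:
        msg = ex.get_error_message() or ""
        if (ex.get_error_code() == VIR_ERR_INTERNAL_ERROR
                and NO_RAM_INFO_MSG in msg):
            # Assume the migration finished just before we polled.
            return VIR_DOMAIN_JOB_COMPLETED
        # Any other libvirt error is still a real failure.
        raise
    return stats["type"]


if __name__ == "__main__":
    print(get_job_type(RacyDomain()) == VIR_DOMAIN_JOB_COMPLETED)
```

Matching on both the error code and the message text keeps the check
narrow, so unrelated internal errors from libvirt are still raised.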
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2074205
[2] https://qemu.readthedocs.io/en/latest/interop/qemu-qmp-ref.html#qapidoc-1848
[3] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/driver.py#L10123
[4] https://github.com/openstack/nova/blob/bcb96f362ab12e297f125daa5189fb66345b4976/nova/virt/libvirt/guest.py#L655
[5] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetJobStats
[6] https://listman.redhat.com/archives/libvir-list/2021-January/213631.html
[7] https://github.com/qemu/qemu/commit/552de79bfdd5e9e53847eb3c6d6e4cd898a4370e
** Affects: nova
Importance: Undecided
Assignee: melanie witt (melwitt)
Status: In Progress
** Tags: libvirt live-migration
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1982284
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1982284/+subscriptions
Follow ups
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: melanie witt, 2024-11-21
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: melanie witt, 2024-11-21
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: melanie witt, 2024-11-21
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: melanie witt, 2024-11-21
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: Edward Hope-Morley, 2023-03-09
-
[Bug 1982284] Fix included in openstack/nova 26.1.0
From: OpenStack Infra, 2023-01-26
-
[Bug 1982284] Fix included in openstack/nova 25.1.0
From: OpenStack Infra, 2023-01-26
-
[Bug 1982284] Fix included in openstack/nova 24.2.0
From: OpenStack Infra, 2023-01-26
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: melanie witt, 2022-10-07
-
[Bug 1982284] Re: libvirt live migration sometimes fails with "libvirt.libvirtError: internal error: migration was active, but no RAM info was set"
From: OpenStack Infra, 2022-10-06