[Bug 1902276] Re: libvirtd going into a tight loop causing instances to not transition to ACTIVE
What do we want to do with this bug? Ultimately the underlying issue is
with the version of libvirt shipped in Bionic, which is outside of our
control. Should we move the bug to the libvirt component and ask the
Ubuntu team whether a rebase is even feasible?
** Also affects: libvirt
Importance: Undecided
Status: New
** Also affects: libvirt (Ubuntu)
Importance: Undecided
Status: New
** No longer affects: libvirt
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1902276
Title:
libvirtd going into a tight loop causing instances to not transition
to ACTIVE
Status in OpenStack Compute (nova):
New
Status in libvirt package in Ubuntu:
New
Bug description:
Description
===========
This is current master branch (wallaby) of OpenStack.
We see this regularly, but it is intermittent. Said another way: we see
it in jobs daily, but not on every run.
We are seeing nova instances that do not transition to ACTIVE within
five minutes. Investigating this led us to find that libvirtd appears
to go into a tight loop when an instance is deleted.
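For context, this is roughly how the failure surfaces on the test side.
The snippet below is only an illustrative sketch using openstacksdk, not
the actual Octavia test code; the cloud entry name and server name are
made up:
import openstack

conn = openstack.connect(cloud='devstack-admin')  # assumed clouds.yaml entry
server = conn.compute.find_server('amphora-example')  # hypothetical server name

try:
    # Raises ResourceTimeout if the server is still not ACTIVE after 300s,
    # which is how these failures show up as test timeouts.
    conn.compute.wait_for_server(server, status='ACTIVE', interval=5, wait=300)
except openstack.exceptions.ResourceTimeout:
    print('server did not reach ACTIVE within 300 seconds')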
The 136MB log is here:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c77/759973/3/check/octavia-v2-dsvm-scenario/c77fe63/controller/logs/libvirt/libvirtd_log.txt
The overall job logs are here:
https://zuul.opendev.org/t/openstack/build/c77fe63a94ef4298872ad5f40c5df7d4/logs
When running the Octavia scenario test suite, we occasionally see nova
instances fail to become ACTIVE in a timely manner, causing timeouts
and failures. In investigating this issue we found the libvirtd log
was 136MB.
Most of the file is filled with the following lines repeating:
2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:767 : Error on monitor internal error: End of file from qemu monitor
2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:788 : Triggering EOF callback
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:301 : Received EOF on 0x7f6278014ca0 'instance-00000001'
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:305 : Domain is being destroyed, EOF is expected
Here is a snippet of the lead-in to the repeated lines:
http://paste.openstack.org/show/799559/
It appears to be a tight loop, repeating many times per second.
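To quantify the loop, a rough Python sketch like the one below could be
run against a local copy of the log (assuming the timestamp format shown
above and a file named libvirtd_log.txt); it buckets the "Triggering EOF
callback" lines per second:
from collections import Counter

counts = Counter()
with open('libvirtd_log.txt', errors='replace') as f:
    for line in f:
        if 'Triggering EOF callback' in line:
            # The first two fields are the date and the time; drop the
            # sub-second part so occurrences are bucketed per whole second.
            date, time_ = line.split()[:2]
            counts[date + ' ' + time_.split('.')[0]] += 1

# Print the ten busiest seconds to show how tight the loop is.
for second, n in counts.most_common(10):
    print(second, n)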
Eventually it does stop and things seem to go back to normal in nova.
Here is the snippet of the end of the loop in the log:
http://paste.openstack.org/show/799560/
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1902276/+subscriptions